Benchmarking Perl Scripts with the Benchmark Module - FreeOA


2013-04-28 14:08:01

阿炯

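After writing Perl for a while, you find that many tasks can be written in several ways: with built-in functions, core modules, CPAN modules, or your own modules. Which should you use? Leaving readability and maintainability aside, this article looks only at performance: how do you measure how different ways of writing the same code perform? There are several ways to measure the performance of Perl code; this article covers the Benchmark module.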

Benchmark - benchmark running times of Perl code

Core function reference:
The Benchmark module encapsulates a number of routines to help you figure out how long it takes to execute some code.
timethis - run a chunk of code several times (runs one block for a given count or time and reports the CPU time used)

timethese - run several chunks of code several times (runs several blocks the same way so their results can be compared)

cmpthese - print results of timethese as a comparison chart

timeit - run a chunk of code and see how long it goes (runs one block and returns the time it took)

countit - see how many times a chunk of code runs in a given time

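For basic timing and for comparing blocks, the code under test can be passed either as a code reference (sub) or as a string for Benchmark to eval.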
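The two forms differ in what they can see. A minimal sketch, with illustrative variable names that are not from the original article: a code reference is a closure and can use "my" variables, while a string is compiled inside Benchmark and can only reach package variables, so the string form below uses a fully qualified name.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

my  $lexical_sum = 0;   # visible to the code reference (closure)
our $package_sum = 0;   # package variable, reachable from the string form

# Code reference form: the sub closes over $lexical_sum.
my $t_sub = timeit(50_000, sub { $lexical_sum += 1 });

# String form: Benchmark evals this text itself, so a "my" variable
# from this file would not be visible; use the package variable.
my $t_str = timeit(50_000, '$main::package_sum += 1');

print "sub:    ", timestr($t_sub), "\n";
print "string: ", timestr($t_str), "\n";
print "lexical_sum=$lexical_sum package_sum=$package_sum\n";
```

Code references are usually the safer choice for this reason; the string form is handy for one-line snippets that touch no lexical variables.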

timethis
timethis ($count, "the code");

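See the example below. The first argument is the number of times to run the code. If it is negative, its absolute value is instead the minimum number of CPU seconds to run for, and 0 selects the default of 3 seconds; timing by duration is the more common choice. The second argument is the code to run (a code reference or a string to eval). The example below measures an empty foreach loop over 1..100 for 3 CPU seconds.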

use Benchmark;
timethis (-3, sub { foreach(1..100){} });

In the timethis output, the last two parts matter most: the rate (@ 254587.30/s) and the count (n=781583), which is the number of times the code ran during those 3 seconds.

timethis for 3: 3 wallclock secs ( 3.07 usr + 0.00 sys = 3.07 CPU) @ 254587.30/s (n=781583)
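timethis also accepts a positive count and returns a Benchmark object. The sketch below assumes the optional TITLE and STYLE arguments described in the Benchmark documentation; the block being timed and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(timethis timestr);

my $calls = 0;

# Positive count: run the block a fixed number of times.
# The third argument is a title for the report line; the fourth, 'none',
# suppresses the report line so the script can handle the result itself.
my $t = timethis(100_000, sub { $calls++; my $s = join ',', 1 .. 10 }, 'join', 'none');

printf "ran %d iterations (block called %d times)\n", $t->iters, $calls;
print  "timing: ", timestr($t), "\n";
```

A fixed count gives repeatable iteration numbers, while a negative count (a duration) avoids having to guess how many iterations are enough.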

timethese
timethese($count, {
'Name1' => sub { ...code1... },
'Name2' => sub { ...code2... },
});

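'timethis' takes only one block of code; 'timethese' takes two or more so they can be compared. This makes it easy to find out which of several implementations of the same functionality performs best.

The arguments work the same way: the first is a count or a time, applied to each block separately. The second is a hash reference of code blocks, each given a name so the output is easy to read.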

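use strict;
use warnings;
use List::Util qw(first);
use Benchmark;

my @list = ( 1..10_000 );
my $hit = 5_000;
my $hit_regex = qr/^$hit$/; # precompute regex
my %params;
$params{$_} = 1 for @list; # precompute hash
timethese(
100_000, {
'first' => sub {
# apply the pattern to $_; a bare { $hit_regex } is always true
die unless ( first { /$hit_regex/ } @list );
},
'grep' => sub {
die unless ( grep { /$hit_regex/ } @list );
},
'hash' => sub {
die unless ( $params{$hit} );
},
});

Results from the author's run (taken with the 'first' and 'grep' blocks written as { $hit_regex }, which never applies the pattern, so absolute figures for the code above will differ):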
Benchmark: timing 100000 iterations of first, grep, hash...
first: 1 wallclock secs ( 0.63 usr + 0.01 sys = 0.64 CPU) @ 156250.00/s (n=100000)
grep: 42 wallclock secs (41.95 usr + 0.08 sys = 42.03 CPU) @ 2379.25/s (n=100000)
hash: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 10000000.00/s (n=100000)
(warning: too few iterations for a reliable count)
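A benchmark only means something if every candidate does the same job, so it can help to check the candidates before timing them. The following is a sketch under that assumption, not the author's code: the %candidates hash, the smaller list, and the numeric comparison are all illustrative choices.

```perl
use strict;
use warnings;
use List::Util qw(first);
use Benchmark qw(timethese);

my @list = (1 .. 1_000);
my $hit  = 500;
my %seen = map { $_ => 1 } @list;

# Candidate implementations, keyed by the label used in the report.
my %candidates = (
    first => sub { defined( first { $_ == $hit } @list ) },
    grep  => sub { scalar( grep { $_ == $hit } @list ) },
    hash  => sub { exists $seen{$hit} },
);

# Check that every candidate finds the value before timing anything.
for my $name (sort keys %candidates) {
    die "$name did not find $hit\n" unless $candidates{$name}->();
}

# Style 'none' keeps the report quiet; timethese returns a hash
# reference of Benchmark objects keyed by name.
my $results = timethese(2_000, \%candidates, 'none');
printf "%-5s %d iterations\n", $_, $results->{$_}->iters for sort keys %$results;
```

The returned hash reference is the same value that cmpthese accepts.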

cmpthese
cmpthese($count, {
'Name1' => sub { ...code1... },
'Name2' => sub { ...code2... },
});

Continuing the example above, pass the return value of timethese to cmpthese:
my $rc = timethese(COUNT, CODEHASHREF, [ STYLE ]);

cmpthese($rc);

This prints:
Benchmark: timing 100000 iterations of first, grep, hash...
     first: 0 wallclock secs ( 0.65 usr + 0.00 sys = 0.65 CPU) @ 153846.15/s (n=100000)
      grep: 56 wallclock secs (57.20 usr + 0.00 sys = 57.20 CPU) @ 1748.25/s (n=100000)
      hash: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU) @ 10000000.00/s (n=100000)
            (warning: too few iterations for a reliable count)
            Rate    grep   first    hash
grep      1748/s      --    -99%   -100%
first   153846/s   8700%      --    -98%
hash  10000000/s 571900%   6400%      --

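Compared with 'timethese' above, the output adds a chart comparing the three approaches. Each cell shows how much faster or slower the row's code is than the column's.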
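cmpthese can also be called directly with a count and a hash of code, as in the usage line at the top of this section. The sketch below relies on the documented behavior that style 'none' makes it return the chart as an array of rows; the two string-building blocks are only an illustration.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @words = map { "w$_" } 1 .. 200;

# Passing a count and a hash of code runs timethese internally.
# With style 'none' the chart is not printed; the rows come back
# as an array of arrays, header row first.
my $rows = cmpthese(5_000, {
    join   => sub { my $s = join '', @words },
    concat => sub { my $s = ''; $s .= $_ for @words },
}, 'none');

# Print the chart ourselves, one tab-separated row per line.
print join("\t", @$_), "\n" for @$rows;
```

For runs this short Benchmark may still print its "too few iterations" warning; a negative count such as -3 is the better choice for real measurements.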

timeit
Runs a block of code a given number of times and returns the time taken.

$t = timeit($count, '...the code...');

The result must be converted to a readable form with 'timestr'. The example below converts a Unix timestamp to a date and time, comparing the DateTime module with the built-in localtime function.

use DateTime;
use POSIX qw(strftime);
use Benchmark qw(:all);
$\="\n";
my $epoch=1362562931.352329;

sub hdateime{
my $dt = DateTime->from_epoch(epoch=>$epoch,time_zone => "Asia/Shanghai");
return $dt->ymd . " " . $dt->hms . "\n";
}

sub poslocal{
return strftime('%Y-%m-%d %H:%M:%S', localtime($epoch))."\n";
}

# pass code references; hdateime() would call the sub once and pass its return value
my ($th,$tp)=(timeit(10000,\&hdateime),timeit(10000,\&poslocal));
print timestr($th);
print timestr($tp);

Results:
3 wallclock secs ( 2.78 usr + 0.00 sys = 2.78 CPU) @ 3597.12/s (n=10000)
0 wallclock secs ( 0.00 usr + 0.25 sys = 0.25 CPU) @ 40000.00/s (n=10000)

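In this test the core route (POSIX strftime with the built-in localtime) ran roughly 11 times faster than the third-party DateTime module (40000/s versus 3597/s). DateTime does far more, of course; features and speed pull in opposite directions here.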
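For readers without DateTime installed, the same timeit pattern can be tried with core Perl only. This is a sketch with hypothetical sub names (via_strftime, via_sprintf), comparing strftime against hand-formatting the fields returned by localtime.

```perl
use strict;
use warnings;
use POSIX qw(strftime);
use Benchmark qw(timeit timestr);

my $epoch = 1362562931;

# Two ways to format the same timestamp, both using only core Perl.
sub via_strftime {
    return strftime('%Y-%m-%d %H:%M:%S', localtime($epoch));
}

sub via_sprintf {
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime($epoch);
    return sprintf('%04d-%02d-%02d %02d:%02d:%02d',
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
}

# \&name passes the sub itself; writing name() would call it once
# and hand its return value to timeit instead.
my $t_strftime = timeit(20_000, \&via_strftime);
my $t_sprintf  = timeit(20_000, \&via_sprintf);

print "strftime: ", timestr($t_strftime), "\n";
print "sprintf:  ", timestr($t_sprintf),  "\n";
```

Which of the two is faster will depend on the platform and Perl build, so no result is shown here.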

countit
Runs a piece of code for a given amount of time and returns how many times it ran.

$t = countit($time, '...the code...');
$count = $t->iters ;
print "$count loops of the code took:",timestr($t),"\n";

The following code returns the day of the month for a timestamp.
use Benchmark qw(:all);
use Data::Dumper;
$\="\n";
my $epoch=1362562931.352329;

sub getlocal{
return (localtime($epoch))[3];
}

#print getlocal;
# pass a code reference so countit can call the sub repeatedly
my $rc=countit(2,\&getlocal);
my $count = $rc->iters ;
print "$count loops of the code took:",timestr($rc),"\n";

Output:
250973 loops of the code took: 3 wallclock secs ( 0.05 usr + 2.08 sys = 2.13 CPU) @ 117827.70/s (n=250973)
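countit is useful when a sensible iteration count is hard to guess. A minimal sketch follows; the sort workload and the @data contents are made up for illustration.

```perl
use strict;
use warnings;
use Benchmark qw(countit timestr);

my @data = map { $_ * 3 % 101 } 1 .. 500;

# Run the block for at least 1 CPU second and let Benchmark
# decide how many iterations that takes.
my $t = countit(1, sub { my @sorted = sort { $a <=> $b } @data });

printf "%d sorts in: %s\n", $t->iters, timestr($t);
```

The iteration count comes from the returned object's iters method, as in the article's example above.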

Benchmark methods
new
Creates a Benchmark object holding the current time; two such objects can be used to time one stage of a script.
$t0 = Benchmark->new;
# ... your code here ...
$t1 = Benchmark->new;
$td = timediff($t1, $t0);
print "the code took:",timestr($td),"\n";

debug
Enables or disables debugging by setting the $Benchmark::Debug flag.
Benchmark->debug(1);
# ... the code ...
Benchmark->debug(0);

iters
Returns the number of iterations recorded in a Benchmark object, such as one returned by timeit or countit.

timediff
timediff ( T1, T2 )
Returns the difference between two Benchmark times (T1 minus T2) as a Benchmark object suitable for passing to timestr().

timestr
timestr ( TIMEDIFF, [ STYLE, [ FORMAT ] ] )
Returns a string that formats the times in the TIMEDIFF object in the requested STYLE. TIMEDIFF is expected to be a Benchmark object similar to that returned by timediff().

This is the function that turns a raw result into the standard report line seen in the examples above.
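To show how new, timediff and timestr fit together, here is a small sketch that times one stage of a script and prints the result in several styles. The workload is arbitrary; the style names ('all', 'noc') and the '6.3f' format follow the Benchmark documentation.

```perl
use strict;
use warnings;
use Benchmark qw(timediff timestr);

my $t0 = Benchmark->new;

# Stage under measurement: some CPU-bound work.
my $total = 0;
$total += sqrt($_) for 1 .. 2_000_000;

my $t1 = Benchmark->new;
my $td = timediff($t1, $t0);

# STYLE selects which fields appear; FORMAT is a printf-style number format.
print "auto: ", timestr($td),                "\n";
print "all:  ", timestr($td, 'all'),         "\n";   # includes child-process times
print "noc:  ", timestr($td, 'noc', '6.3f'), "\n";   # no child times, 3 decimals
```

No rate or iteration count appears here because Benchmark->new records a point in time, not a loop.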

There are more options for controlling Benchmark's behavior, but the defaults covered here are enough for most needs.
