Advanced Perl Array Techniques - FreeOA

Advanced Perl Array Techniques

2013-03-11 11:37:45

阿炯

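Keys in a hash are unique. When a key/value pair is stored, the key is hashed, and the resulting hash value decides where the value is stored. Because the same key always produces the same hash value, storing two pairs with the same key means the later value overwrites the one already stored.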
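As a quick illustration of the overwrite rule, here is a minimal sketch (the hash `%stock` and its keys are made up):

```perl
use strict;
use warnings;

my %stock;
$stock{apple} = 10;   # first store for key 'apple'
$stock{apple} = 25;   # same key, same hash value: overwrites 10
$stock{pear}  = 7;

# Only two distinct keys exist, and 'apple' holds the later value
printf "keys=%d apple=%d\n", scalar(keys %stock), $stock{apple};   # keys=2 apple=25
```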

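Different keys can, however, produce the same hash value. This is a hash collision: different pairs map to the same location. Collisions are fairly unlikely, but every implementation still needs a strategy for handling them, and different languages use different strategies. A hash also does not guarantee any order for its pairs: iteration order is unpredictable, and inserting a new pair may change it, so never rely on the order of a hash's pairs. (Some languages do provide hashes that iterate in insertion order.)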
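Since order is not guaranteed, the usual Perl approach when order matters is to sort the keys explicitly. A small sketch with made-up data:

```perl
use strict;
use warnings;

my %ages = (carol => 35, alice => 30, bob => 25);

# Iteration order of keys/values/each is unspecified (since Perl 5.18 it is
# randomized per process), so sort the keys when order matters.
my @ordered = sort keys %ages;
print join(', ', map { "$_=$ages{$_}" } @ordered), "\n";   # alice=30, bob=25, carol=35
```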

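Hashes generally use memory less efficiently. A hash is divided into buckets, each with some preallocated slots, and the number of slots determines how much one bucket can hold. Every time a pair is stored, the hash value computed from the key decides which bucket, and which slot in it, receives the value. That is why lookups, insertions and deletions are fast and, on average, do not slow down as the number of stored pairs grows: the cost depends on the bucket layout, not on the element count. Searching an array for a value, by contrast, gets slower on average as the array grows. The exception is an insert that runs out of space and triggers a resize: that insert is slow, because the whole structure has to be migrated, rehashing every key and copying the data. Storing a large number of pairs in one go can therefore take noticeably long.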
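Perl offers one knob related to resizing: assigning to `keys` preallocates buckets up front (perlfunc documents the form `keys %hash = 200;`). The sketch below assumes you know roughly how many entries will be stored; the size 100_000 and the key names are arbitrary, and whether presizing measurably helps depends on the workload.

```perl
use strict;
use warnings;

my %index;
# Preallocate buckets (Perl rounds up to a power of two), so the inserts
# below should not trigger repeated resizes and rehashes.
keys %index = 100_000;

$index{"id$_"} = $_ for 1 .. 100_000;
print scalar(keys %index), " entries\n";   # 100000 entries
```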

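To print an array with simple formatting (that is, to turn the array's contents into a formatted string), two built-in variables help:
$" list separator: inserted between elements when an array is interpolated in a double-quoted string
$, output field separator: inserted between the arguments of print (and say), e.g. between the elements of @array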

----- perl print array with formatting
Using join():
# assuming @array is your array:
print join(", ", @array);
-----
use 5.012_002;
my @array = qw/ 1 2 3 4 5 /;
{
local $" = ', ';
print "@array\n";
}

OR with $,:
use 5.012_002;
my @array = qw/ 1 2 3 4 5 /;
{
local $, = ', ';
say @array;
}

$string1 = join( '-', @string );

$scal = join(",", @arr);
# $scal is now "1,2,3"

From perldoc perlvar:
$LIST_SEPARATOR
$"

print "The array is: @array\n";
is equivalent to:
print "The array is: " . join($", @array) . "\n";
-----
The separator variables are documented in detail in perldoc perlvar (search for "separator"); what you really need here is $".

my @x = (1, 2, 3);
local $" = "}, {";
print "{@x}\n";

Will output:
{1}, {2}, {3}

Of course, the solution to the original question is simply
local $" = ', ';
print "@x\n";

The default value of $" is a single space; you can change it explicitly:
{
local $" = ':';
my @arr = (1, 2, 3);
my $scalar = "@arr"; # $scalar contains '1:2:3'
}
-----
Split string into hash: turning a formatted string into a hash
I want to convert a string in the following format into a hash:
$string="1:one;2:two;3:three";
%hash=(1=>"one", 2=>"two", 3=>"three");

%hash = map { split /:/, $_ } split /;/, $string;
# or split on both separators at once; the result alternates key, value:
%hash = split /[;:]/, $string;

use Data::Dumper;
print Dumper \%hash;
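The `split /[;:]/` version assumes that no key or value contains ':' or ';'. If values may contain ':' (the `4:a:b` pair below is a made-up example), a sketch using split's limit argument keeps them intact:

```perl
use strict;
use warnings;

# Hypothetical input where one value itself contains ':'
my $string = "1:one;2:two;3:three;4:a:b";

# A limit of 2 splits each pair only at its first ':', so "a:b" survives
my %hash = map { split /:/, $_, 2 } split /;/, $string;

print join(', ', map { "$_=$hash{$_}" } sort keys %hash), "\n";   # 1=one, 2=two, 3=three, 4=a:b
```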

-----

Some advanced uses of Perl arrays

1) Removing duplicate elements from an array
Using grep (keeps the first occurrence of each value, in the original order):
my @array = ( 'a', 'b', 'c', 'a', 'd', 1, 2, 5, 1, 5 );
my %count;
my @uniq_times = grep { ++$count{ $_ } < 2; } @array;

Using a hash slice (note: sort keys loses the original order and compares the values as strings):

my @array = ( 'a', 'b', 'c', 'a', 'd', 1, 2, 5, 1, 5 );
my %saw;
@saw{@array} = ( );
my @uniq_array = sort keys %saw;
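If you want to keep the original order without writing the `%count` bookkeeping yourself, newer versions of List::Util ship `uniq`. This sketch assumes List::Util 1.45 or later (bundled with recent Perl releases):

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);   # uniq first appeared in List::Util 1.45

my @array = ('a', 'b', 'c', 'a', 'd', 1, 2, 5, 1, 5);

# Keeps the first occurrence of each value, in the original order
my @uniq = uniq @array;
print "@uniq\n";   # a b c d 1 2 5
```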

2) Merging two arrays
push @array1, @array2;
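Note that `push` modifies `@array1` in place. A short sketch contrasting it with building a new array (the sample values are made up):

```perl
use strict;
use warnings;

my @array1 = (1, 2, 3);
my @array2 = (4, 5);

# Non-destructive: build a new array; both inputs stay unchanged
my @merged = (@array1, @array2);

# In place: push appends @array2's elements to @array1 and
# returns the new number of elements in @array1
my $new_len = push @array1, @array2;

print "@merged | @array1 | $new_len\n";   # 1 2 3 4 5 | 1 2 3 4 5 | 5
```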

3) Finding the maximum quickly
my @nums = 0 .. 1000;
my $max = $nums[0];
foreach (@nums) {
$max = $_ if $_ > $max;
}

Or:
use List::Util qw(max);
my $max_num = max( 0 .. 1000 );

Or, for strings:
use List::Util qw(maxstr);
my $max_str = maxstr ( qw( Fido Spot Rover ) );

maxstr compares the values as strings. There is also sum:
use List::Util qw(sum);
my $sum = sum (1 .. 1000);
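List::Util also has `min`, plus `sum0`, which differs from `sum` only for an empty list. A sketch, assuming List::Util 1.26 or later for `sum0` (the data is made up):

```perl
use strict;
use warnings;
use List::Util qw(min sum sum0);

my @nums  = (7, 3, 9);
my @empty = ();

my $min  = min @nums;     # 3
my $sum  = sum @empty;    # undef for an empty list
my $sum0 = sum0 @empty;   # 0 for an empty list

print "min=$min sum0=$sum0 sum=", (defined $sum ? $sum : 'undef'), "\n";
```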

4) Reducing a list
Summing numbers can also be done with reduce from List::Util:
use List::Util qw(reduce);
my $sum = reduce { $a + $b } 1 .. 1000;

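Like sort, reduce takes a code block as its argument, but it works a little differently. On each iteration it takes the first two elements from the argument list and aliases them as $a and $b, so the list becomes two elements shorter. reduce then puts the block's return value back at the front of the list. This repeats until only one element is left, which is the result ($sum here).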
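The mechanism is easier to see with a trace. This sketch (sample data made up) uses reduce to find a maximum and records each ($a, $b) pair the block receives:

```perl
use strict;
use warnings;
use List::Util qw(reduce);

my @nums = (4, 9, 2, 7);

# Each step: $a is the running result, $b the next element;
# the block's return value becomes $a for the following step.
my @steps;
my $max = reduce {
    my $r = $a > $b ? $a : $b;
    push @steps, "($a,$b)->$r";
    $r;
} @nums;

print join(' ', @steps), "\nmax=$max\n";
# (4,9)->9 (9,2)->9 (9,7)->9
# max=9
```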

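The same pattern gives a product. Note, though, that 1000! is far beyond the range of a floating-point number, so this particular call returns Inf: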
my $product = reduce { $a * $b } 1 .. 1000;
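1000! has 2568 decimal digits, far more than a double can represent, so the plain product overflows to infinity. One way to get the exact value, sketched below with the core Math::BigInt module, is to seed reduce with a BigInt so that the overloaded `*` keeps every intermediate result exact:

```perl
use strict;
use warnings;
use List::Util qw(reduce);
use Math::BigInt;

# Plain numbers: the running product passes the largest double after 170!,
# so the result overflows to infinity
my $plain = reduce { $a * $b } 1 .. 1000;

# Seeding the list with a Math::BigInt object makes every product a BigInt
# (overloaded *), so the result is exact
my $big = reduce { $a * $b } Math::BigInt->new(1), 2 .. 1000;

print "plain product: $plain\n";
print "digits in 1000!: ", length("$big"), "\n";   # 2568
```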

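5) Checking whether any element matches
In pure Perl, finding the first element that meets a condition takes a bit more work than finding all of them. The example below checks whether there is any element greater than 1000: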
my $found_a_match = grep { $_ > 1000 } @list;

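Note: if @list has 100 million elements and the match you need, 1001, comes early, grep still loops 100 million times. You can control the loop yourself instead: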
my $found_a_match = 0;
foreach my $elem (@list) {
$found_a_match = $elem if $elem > 1000;
last if $found_a_match;
}

Still not exactly simple. List::Util has this ready-made:
use List::Util qw(first);
my $found_a_match = first { $_ > 1000 } @list;

The List::MoreUtils module (from CPAN, not core) provides many more useful functions:
use List::MoreUtils qw(any all none notall);
my $found_a_match = any { $_ > 1000 } @list;
my $all_greater = all { $_ > 1000 } @list;
my $none_greater = none { $_ > 1000 } @list;
my $not_all_odd = notall { $_ % 2 } @list;
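One caveat with `first`: it returns the matching element itself, so a match on `0` or `''` looks false. The sketch below (sample data made up) shows the pitfall and two ways around it; it assumes List::Util 1.33 or later, which also provides `any`/`all`/`none`/`notall` without needing List::MoreUtils:

```perl
use strict;
use warnings;
use List::Util qw(first any);

my @list = (5, 0, 8);

# first returns the matching element itself, which may be false (0, '')
my $found = first { $_ < 1 } @list;   # 0
print "truth test: ",   ($found         ? 'match' : 'no match'), "\n";   # misleading: no match
print "defined test: ", (defined $found ? 'match' : 'no match'), "\n";   # match

# any returns a boolean and also stops at the first match
my $has_small = any { $_ < 1 } @list;
print "any: ", ($has_small ? 'match' : 'no match'), "\n";   # match
```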

6) Iterating over several lists at once
When walking several related lists in step, we usually loop over the array indices:
my @a = ( ... );
my @b = ( ... );
my @c;

foreach my $i ( 0 .. $#a ) {
my ( $x, $y ) = ( $a[$i], $b[$i] );   # avoid lexical $a/$b: they clash with the globals used by sort/reduce/pairwise
push @c, $x + $y;
}

Now compare this:

use List::MoreUtils qw(pairwise);
my @c = pairwise { $a + $b } @a, @b;

pairwise only handles two lists at a time; for three or more, use each_array:
use List::MoreUtils qw(each_array);
my $ea = each_array( @a, @b, @c );

my @d;
while ( my ( $x, $y, $z ) = $ea->() ) {
push @d, $x+$y+$z;
}

It's still a little verbose, but acceptable.
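If you'd rather avoid the non-core List::MoreUtils, mapping over the indices gives the same results with core Perl only. A sketch, assuming the arrays have equal length (the data is made up):

```perl
use strict;
use warnings;

my @x = (1, 2, 3);
my @y = (10, 20, 30);
my @z = (100, 200, 300);

# Walk the indices once; assumes all arrays have the same length
my @pair_sum   = map { $x[$_] + $y[$_] } 0 .. $#x;
my @triple_sum = map { $x[$_] + $y[$_] + $z[$_] } 0 .. $#x;

print "@pair_sum | @triple_sum\n";   # 11 22 33 | 111 222 333
```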

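7) Interleaving arrays

You can of course write the merge yourself, but mesh from List::MoreUtils is more convenient; it interleaves the arrays element by element: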
use List::MoreUtils qw(mesh);
my @odds = qw/ 1 3 5 7 9/;
my @evens= qw/ 2 4 6 8 0/;

my @nums = mesh @odds, @evens; # 1 2 3 4 5 6 7 8 9 0
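A core-only equivalent of this interleave, sketched with `map` over the indices and assuming both arrays have the same length:

```perl
use strict;
use warnings;

my @odds  = qw/1 3 5 7 9/;
my @evens = qw/2 4 6 8 0/;

# Emit one element from each array per index; assumes equal lengths
my @nums = map { ($odds[$_], $evens[$_]) } 0 .. $#odds;
print "@nums\n";   # 1 2 3 4 5 6 7 8 9 0
```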

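Key points

Use List::Util and List::MoreUtils to simplify list processing;

Use all, any, none and notall from List::MoreUtils to test list elements;

Use pairwise or each_array to process several lists in step;

Reference: "Simplifying list processing with List::Util and List::MoreUtils"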

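8) Picking random elements from a list (without modifying it)

Pick one or more random elements from an array/list while leaving the original array completely unchanged.

(1) Using shuffle from List::Util plus a slice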

use v5.10;                   # for say
use List::Util qw(shuffle);  # core module

# Original list (never modified)
my @original = qw(apple banana cherry date eggplant);

# ----------------------
# Method 1: pick 1 random element
# ----------------------
my $random_one = (shuffle @original)[0];
say "Random 1: $random_one";

# ----------------------
# Method 2: pick N random elements
# ----------------------
my $n = 2;
my @random_n = (shuffle @original)[0 .. $n-1];
say "Random $n: " . join(', ', @random_n);

# The original list is unchanged
say "\nOriginal list: " . join(', ', @original);

As one-liners:
my $item = (shuffle @list)[0];

my @items = (shuffle @list)[0,1,2];

my @items = (shuffle @list)[0..2];

(2) Built-ins only (no modules)

# pick 1 at random
my $random = $original[ rand @original ];

Pick 1: my $r = $arr[ rand @arr ]; (fastest)
Pick N: my @r = (shuffle @arr)[0..$n-1]; (most general; needs List::Util)
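Shuffling the whole list just to take N elements does more work than needed when the list is large. An alternative sketch using only built-ins: copy the list and `splice` random elements out of the copy, which guarantees distinct picks and leaves the original untouched (the names `@pool` and `@picks` are illustrative):

```perl
use strict;
use warnings;

my @original = qw(apple banana cherry date eggplant);
my $n = 3;   # assumed to be <= the number of elements

# Work on a copy; each splice removes one random element from the pool,
# so the picks are distinct and @original is never touched.
my @pool  = @original;
my @picks = map { splice @pool, rand(@pool), 1 } 1 .. $n;

print "picked: @picks\n";
print "original: @original\n";
```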

-----
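Removing elements from an array in Perl

Remove one or more elements (listed in another array) from an array

$" (list separator) is an important special variable when working with lists: $LIST_SEPARATOR

print "The array is: @array\n";
is equivalent to
print "The array is: " . join($", @array) . "\n";

$, (output field separator, placed between the arguments of print, e.g. between @array elements)
$OUTPUT_FIELD_SEPARATOR
$OFS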

use v5.32;
#22-02-27: remove one or more elements (listed in another array) from an array
my @db = qw(aaa bbb ccc ddd eee fff ggg);
my @in = qw(aaa fff);
my %h;

# Initialise the hash using a slice
@h{@in} = undef;
say 'Orig @db:'."@db";

@db = grep {not exists $h{$_}} @db;
say 'Mode @db:'."@db";

Using splice
splice removes array element(s) by index, eliminating the value from the array entirely:
use v5.32;
use Data::Dumper qw(Dumper);
#remove 'Sleepy' from array
my @dwarfs = qw(Doc Grumpy Happy Sleepy Sneezy Dopey Bashful);
splice @dwarfs, 3, 1;
print Dumper \@dwarfs;

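# Or find the index first. Start again from the full list,
# because 'Sleepy' was already removed above:
@dwarfs = qw(Doc Grumpy Happy Sleepy Sneezy Dopey Bashful);
my $index = 0;
$index++ while $index < @dwarfs && $dwarfs[$index] ne 'Sleepy';
splice(@dwarfs, $index, 1) if $index < @dwarfs;   # only if found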
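The index search can also be written with `first` from List::Util, searching over indices instead of values. Because the found index may be 0, the sketch tests it with `defined`:

```perl
use strict;
use warnings;
use List::Util qw(first);

my @dwarfs = qw(Doc Grumpy Happy Sleepy Sneezy Dopey Bashful);

# Search the indices; the result may be 0, so check defined, not truth
my $idx = first { $dwarfs[$_] eq 'Sleepy' } 0 .. $#dwarfs;
splice @dwarfs, $idx, 1 if defined $idx;

print "@dwarfs\n";   # Doc Grumpy Happy Sneezy Dopey Bashful
```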

Using a regex match (note: this also removes any element that merely contains the text)
$"=',';
my $expcolor='Green';
my @array=qw(Red Blue Green Yellow Black);
@array=grep {!/$expcolor/} @array;
print "@array";

Using grep with string inequality (ne)
$"=',';
my $expcolor='Green';
my @array=qw(Red Blue Green Yellow Black);
@array=grep {$_ ne $expcolor} @array;
print "@array";

Using an anchored regex for an exact match
$"=',';
my $expcolor='Green';
my @array = qw(Red Blue Greenish-Blue Green Bluish-Green Yellow Black);
@array = grep {!/^$expcolor$/} @array;
print "@array";
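Interpolating a variable into a pattern treats its contents as regex syntax. If the value may contain metacharacters, quote it with `\Q...\E`. A sketch with a made-up value, `C++` (on Perl 5.10+, unquoted `C++` is read as a possessive quantifier):

```perl
use strict;
use warnings;

my $exp   = 'C++';                  # value containing regex metacharacters
my @langs = qw(C C++ Perl Python);

# Unquoted: the pattern becomes ^C++$, a possessive quantifier on 'C',
# so it removes 'C' and keeps 'C++'
my @wrong = grep { !/^$exp$/ } @langs;

# \Q...\E (quotemeta) matches the literal text 'C++'
my @kept = grep { !/^\Q$exp\E$/ } @langs;

print join(',', @wrong), "\n";   # C++,Perl,Python
print join(',', @kept),  "\n";   # C,Perl,Python
```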

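A plain string comparison (eq/ne) in grep is usually somewhat cheaper than a regex match. The one-liner below removes several values at once with a single anchored alternation, and prints 1,2,4,5,6,7,9,11,12,13,14,15,16,18,19,20; the benchmark after it compares map- and grep-based approaches.

perl -le '@ar=(1 .. 20);@x=(8,10,3,17);$x=join("|",@x);@ar=grep{!/^(?:$x)$/o} @ar;$"=",";print "@ar"'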

use Benchmark;
my @A=qw(A B C A D E A F G H A I J K L A M N);
my (@M1,@G1,@M2,@G2);

timethese(1_000_000, {
    # map1, map2 and grep collect the indices of the elements equal to 'A'
    'map1' => sub {
        my $i = 0;
        @M1 = map { $i++; $_ eq 'A' ? $i-1 : () } @A;
    },
    'map2' => sub {
        my $i = 0;
        @M2 = map { $A[$_] eq 'A' ? $_ : () } 0..$#A;
    },
    'grep' => sub {
        @G1 = grep { $A[$_] eq 'A' } 0..$#A;
    },
    # grem instead keeps the elements that are not 'A'
    'grem' => sub {
        @G2 = grep { $_ ne 'A' } @A;
    },
});

Benchmark: timing 1000000 iterations of grem, grep, map1, map2...
      grem: 3 wallclock secs ( 2.49 usr + 0.00 sys = 2.49 CPU) @ 401606.43/s (n=1000000)
      grep: 2 wallclock secs ( 2.39 usr + 0.00 sys = 2.39 CPU) @ 418410.04/s (n=1000000)
      map1: 2 wallclock secs ( 3.03 usr + 0.00 sys = 3.03 CPU) @ 330033.00/s (n=1000000)
      map2: 3 wallclock secs ( 2.76 usr + 0.00 sys = 2.76 CPU) @ 362318.84/s (n=1000000)

grep comes out ahead in this run (absolute numbers depend on the machine and Perl version).
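The benchmark above compares map- and grep-based index collection; it does not directly test `ne` against a regex. A sketch that does, using Benchmark's `cmpthese` (no results are shown here, since they depend on the machine and Perl version):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @A   = qw(A B C A D E A F G H A I J K L A M N);
my $exp = 'A';

# Both subs build the same filtered list; only the comparison differs
my $by_ne    = sub { my @r = grep { $_ ne $exp } @A; \@r };
my $by_regex = sub { my @r = grep { !/^\Q$exp\E$/ } @A; \@r };

# Sanity check before timing: the two approaches must agree
die "results differ\n" unless "@{ $by_ne->() }" eq "@{ $by_regex->() }";

# A negative count means: run each sub for at least that many CPU seconds
cmpthese(-1, { ne => $by_ne, regex => $by_regex });
```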

Keeping or removing n key/value pairs of a hash

use v5.20;
use Data::Dumper;

my ($n,%oh,%nh)=(2,('a',1,'b',2,'c',3,'d',4,'e',5));

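#%nh=map { each %oh } 1..$n;   # each() variant: 1..$n yields n pairs (0..$n would yield n+1)
#%oh=map { each %oh } 1..$n;
%nh=(%oh)[0..$n*2-1];   # without map, end the slice at an odd index (2n-1) so only whole key/value pairs are taken; which pairs you get depends on hash order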

say Data::Dumper->Dump([\%oh,\%nh],[qw/*oh *nh/]);
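To remove n pairs instead, `delete` accepts a hash slice and returns the deleted values. This sketch sorts the keys first so that the choice of pairs is repeatable rather than dependent on hash order:

```perl
use strict;
use warnings;

my %oh = (a => 1, b => 2, c => 3, d => 4, e => 5);
my $n  = 2;

# Pick n keys (sorted, so the choice is repeatable) and delete them
# with a hash slice; delete returns the removed values.
my @drop    = (sort keys %oh)[0 .. $n - 1];
my @removed = delete @oh{@drop};

print "removed @drop (@removed); left: ",
      join(',', map { "$_=$oh{$_}" } sort keys %oh), "\n";
# removed a b (1 2); left: c=3,d=4,e=5
```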

-----
Quoting array elements on output in Perl (double-quoting, single-quoting)
Output each Perl array element surrounded by quotes.

my @array=qw();
#my @array=qw(Red Blue Greenish-Blue Green Bluish-Green Yellow Black);
say join ',',map qq("$_"),@array;
say @array?join ',',map{ qq!"$_"! } @array:'';
say join ',',map {qq("$_")} grep{$_} ('a', 'b', '', 'd', '', 'f');

my $bush = "The FreeOA site.";
print qq/As once said: "$bush"\n/;

my @letters = ('a', 'b', 'c');
{local $" = ","; say "@letters"; } # a,b,c

say "2+5=@{[2+5]}"; # 2+5=7 (avoid this)
say "2+5=${\(2+5)}"; # 2+5=7 (avoid this)

Use the core module B::Deparse with its -q option to see how Perl rewrites interpolated strings internally:
perl -MO=Deparse,-q -e 'print "@array"'
which prints:
print join($", @array);

Wrapping each element in square brackets []

Example 1
use v5.20;
my @array=qw(Doc Grumpy Happy Sleepy Sneezy Dopey Bashful);
my $str = sprintf '[%s]' x @array, @array;
say $str;

[Doc][Grumpy][Happy][Sleepy][Sneezy][Dopey][Bashful]

Example 2
local $" = "][";
my @array = qw/Freeoa Grumpy Happy Sleepy Dopey Bashful/;
print "[@array]";

[Freeoa][Grumpy][Happy][Sleepy][Dopey][Bashful]

Example 3
local $" = '';
my @array = qw/Freeoa Grumpy Happy Sleepy Dopey Bashful/;
say "[" . join("][", @array) . "]";
say qq|@{[map "[$_]",@array]}|;

[Freeoa][Grumpy][Happy][Sleepy][Dopey][Bashful]
[Freeoa][Grumpy][Happy][Sleepy][Dopey][Bashful]

-----
Slices in detail

To take part of a list, an array, or a hash in Perl, use a slice. Perl has no slice syntax for strings, but the built-in function substr() does the same job.
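A short sketch of substr as a string "slice" (the sample string is made up), including the 4-argument form that replaces the selected part in place:

```perl
use strict;
use warnings;

my $s = 'Hello, FreeOA';

my $head = substr($s, 0, 5);   # 'Hello'  (offset 0, length 5)
my $tail = substr($s, -6);     # 'FreeOA' (a negative offset counts from the end)

# The 4-argument form replaces the selected part in place
substr($s, 0, 5, 'Hi');        # $s is now 'Hi, FreeOA'

print "$head | $tail | $s\n";  # Hello | FreeOA | Hi, FreeOA
```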

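Assigning to undef placeholders

Some languages (such as Go) let you assign to a blank identifier to throw away values you don't intend to use. Perl supports this too: put undef in the positions you want to skip. In the list below, the values for php and ruby are discarded: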
@arr=qw(python perl shell php ruby);
($py,$perl,$shell,undef,undef) = @arr;

Some Perl functions (such as stat and localtime) return a long list of fields in list context, and this kind of placeholder assignment comes in handy:
($sec,$min,$hour,$mday,$mon,$year,undef,undef,undef) = localtime();

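This style is still clumsy, though: get the position or number of undefs wrong and you get the wrong values. And sometimes you only want to use a value right away, without first storing it in a variable. This is where slices come in.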

Array slices

First, list slices:
qw(aaa bbb ccc ddd)[1,2];

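This slices the list (aaa bbb ccc ddd), taking the elements at indices 1 and 2. Indices start at 0, so the result is (bbb ccc).

1. A slice returns a list, so the selected elements can be assigned straight to variables
2. Indices in the brackets can be written in any order and may repeat, as long as they are in range
3. What goes inside the brackets is a list of indices
4. Index -1 is the last element, -2 the second to last, and so on
5. The commas in the brackets separate indices; they do not denote a range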

In the example below, indices 1 and 2 are taken more than once and the indices are in no particular order; all of this is allowed:
qw(aaa bbb ccc ddd perl shell python)[1,-1,3,2,0,1,2];

Because the indices form a list, a range works as well:
qw(aaa bbb ccc ddd perl shell python)[1..3]; # same as [1,2,3]
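Slicing the list a function returns removes the need for the undef placeholders shown earlier. A sketch with localtime, where indices 3, 4 and 5 are the day of the month, the 0-based month, and the years since 1900:

```perl
use strict;
use warnings;

# Slice localtime's return list directly, in any index order
my ($year, $mon, $mday) = (localtime)[5, 4, 3];
printf "%04d-%02d-%02d\n", $year + 1900, $mon + 1, $mday;

# Negative indices work in list slices as well
my ($last) = (qw(aaa bbb ccc ddd))[-1];   # 'ddd'
print "$last\n";
```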

Now array slices. An array slice picks elements out of the array's values by index, in the same way as slicing a list. For example:
@arr = qw(a b c d);
($b,$d) = @arr[1,3];
print $b,$d;

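Most of the time array slices and list slices are equivalent, with two differences:
1. An array slice is interpolated inside double quotes; a list slice is not
2. You can assign a list of values to an array slice (with the slice on the left of =) to modify the array's elements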

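An example of the first point:
@arr=qw(perl python shell php);
print "@arr[1,2,3]\n";   # interpolated: the array slice is expanded
print qw(aaa bbb ccc ddd)[1,2],"\n";   # works, but only outside the quotes
print "qw(aaa bbb ccc ddd)[1,2]\n";   # not sliced: printed literally as a string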

An example of the second point:
@arr=qw(perl python shell php);
@arr[1,2]=qw(cpython csh);   # python becomes cpython, shell becomes csh
print "new arr: @arr\n";

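A range slice uses M..N. What if you want everything up to the second-to-last element? Setting N to -2 does not work, because 0..-2 is an empty range. Instead, write the Nth element from the end as $#arr-N+1: in a 5-element array $#arr is 4, the last element is $#arr-0 and the second to last is $#arr-1. Example:
@arr=qw(perl python shell php);
print join(' ', @arr[0..($#arr-1)]), "\n";   # perl python shell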

Hash slices

Hash slices behave much like array slices, but the syntax can be a little confusing at first. For example:
%phone_num=(a=>"180",b=>"170",c=>"160",d=>"150");
($a,$b,$c)=@phone_num{qw(a b c)};
print $a,"\n",$b,"\n",$c,"\n";

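A few things to note:
1. Even though it is a hash slice, the sigil is still @, as in @hash_name{...}
2. The keys you want are enclosed in braces
3. A hash slice is indexed by hash keys, not by numeric positions starting at 0
4. To take several elements, put a list of keys inside the braces
5. You can assign a list of values to a hash slice (with the slice on the left of =) to modify the corresponding values
6. A whole hash is not interpolated inside double quotes, but a hash slice is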

All three ways of writing the keys are allowed:
@phone_num{qw(a b c)};
@phone_num{("a","b","c")};
@phone_num{"a","b","c"};

Just as with array slices, you can assign to a hash slice to change the values of the corresponding keys:
%phone_num=(a=>"180",b=>"170",c=>"160",d=>"150");

@number=qw(181 171);
@phone_num{qw/a b/} = (@number);
print "@phone_num{qw/a b c/}","\n";
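Perl 5.20 added key/value hash slices, written `%hash{...}`, which return key/value pairs instead of just values. A sketch, assuming Perl 5.20 or later:

```perl
use v5.20;    # key/value hash slices need Perl 5.20+; also enables strict and say
use warnings;

my %phone_num = (a => "180", b => "170", c => "160", d => "150");

# %hash{LIST} returns key/value pairs, so building a sub-hash is one line
my %subset = %phone_num{qw(a c)};

say join ', ', map { "$_=$subset{$_}" } sort keys %subset;   # a=180, c=160
```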

-----
