Dumping Perl Data Structures - FreeOA

2017-10-09 17:15:05

阿炯

This article summarizes two modules used in Perl development to print data structures, mainly for debugging.

Data::Dump

Data::Dump - Pretty printing of data structures

This module provides a few functions that traverse their arguments and produce a string as the result. The string contains Perl code that, when evaled, produces a deep copy of the original arguments.

The following functions are provided (only the dd* functions are exported by default):

dump( ... )
pp( ... )

Returns a string containing a Perl expression. If you pass this string to Perl's built-in eval() function, it should return a copy of the arguments you passed to dump().

There is no difference between dump() and pp(), except that dump() shares its name with a not-so-useful perl builtin. Because of this some might want to avoid using that name.
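The round trip described above can be sketched as follows. This is a minimal illustration that assumes Data::Dump is installed from CPAN; the sample structure is made up and does not come from the original article.

```perl
use strict;
use warnings;
use Data::Dump qw(pp);

# Hypothetical sample data, for illustration only
my $data = { name => "freeoa", ports => [80, 443] };

# pp() returns a string holding a Perl expression
my $text = pp($data);
print "$text\n";

# Evaluating that expression yields an independent deep copy
my $copy = eval $text;
die "eval failed: $@" if $@;

$copy->{ports}[0] = 8080;          # modify the copy only
print pp($data->{ports}), "\n";    # the original still holds 80 and 443
```

Importing pp rather than dump avoids the name clash with the builtin mentioned above.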

quote($string)

Returns a quoted version of the provided string, enclosed in double quotes.

It differs from dump($string) in that it quotes even numbers and does not try to come up with clever expressions that might shorten the output. If a non-scalar argument is provided, it is simply stringified instead of traversed.
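The difference between quote() and dump() is easiest to see with a number. The snippet below is a small sketch that assumes Data::Dump is installed; the values are arbitrary.

```perl
use strict;
use warnings;
use Data::Dump qw(dump quote);

my $n = 42;
print dump($n), "\n";     # prints 42
print quote($n), "\n";    # prints "42" (with the double quotes)

# A non-scalar argument is stringified, not traversed, so the result
# is a quoted string of the form "ARRAY(0x...)"
my $q = quote([1, 2, 3]);
print "$q\n";
```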

dd( ... )
ddx( ... )

These functions call dump() on their arguments and print the result to STDOUT (actually, to the currently selected output handle, but STDOUT is the default for that).

The difference between them is only that ddx() will prefix the lines it prints with "# " and mark the first line with the file and line number where it was called. This is meant to be useful for debug printouts of state within programs.
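As a rough illustration of dd() and ddx(), the sketch below prints a made-up state hash and then captures the ddx() output through an in-memory handle to show that the selected output handle is respected. It assumes Data::Dump is installed; %state is hypothetical.

```perl
use strict;
use warnings;
use Data::Dump;    # dd and ddx are exported by default

my %state = (step => 3, done => 0);    # hypothetical program state

dd \%state;        # dump goes to the selected handle, STDOUT by default

# Select an in-memory handle to show that ddx follows the selection
open my $fh, '>', \my $buf or die "open: $!";
my $old = select $fh;
ddx \%state;       # lines prefixed with "# ", first line carries file:line
select $old;
close $fh;

print $buf;
```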

dumpf( ..., \&filter )

Shorthand for calling the dump_filtered() function of Data::Dump::Filtered. This works like dump(), but the last argument should be a filter callback function. As objects are visited, the filter callback is invoked and can modify how the objects are dumped.
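A filter callback might look like the sketch below, which hides one hash key from the dump. It assumes Data::Dump (with Data::Dump::Filtered) is installed and relies on the callback receiving a context object and the object reference, and on a returned hash with a hide_keys entry, as documented for Data::Dump::Filtered; the $user record is hypothetical.

```perl
use strict;
use warnings;
use Data::Dump qw(dumpf);

# Hypothetical record containing a field we do not want in logs
my $user = { name => "freeoa", password => "secret", roles => ["admin"] };

my $text = dumpf($user, sub {
    my ($ctx, $object_ref) = @_;
    # For hashes, ask the dumper to leave out the "password" key
    return { hide_keys => ["password"] } if $ctx->is_hash;
    return;    # undef: dump everything else the default way
});
print "$text\n";
```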

Data::Dumper

Data::Dumper - stringified perl data structures, suitable for both printing and eval

It displays the data structures held in Perl variables. It is a core module, so it can be used without installing anything.

Given a list of scalars or reference variables, writes out their contents in perl syntax. The references can also be objects. The content of each variable is output in a single Perl statement. Handles self-referential structures correctly.

The return value can be evaled to get back an identical copy of the original reference structure.

The module exports a single function:
Dumper(LIST)

Returns the stringified form of the values in the list, subject to the module's configuration options. The values will be named $VARn in the output, where n is a numeric suffix. In list context it returns a list of strings.
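The naming and context behavior can be checked with a short sketch like this one; the values passed in are arbitrary.

```perl
use strict;
use warnings;
use Data::Dumper;

# Scalar context: one string containing a statement per value
my $all = Dumper("freeoa", [1, 2]);
print $all;    # assigns to $VAR1 and $VAR2

# List context: one string per value
my @parts = Dumper("freeoa", [1, 2]);
print scalar(@parts), " strings\n";    # prints "2 strings"
```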

It also provides configuration variables and methods for controlling the output format.
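A few commonly used options are sketched below, first as package variables and then as chained methods on a Data::Dumper object. The choice of options is only an illustration, not a recommendation from the original article.

```perl
use strict;
use warnings;
use Data::Dumper;

# Package variables change the defaults for every later Dumper() call
$Data::Dumper::Sortkeys = 1;    # emit hash keys in sorted order
$Data::Dumper::Indent   = 1;    # fixed, compact indentation
$Data::Dumper::Terse    = 1;    # omit the "$VAR1 = " prefix

my $compact = Dumper({ b => 2, a => 1 });
print $compact;

# The same options exist as chainable methods on a Data::Dumper object
my $obj = Data::Dumper->new([ { b => 2, a => 1 } ]);
print $obj->Sortkeys(1)->Indent(0)->Terse(1)->Dump, "\n";
```

With Terse enabled the output is a bare expression, which is convenient when the dump will later be evaled under use strict.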

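Comparing the two

Data::Dump offers more functions and even handles plain scalar strings; its output is more compact and closer to ordinary Perl syntax, and the returned string can be used directly as a value. Below is a script that reads table schemas from HBase and turns them into create statements.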

use v5.12;
use utf8;
use Encode;
use JSON::XS;
use Mojo::Log;
use Time::Piece;
use Data::Dumper;
use Time::Seconds;
use Mojo::UserAgent;
use Data::Dump "pp";
use Cwd qw(abs_path realpath);
use File::Basename qw(dirname);

binmode(STDIN, ":encoding(utf8)");
binmode(STDOUT, ":encoding(utf8)");

#create at@2017-05-27 by freeoa

my $mydir=dirname(abs_path($0));
chdir($mydir);

my $cdt=localtime;
my $log = Mojo::Log->new(path => 'log/hbase.tab.log');
my $ua = Mojo::UserAgent->new;

$log = $log->format(sub {
my ($time, $level, @lines) = @_;
my $idt=localtime->datetime;
return qq{[$idt] [$level] @lines \n};
});

#Hbase hp for hostname & port
my ($hp,@hbtabs)=('192.168.20.107:8000');
#Get the names of all tables
my $json=$ua->get($hp=>{'Accept'=>'application/json'})->res->json;
push @hbtabs,$_->{name} foreach (@{$json->{table}});

foreach my $tab (@hbtabs){
   my $tabsch=$ua->get("$hp/$tab/schema/"=>{'Accept'=>'application/json'})->res->json;
   say Dumper($tabsch);
   my $creatabsch=qq[create '$tabsch->{name}', ];
   my $rs=getab_meta($tabsch->{ColumnSchema}[0]);
   $creatabsch.=$rs;
   say $creatabsch;
}

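sub getab_meta{
   my $tm=shift;
   #Upper-case the keys. Loop over a snapshot of the keys, because
   #adding keys to a hash while each() iterates over it is unsafe.
   foreach my $k (keys %$tm) {
       $tm->{uc($k)} = delete $tm->{$k} if($k=~/[a-z]/);
   }
   return pp($tm);
}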

Sample output (excerpt):

$VAR1 = {
          'IS_META' => 'false',
          'ColumnSchema' => [
                            {
                              'VERSIONS' => '1',
                              'BLOOMFILTER' => 'ROW',
                              'KEEP_DELETED_CELLS' => 'false',
                              'name' => 'content',
                              'DATA_BLOCK_ENCODING' => 'NONE',
                              'IN_MEMORY' => 'false',
                              'COMPRESSION' => 'GZ',
                              'BLOCKSIZE' => '65536',
                              'REPLICATION_SCOPE' => '0',
                              'TTL' => '2147483647',
                              'BLOCKCACHE' => 'true',
                              'MIN_VERSIONS' => '0'
                            }
                          ],
          'name' => 'freeoa_hbt'
        };

create 'freeoa_hbt', {
  BLOCKCACHE => "true",
  BLOCKSIZE => 65536,
  BLOOMFILTER => "ROW",
  COMPRESSION => "GZ",
  DATA_BLOCK_ENCODING => "NONE",
  IN_MEMORY => "false",
  KEEP_DELETED_CELLS => "false",
  MIN_VERSIONS => 0,
  NAME => "content",
  REPLICATION_SCOPE => 0,
  TTL => 2147483647,
  VERSIONS => 1,
}
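The key-renaming step of the script can be tried without an HBase server. The sketch below uses a small subset of the column schema from the sample output and assumes Data::Dump is installed; it is a standalone illustration, not the original script.

```perl
use strict;
use warnings;
use Data::Dump qw(pp);

# Subset of a column schema copied from the sample output above
my $col = { name => 'content', VERSIONS => '1', COMPRESSION => 'GZ' };

# Upper-case every key that contains a lower-case letter.
# keys() returns a snapshot, so the hash may be changed inside the loop.
for my $k (keys %$col) {
    $col->{ uc $k } = delete $col->{$k} if $k =~ /[a-z]/;
}

print "create 'freeoa_hbt', ", pp($col), "\n";
```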

References:
Data::Dump

Data::Dumper

Debugging Perl scripts with the Data::Dumper module

Printing complex data structures: Data::Dumper, Data::Dump, Data::Printer

Printing complex structures

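Data::Dumper, Data::Dump, and Data::Printer can all print complex data structures. For the first two it is recommended to pass references to the corresponding functions or methods, although passing non-references is not an error (scalars, arrays, hashes, and references are all accepted). Data::Printer determines by itself whether it was given a reference. Data::Dumper is a built-in core module.

As an example, take the data structures below: one is a complex hash and the other a relatively simple anonymous array reference. Each is printed with these three modules.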
%Config = (
           'auto_commit' => '0',
           'build_dir' => '/home/freeoa/.cpan/build',
           'bzip2' => '/bin/bzip2',
           'urllist' => [
                          'http://cpan.metacpan.org/',
                          \@my_urllist     # reference to the array my_urllist as an element
                        ],
           'wget' => '/usr/bin/wget',
          );

@my_urllist=('http://mirrors.aliyun.com/CPAN/',
             'https://mirrors.tuna.tsinghua.edu.cn/CPAN/',
             'https://mirrors.163.com/cpan/',
             \@more_urllist       # reference to the array more_urllist as an element
            );

@more_urllist=qw(http://mirrors.shu.edu.cn/CPAN/
                 http://mirror.lzu.edu.cn/CPAN/
                );

$ref_arr=[qw(zheng good freeoa net)];

1. Using the Dumper function of Data::Dumper, which expects references

use Data::Dumper;
print Dumper(\%Config,$ref_arr);

Output:
$VAR1 = {
          'wget' => '/usr/bin/wget',
          'urllist' => [
                         'http://cpan.metacpan.org/',
                         [
                           'http://mirrors.aliyun.com/CPAN/',
                           'https://mirrors.tuna.tsinghua.edu.cn/CPAN/',
                           'https://mirrors.163.com/cpan/',
                           [
                             'http://mirrors.shu.edu.cn/CPAN/',
                             'http://mirror.lzu.edu.cn/CPAN/'
                           ]
                         ]
                       ],
          'bzip2' => '/bin/bzip2',
          'auto_commit' => '0',
          'build_dir' => '/home/freeoa/.cpan/build'
        };

$VAR2 = [
          'zheng',
          'good',
          'freeoa',
          'net'
        ];

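Note that Dumper() assigns the first reference to $VAR1 and the second to $VAR2. To replace the default $VARn names with custom variable names, use the Data::Dumper->Dump method.

2. Using the Dump method of Data::Dumper, which expects two array references; the second one defines the variable names shown in the output instead of the default $VARn.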

use Data::Dumper;
print Data::Dumper->Dump([\%Config,$ref_arr],[qw(myvar myarr)]);

Output:
$myvar = {
           'wget' => '/usr/bin/wget',
           'auto_commit' => '0',
           'bzip2' => '/bin/bzip2',
           'build_dir' => '/home/freeoa/.cpan/build',
           'urllist' => [
                          'http://cpan.metacpan.org/',
                          [
                            'http://mirrors.aliyun.com/CPAN/',
                            'https://mirrors.tuna.tsinghua.edu.cn/CPAN/',
                            'https://mirrors.163.com/cpan/',
                            [
                              'http://mirrors.shu.edu.cn/CPAN/',
                              'http://mirror.lzu.edu.cn/CPAN/'
                            ]
                          ]
                        ]
         };
$myarr = [
           'zheng',
           'good',
           'freeoa',
           'net'
         ];

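Note that two array references are used above: the first holds the data structures to dump, and the second holds the variable names for them.

For example, in the Dump call below, myvar names the output for \%Config and myarr names the output for \@name1; \@name2 has no corresponding name, so it is printed under the default $VAR3. The names inside qw() must be separated by whitespace, not commas.

print Data::Dumper->Dump([\%Config,\@name1,\@name2],[qw(myvar myarr)]);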
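The fallback to $VARn can be observed with the self-contained sketch below. The three variables are small hypothetical stand-ins for %Config, @name1 and @name2.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical data standing in for %Config, @name1 and @name2
my %config = (wget => '/usr/bin/wget');
my @name1  = qw(zheng good);
my @name2  = qw(freeoa net);

local $Data::Dumper::Indent = 1;
my $out = Data::Dumper->Dump(
    [ \%config, \@name1, \@name2 ],   # values to dump
    [ qw(myvar myarr) ],              # names; none for the third value
);
print $out;    # uses $myvar, $myarr, then falls back to $VAR3
```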

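3. Using the dump function of Data::Dump. It does not assign the result to a scalar variable; it prints the data structure directly, exactly as it is. For example, printing an array reference: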
use Data::Dump qw(dump);
print dump($ref_arr);

Output:
["zheng", "good", "freeoa", "net"]

Printing a hash reference: print dump(\%Config);
{
  auto_commit => 0,
  build_dir => "/home/freeoa/.cpan/build",
  bzip2 => "/bin/bzip2",
  urllist => [
    "http://cpan.metacpan.org/",
    [
      "http://mirrors.aliyun.com/CPAN/",
      "https://mirrors.tuna.tsinghua.edu.cn/CPAN/",
      "https://mirrors.163.com/cpan/",
      [
        "http://mirrors.shu.edu.cn/CPAN/",
        "http://mirror.lzu.edu.cn/CPAN/",
      ],
    ],
  ],
  wget => "/usr/bin/wget",
}

Printing a hash reference and an anonymous array together: print dump(\%Config,$ref_arr);
(
  {
    auto_commit => 0,
    build_dir => "/home/freeoa/.cpan/build",
    bzip2 => "/bin/bzip2",
    urllist => [
      "http://cpan.metacpan.org/",
      [
        "http://mirrors.aliyun.com/CPAN/",
        "https://mirrors.tuna.tsinghua.edu.cn/CPAN/",
        "https://mirrors.163.com/cpan/",
        [
          "http://mirrors.shu.edu.cn/CPAN/",
          "http://mirror.lzu.edu.cn/CPAN/",
        ],
      ],
    ],
    wget => "/usr/bin/wget",
  },
  ["zheng", "good", "freeoa", "net"],
)

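4. Using the p function of Data::Printer, which prints the result itself (to STDERR by default), so no extra print or say is needed

The p function can be passed a data object (a variable) directly.
A reference must be passed as a reference variable, not as an expression starting with a backslash.
The p function cannot format two objects in one call.

p(%Config);   # OK
p($ref_Config);   # OK
p(\%Config);   # error
p($ref_arr,$ref_Config);   # error

Passing a data object directly:
use Data::Printer;
p(%Config);

Output: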
{
    auto_commit   0,
    build_dir     "/home/freeoa/.cpan/build",
    bzip2         "/bin/bzip2",
    urllist       [
        [0] "http://cpan.metacpan.org/",
        [1] [
            [0] "http://mirrors.aliyun.com/CPAN/",
            [1] "https://mirrors.tuna.tsinghua.edu.cn/CPAN/",
            [2] "https://mirrors.163.com/cpan/",
            [3] [
                [0] "http://mirrors.shu.edu.cn/CPAN/",
                [1] "http://mirror.lzu.edu.cn/CPAN/"
            ]
        ]
    ],
    wget          "/usr/bin/wget"
}

Passing a reference variable:
p($ref_arr);

Result:
\ [
    [0] "zheng",
    [1] "good",
    [2] "freeoa",
    [3] "net"
]

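Combining Dumper with eval

The output of Data::Dumper is Perl code that assigns to a variable such as $VAR1 (Data::Dump produces a bare Perl expression), so a dump saved to a text file can later be read back and passed to eval to rebuild the data structure directly.

print DATA Dumper(\%Config);

This persists the contents of %Config to the file attached to the filehandle DATA. When the data is needed again, read it back and dereference it: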
open DATA, "<$datafile" or die "$!";
{
    local $/;
    %new_Config = %{ eval <DATA> };
}

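The eval above makes perl compile and run the text read from DATA. Because that text was produced by Dumper, it begins with a variable assignment, so eval <DATA> first performs the assignment and then returns the assigned value, i.e. the reference stored in $VAR1. Since this reference sits inside a dereferencing %{ ... }, a new hash is built from it.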
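The whole save-and-restore cycle can be sketched with core modules only. In the snippet below a temporary file stands in for $datafile, lexical filehandles replace the bareword DATA, and the small %Config is made up; under use strict, $VAR1 has to be declared before the dumped text will compile.

```perl
use strict;
use warnings;
use Data::Dumper;
use File::Temp qw(tempfile);

my %Config = (bzip2 => '/bin/bzip2', wget => '/usr/bin/wget');

# Persist the dump to a temporary file (stand-in for $datafile)
my ($out, $datafile) = tempfile(UNLINK => 1);
print {$out} Dumper(\%Config);
close $out or die "close: $!";

# Read the whole file back in one go
open my $in, '<', $datafile or die "open: $!";
my $text = do { local $/; <$in> };
close $in;

# The dump assigns to $VAR1; declare it so the eval compiles under strict
our $VAR1;
my %new_Config = %{ eval $text };
print "$_ => $new_Config{$_}\n" for sort keys %new_Config;
```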

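The statement above still has a problem: the persisted file may sometimes be empty, in which case eval returns undef and the dereference fails with an error. For robustness, more checks are needed. The version below first reads the contents of DATA as a string into the variable $dumped_hash and then tests that variable.

open DATA, "<$datafile" or die "$!";
my $dumped_hash;
{
    local $/;
    $dumped_hash = <DATA>;
}
# Declare first: combining my with a statement modifier has undefined behavior
my %new_Config;
%new_Config = %{ eval $dumped_hash } if $dumped_hash;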
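One way to package these checks is a small helper that also inspects $@ after the eval. The function name load_dump is hypothetical and the sketch is only one possible arrangement. Keep in mind that string eval runs whatever code the file contains, so this approach is only suitable for files you trust.

```perl
use strict;
use warnings;

# Hypothetical helper: turn dumped text back into a hash reference,
# returning an empty hash for empty input and dying on broken input.
sub load_dump {
    my ($text) = @_;
    return {} unless defined $text && $text =~ /\S/;

    our $VAR1;                      # lets '$VAR1 = ...' compile under strict
    my $ref = eval $text;
    die "cannot parse dump: $@" if $@;
    die "dump is not a hash reference\n" unless ref $ref eq 'HASH';
    return $ref;
}

my %ok    = %{ load_dump(q{$VAR1 = { wget => '/usr/bin/wget' };}) };
my %empty = %{ load_dump('') };
print scalar(keys %ok), " key, ", scalar(keys %empty), " keys\n";
```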

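A more compact form is also possible (with $/ localized as above):
%new_Config = %{ eval(<DATA>) || {} };

The string form of eval is required here: it compiles and runs the text read from DATA and returns the value of the last statement, i.e. the reference assigned to $VAR1. The block form, eval { <DATA> }, would only trap exceptions and return the raw string without compiling it. If DATA is empty or the text fails to compile, eval returns a false value and the || {} fallback supplies an empty hash reference, which is then dereferenced.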