The Perl IO::Compress Module Family
2017-01-18 19:36:51 阿炯

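These modules all belong to the IO family and focus on compression. Perl has many compression and decompression modules, matching the range of compression tools and algorithms in current use: zip, gzip (RFC 1952), bzip2, lzma (7z), xz, lzf, lzop, lz4, lzo, lzw, snappy, deflate (RFC 1950), rawdeflate (RFC 1951), and zstd. For details, see: Introduction to Perl compression modules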

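This article focuses on the modules that ship with the Perl core, each of which has its own characteristics: IO::Compress::Bzip2, IO::Compress::Deflate, IO::Compress::Gzip, IO::Compress::RawDeflate, and IO::Compress::Zip. IO::Compress::Zstd offers the same interface but is installed separately from CPAN. The most widely used of these, the Gzip module, is the one described here.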

IO::Compress::Gzip provides a Perl interface that allows writing compressed data to files or buffers as defined in RFC 1952. All the gzip headers defined in RFC 1952 can be created using this module.

Simplest usage

use v5.12;
use IO::Compress::Gzip qw(gzip $GzipError);
my $file='freeoa.sql';

my $output = "$file.gz" ;
gzip $file=>$output or die "Error compressing '$file': $GzipError\n";

gzip $input_filename_or_reference => $output_filename_or_reference [,OPTS] or die "gzip failed: $GzipError\n";

use v5.12;
use IO::Compress::Gzip qw(:all);

my $file='freeoa.sql';
say "gzip file:$file";
my $output = "$file.gz" ;
# pass a few options
gzip $file=>$output,AutoClose=>0,-Level=>3,Comment =>'FreeOA perl gzip example comment.' or die "Error compressing '$file': $GzipError\n";

The functional form of gzip compresses the file in a single call.

The input or output argument can be any of the following:

A filename

If the $input_filename_or_reference parameter is a simple scalar, it is assumed to be a filename. This file will be opened for reading and the input data will be read from it.

This is the simplest form: a plain scalar holding the file name.

A filehandle

If the $input_filename_or_reference parameter is a filehandle, the input data will be read from it. The string '-' can be used as an alias for standard input.

The filehandle form. The string '-' can be used to read from standard input.

A scalar reference

If $input_filename_or_reference is a scalar reference, the input data will be read from $$input_filename_or_reference.

The data is read from the scalar that the reference points to.

An array reference

If $input_filename_or_reference is an array reference, each element in the array must be a filename.

The input data will be read from each file in turn. The complete array will be walked to ensure that it only contains valid filenames before any data is compressed.

The array-reference form: put the valid file names to read into an array, and they are processed one after another.

An Input FileGlob string

If $input_filename_or_reference is a string that is delimited by the characters "<" and ">" gzip will assume that it is an input fileglob string. The input is the list of files that match the fileglob. See File::GlobMapper for more details.

File name matching: a simple glob pattern for handling files in bulk. An example appears further down.
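As a minimal sketch of the scalar-reference and array-reference forms described above: the temporary directory and the part1.txt / part2.txt names are invented for illustration, and IO::Uncompress::Gunzip is used only to check the result. When several input files go to a single output, one gzip stream per file is written, which is why the read side passes MultiStream.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Scalar reference in, scalar reference out: nothing touches the disk.
my $text = "hello gzip\n" x 100;
my ($packed, $unpacked);
gzip \$text => \$packed or die "gzip failed: $GzipError\n";
gunzip \$packed => \$unpacked or die "gunzip failed: $GunzipError\n";

# Array reference in: every element must name an existing file.
# (The file names here are hypothetical.)
my $dir = tempdir(CLEANUP => 1);
my @files;
for my $n (1, 2) {
    my $path = "$dir/part$n.txt";
    open my $fh, '>', $path or die "Cannot write $path: $!\n";
    print {$fh} "part $n\n";
    close $fh or die "Cannot close $path: $!\n";
    push @files, $path;
}

my ($joined, $plain);
gzip \@files => \$joined or die "gzip failed: $GzipError\n";

# Read every stream in the buffer back as one piece of data.
gunzip \$joined => \$plain, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

print "scalar round trip ok\n" if $unpacked eq $text;
print $plain;
```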

Optional parameters

Remember the [OPTS] above? This is where the processing options are passed in.

AutoClose => 0|1

This option applies to any input or output data streams to gzip that are filehandles. If AutoClose is specified, and the value is true, it will result in all input and/or output filehandles being closed once gzip has completed.

This parameter defaults to 0.

Automatic closing, off by default. When enabled, the filehandles involved are closed as soon as gzip has finished.

BinModeIn => 0|1

When reading from a file or filehandle, set binmode before reading.

Defaults to 0.

Whether to switch the input to binary mode before reading. Off by default.
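A small sketch of AutoClose and BinModeIn with filehandles on both sides. The in.txt file in a temporary directory is a made-up example, and checking fileno for undef is just one way to observe that a handle has been closed.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical input file, created only for this sketch.
my $dir = tempdir(CLEANUP => 1);
my ($src, $dst) = ("$dir/in.txt", "$dir/in.txt.gz");
open my $mk, '>', $src or die "Cannot write $src: $!\n";
print {$mk} "some text\n" x 10;
close $mk or die "Cannot close $src: $!\n";

open my $in,  '<', $src or die "Cannot open $src: $!\n";
open my $out, '>', $dst or die "Cannot open $dst: $!\n";

# AutoClose asks gzip to close both handles when it is done;
# BinModeIn sets binmode on the input before reading.
gzip $in => $out, AutoClose => 1, BinModeIn => 1
    or die "gzip failed: $GzipError\n";

# fileno returns undef for a handle that has been closed.
my $in_closed  = !defined fileno($in);
my $out_closed = !defined fileno($out);
print "input closed:  ", ($in_closed  ? 'yes' : 'no'), "\n";
print "output closed: ", ($out_closed ? 'yes' : 'no'), "\n";
```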

Append => 0|1

The behaviour of this option is dependent on the type of output data stream.

    A Buffer
    If Append is enabled, all compressed data will be appended to the end of the output buffer. Otherwise the output buffer will be cleared before any compressed data is written to it.
    
    A Filename
    If Append is enabled, the file will be opened in append mode. Otherwise the contents of the file, if any, will be truncated before any compressed data is written to it.
    
    A Filehandle
    If Append is enabled, the filehandle will be positioned to the end of the file via a call to seek before any compressed data is written to it. Otherwise the file pointer will not be moved.

When Append is specified, and set to true, it will append all compressed data to the output data stream.

So when the output is a filehandle it will carry out a seek to the eof before writing any compressed data. If the output is a filename, it will be opened for appending. If the output is a buffer, all compressed data will be appended to the existing buffer.

Conversely when Append is not specified, or it is present and is set to false, it will operate as follows.

When the output is a filename, it will truncate the contents of the file before writing any compressed data. If the output is a filehandle its position will not be changed. If the output is a buffer, it will be wiped before any compressed data is output.

Defaults to 0.

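The effect of this option depends on the type of the output data stream, of which there are three:

Buffer
If enabled, the compressed data is appended to the buffer; otherwise whatever was already in the buffer is cleared first.

Filename
The easiest case to understand: if enabled, the data is appended to the end of the file; otherwise the file is truncated and rewritten.

Filehandle
If enabled, seek is called once to move to the end of the existing file before the compressed stream is written; otherwise the file pointer is not moved.

Off by default.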
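The buffer case can be sketched as follows. The variable names are arbitrary, and IO::Uncompress::Gunzip with MultiStream is used here only to show what ended up in the buffer.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my ($first, $second) = ("first\n", "second\n");
my $buffer;

# Without Append the buffer is wiped on each call, so only "second" survives.
gzip \$first  => \$buffer or die "gzip failed: $GzipError\n";
gzip \$second => \$buffer or die "gzip failed: $GzipError\n";
my $replaced;
gunzip \$buffer => \$replaced, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

# With Append the second gzip stream is added after the first one.
gzip \$first  => \$buffer or die "gzip failed: $GzipError\n";
gzip \$second => \$buffer, Append => 1 or die "gzip failed: $GzipError\n";
my $appended;
gunzip \$buffer => \$appended, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

print "without Append: $replaced";
print "with Append:    ", join(' + ', split /\n/, $appended), "\n";
```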

Usage examples

Compress file1.txt into file1.txt.gz.

use IO::Compress::Gzip qw(gzip $GzipError) ;
my $input = "file1.txt";
gzip $input => "$input.gz" or die "gzip failed: $GzipError\n";

Read from an open filehandle and write the compressed data to a buffer

use IO::Compress::Gzip qw(gzip $GzipError);
use IO::File;
my $input = new IO::File "<file1.txt" or die "Cannot open 'file1.txt': $!\n";
my $buffer;
gzip $input => \$buffer or die "gzip failed: $GzipError\n";

Compress the files matched by a simple glob pattern

use IO::Compress::Gzip qw(gzip $GzipError) ;
gzip '</my/home/*.txt>' => '<*.gz>' or die "gzip failed: $GzipError\n";

The compressed files are placed in the same directory. The loop below does the same job one file at a time.

for my $input ( glob "/my/home/*.txt" ){
    my $output = "$input.gz" ;
    gzip $input => $output
        or die "Error compressing '$input': $GzipError\n";
}
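To try the fileglob form without touching real files, something like the following sketch could be used. The temporary directory and the a.txt, b.txt and notes.log names are invented; the sketch relies on '*' in the output fileglob standing for the complete matched input file name, as described in File::GlobMapper.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical sample files: two match *.txt, one does not.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt notes.log)) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $dir/$name: $!\n";
    print {$fh} "contents of $name\n";
    close $fh or die "Cannot close $dir/$name: $!\n";
}

# Each matched input file gets its own .gz file next to it.
gzip "<$dir/*.txt>" => '<*.gz>' or die "gzip failed: $GzipError\n";

my @made = sort map { basename($_) } glob "$dir/*.gz";
print "$_\n" for @made;
```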

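Object-oriented interface

The constructor documents a longer list of options: Merge, -Level, -Strategy, Minimal, Comment, Name, Time, TextFlag, HeaderCRC, OS_Code, ExtraField, ExtraFlags, Strict. Most of them can also be passed to the functional gzip call, as the earlier example with -Level and Comment shows. They cover a lot of ground, so only the most commonly used ones are described here.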

Merge => 0|1

This option is used to compress input data and append it to an existing compressed data stream in $output . The end result is a single compressed data stream stored in $output .

It is a fatal error to attempt to use this option when $output is not an RFC 1952 data stream. There are a number of other limitations with the Merge option:

1. This module needs to have been built with zlib 1.2.1 or better to work. A fatal error will be thrown if Merge is used with an older version of zlib.

2. If $output is a file or a filehandle, it must be seekable.

This parameter defaults to 0.

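As the name suggests, this merges new data into an existing compressed stream. It needs a sufficiently recent zlib, and it is off by default.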

-Level

Defines the compression level used by zlib. The value should either be a number between 0 and 9 (0 means no compression and 9 is maximum compression), or one of the symbolic constants defined below.
    Z_NO_COMPRESSION
    Z_BEST_SPEED
    Z_BEST_COMPRESSION
    Z_DEFAULT_COMPRESSION

The default is Z_DEFAULT_COMPRESSION.

Note, these constants are not imported by IO::Compress::Gzip by default.
    use IO::Compress::Gzip qw(:level);
    use IO::Compress::Gzip qw(:constants);
    use IO::Compress::Gzip qw(:all);

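Sets the compression level, with the same meaning as in the manual of the system gzip command. Four symbolic names are available for the common values. They are not imported by default; any one of the three import statements above makes them usable.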

-Strategy

Defines the strategy used to tune the compression. Use one of the symbolic constants defined below.
    Z_FILTERED
    Z_HUFFMAN_ONLY
    Z_RLE
    Z_FIXED
    Z_DEFAULT_STRATEGY

The default is Z_DEFAULT_STRATEGY.

Selects the compression strategy, using one of the symbolic names listed above.

Comment => $comment

Stores the contents of $comment in the COMMENT field in the gzip header. By default, no comment field is written to the gzip file.

If the -Strict option is enabled, the comment can only consist of ISO 8859-1 characters plus line feed.

If the -Strict option is disabled, the comment field can contain any character except NULL. If any null characters are present, the field will be truncated at the first NULL.

Sets the comment stored in the compressed file. With -Strict enabled, text in a multi-byte encoding cannot be used.
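A short sketch of how these constructor options might be combined, and how the header fields can be read back with getHeaderInfo from IO::Uncompress::Gunzip. The name sample.txt, the comment text and the test data are placeholders chosen for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError :level);
use IO::Uncompress::Gunzip qw($GunzipError);

my $data = "a fairly repetitive line of text\n" x 200;

# Compress into a buffer with an explicit level and two header fields.
my $packed;
my $z = IO::Compress::Gzip->new(
    \$packed,
    -Level  => Z_BEST_COMPRESSION,
    Name    => 'sample.txt',              # placeholder file name
    Comment => 'written by the sketch',   # placeholder comment
) or die "gzip failed: $GzipError\n";
$z->print($data);
$z->close() or die "close failed: $GzipError\n";

# Read the gzip header back to see what was stored.
my $u = IO::Uncompress::Gunzip->new(\$packed)
    or die "gunzip failed: $GunzipError\n";
my $hdr = $u->getHeaderInfo();

print "Name:    $hdr->{Name}\n";
print "Comment: $hdr->{Comment}\n";
printf "%d bytes compressed to %d\n", length($data), length($packed);
```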

API

print

$z->print($data)
print $z $data

Compresses and outputs the contents of the $data parameter. This has the same behaviour as the print built-in.

Returns true if successful.

Writes the contents of $data to the compressed file.

printf

$z->printf($format, $data)
printf $z $format, $data

Compresses and outputs the contents of the $data parameter.

Returns true if successful.

Same as print, with formatting added.

syswrite

$z->syswrite($data)
$z->syswrite($data, $length)
$z->syswrite($data, $length, $offset)

Compresses and outputs the contents of the $data parameter.

Returns the number of uncompressed bytes written, or undef if unsuccessful.

Writes the contents of $data to the compressed file. $length and $offset select a slice of $data; on success the number of bytes written is returned.

write

$z->write($data)
$z->write($data, $length)
$z->write($data, $length, $offset)

Compresses and outputs the contents of the $data parameter.

Returns the number of uncompressed bytes written, or undef if unsuccessful.

Same functionality as syswrite. One might expect this method to behave like Perl's built-in write, but it does not.
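The four output methods can be compared side by side in a small sketch like this one. It writes to an in-memory buffer and decompresses it again with IO::Uncompress::Gunzip purely to show which bytes each call contributed; the sample strings are arbitrary.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $packed;
my $z = IO::Compress::Gzip->new(\$packed)
    or die "gzip failed: $GzipError\n";

my $data = "0123456789";
$z->print("head:");
$z->printf("%03d:", 7);
my $n1 = $z->syswrite($data, 3, 5);   # 3 bytes starting at offset 5: "567"
my $n2 = $z->write($data, 4);         # the first 4 bytes: "0123"
$z->close() or die "close failed: $GzipError\n";

my $plain;
gunzip \$packed => \$plain or die "gunzip failed: $GunzipError\n";

print "syswrite wrote $n1 bytes, write wrote $n2 bytes\n";
print "$plain\n";                     # head:007:5670123
```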

flush

$z->flush;
$z->flush($flush_type);

Flushes any pending compressed data to the output file/buffer.

This method takes an optional parameter, $flush_type , that controls how the flushing will be carried out. By default the $flush_type used is Z_FINISH . Other valid values for $flush_type are Z_NO_FLUSH , Z_SYNC_FLUSH , Z_FULL_FLUSH and Z_BLOCK . It is strongly recommended that you only set the flush_type parameter if you fully understand the implications of what it does - overuse of flush can seriously degrade the level of compression achieved. See the zlib documentation for details.

Returns true on success.

Writes any compressed data still pending to the file or buffer. The optional argument selects one of several flush types.
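One way to observe the effect of flush is to watch an in-memory output buffer grow, as in this sketch. Z_SYNC_FLUSH is chosen here only as an example of a non-default flush type, and the sizes printed will depend on the zlib build.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError :flush);

my $packed = '';
my $z = IO::Compress::Gzip->new(\$packed)
    or die "gzip failed: $GzipError\n";

$z->print("a short line\n");
my $before = length $packed;   # small input is usually still held inside zlib

# Push everything written so far out to the buffer without ending the stream.
$z->flush(Z_SYNC_FLUSH) or die "flush failed: $GzipError\n";
my $after = length $packed;

$z->close() or die "close failed: $GzipError\n";

printf "before flush: %d bytes, after flush: %d bytes\n", $before, $after;
```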

tell

$z->tell()
tell $z

Returns the uncompressed file offset.

Reports how far into the uncompressed data the stream currently is.

eof

$z->eof();
eof($z);

Returns true if the close method has been called.

Tells whether the stream has ended: it returns true once the close method has been called.

seek

$z->seek($position, $whence);
seek($z, $position, $whence);

Provides a sub-set of the seek functionality, with the restriction that it is only legal to seek forward in the output file/buffer. It is a fatal error to attempt to seek backward.

Empty parts of the file/buffer will have NULL (0x00) bytes written to them.

The $whence parameter takes one of the usual values, namely SEEK_SET, SEEK_CUR or SEEK_END.

Returns 1 on success, 0 on failure.

Seeks within the output stream, but only forward.
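The behaviour of tell, seek and eof described above can be checked with a sketch along these lines. The data is arbitrary, the SEEK_SET constant is taken from Fcntl, and the buffer is decompressed again only to make the NUL padding visible.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use IO::Compress::Gzip qw($GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $packed;
my $z = IO::Compress::Gzip->new(\$packed)
    or die "gzip failed: $GzipError\n";

$z->print("abc");
my $pos1 = $z->tell();                          # 3 uncompressed bytes so far

# Seeking forward fills the gap (offsets 3 and 4) with NUL bytes.
$z->seek(5, SEEK_SET) or die "seek failed\n";
my $pos2 = $z->tell();                          # now 5

$z->print("xyz");
my $eof_open = $z->eof() ? 1 : 0;               # false while still open
$z->close() or die "close failed: $GzipError\n";
my $eof_closed = $z->eof() ? 1 : 0;             # true after close

my $plain;
gunzip \$packed => \$plain or die "gunzip failed: $GunzipError\n";

print "offsets: $pos1 then $pos2\n";
print "eof before close: $eof_open, after close: $eof_closed\n";
print "length of data: ", length($plain), "\n";
```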

binmode

$z->binmode
binmode $z ;

This is a noop provided for completeness.

Binary mode: a no-op, provided for completeness.

opened

$z->opened()

Returns true if the object currently refers to an opened file/buffer.

Tells whether the file is open.

autoflush

my $prev = $z->autoflush()
my $prev = $z->autoflush(EXPR)

If the $z object is associated with a file or a filehandle, this method returns the current autoflush setting for the underlying filehandle. If EXPR is present, and is non-zero, it will enable flushing after every write/print operation.

If $z is associated with a buffer, this method has no effect and always returns undef.

Note that the special variable $| cannot be used to set or retrieve the autoflush setting.

Reads or changes the autoflush setting.

fileno

$z->fileno()
fileno($z)

If the $z object is associated with a file or a filehandle, fileno will return the underlying file descriptor. Once the close method is called fileno will return undef.

If the $z object is associated with a buffer, this method will return undef.

In short: while $z is attached to a file or filehandle, fileno returns the underlying file descriptor; after close, or for a buffer, it returns undef.
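A brief sketch contrasting a file-backed object with a buffer-backed one for opened and fileno. The demo.gz file in a temporary directory is a made-up target.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use IO::Compress::Gzip qw($GzipError);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.gz";          # hypothetical output file

my $buffer;
my $to_file = IO::Compress::Gzip->new($file)
    or die "gzip failed: $GzipError\n";
my $to_buf = IO::Compress::Gzip->new(\$buffer)
    or die "gzip failed: $GzipError\n";

my $fd_file = $to_file->fileno();   # a real file descriptor
my $fd_buf  = $to_buf->fileno();    # undef: a buffer has no descriptor

my $open_before = $to_file->opened() ? 1 : 0;
for my $z ($to_file, $to_buf) {
    $z->close() or die "close failed: $GzipError\n";
}
my $open_after = $to_file->opened() ? 1 : 0;

print "file descriptor:   $fd_file\n";
print "buffer descriptor: ", (defined $fd_buf ? $fd_buf : 'undef'), "\n";
print "opened before close: $open_before, after close: $open_after\n";
```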

close

$z->close() ;
close $z ;

Flushes any pending compressed data and then closes the output file/buffer.

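For most versions of Perl this method is invoked automatically when the IO::Compress::Gzip object is destroyed, but a few old versions delay it until the program terminates. Therefore, if you want your scripts to be able to run on all versions of Perl, you should call close explicitly and not rely on automatic closing.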

Returns true on success, otherwise 0.

If the AutoClose option has been enabled when the IO::Compress::Gzip object was created, and the object is associated with a file, the underlying file will also be closed.

Closes the compressed filehandle after writing out any pending compressed data.

newStream([OPTS])

$z->newStream( [OPTS] )

Closes the current compressed data stream and starts a new one. OPTS consists of any of the options that are available when creating the $z object.

$z->newStream(Name => $file);

In other words, the current compressed stream is finished and a new one is started on the same object.
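As an illustration, the following sketch writes two gzip streams into one buffer with newStream and reads them back together. The names a.txt and b.txt are placeholders for the Name header field, and MultiStream on the IO::Uncompress::Gunzip side is what joins the streams when reading.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $packed;
my $z = IO::Compress::Gzip->new(\$packed, Name => 'a.txt')
    or die "gzip failed: $GzipError\n";
$z->print("alpha\n");

# Finish the first stream and begin a second one with its own header.
$z->newStream(Name => 'b.txt') or die "newStream failed: $GzipError\n";
$z->print("beta\n");
$z->close() or die "close failed: $GzipError\n";

# Read both streams back as a single piece of data.
my $all;
gunzip \$packed => \$all, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";
print $all;
```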

Importing

A number of symbolic constants are required by some methods in IO::Compress::Gzip . None are imported by default.

:all

Imports gzip , $GzipError and all symbolic constants that can be used by IO::Compress::Gzip . Same as doing this
use IO::Compress::Gzip qw(gzip $GzipError :constants) ;

:constants

Import all symbolic constants. Same as doing this
use IO::Compress::Gzip qw(:flush :level :strategy) ;

:flush

These symbolic constants are used by the flush method.
Z_NO_FLUSH
Z_PARTIAL_FLUSH
Z_SYNC_FLUSH
Z_FULL_FLUSH
Z_FINISH
Z_BLOCK

:level

These symbolic constants are used by the Level option in the constructor.
Z_NO_COMPRESSION
Z_BEST_SPEED
Z_BEST_COMPRESSION
Z_DEFAULT_COMPRESSION

:strategy

These symbolic constants are used by the Strategy option in the constructor.
Z_FILTERED
Z_HUFFMAN_ONLY
Z_RLE
Z_FIXED
Z_DEFAULT_STRATEGY
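A small sketch of the :level tag in use: it prints the numeric values behind the symbolic names (these are the standard zlib values) and compresses the same arbitrary test data at two levels. The resulting sizes depend on the data and the zlib build, so none are quoted here.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError :level);

# The symbolic level names are plain numeric constants.
printf "%-22s %2d\n", 'Z_NO_COMPRESSION',      Z_NO_COMPRESSION;
printf "%-22s %2d\n", 'Z_BEST_SPEED',          Z_BEST_SPEED;
printf "%-22s %2d\n", 'Z_BEST_COMPRESSION',    Z_BEST_COMPRESSION;
printf "%-22s %2d\n", 'Z_DEFAULT_COMPRESSION', Z_DEFAULT_COMPRESSION;

# Compress the same data at the fastest and at the strongest level.
my $text = "line of sample text\n" x 500;
my ($fast, $small);
gzip \$text => \$fast, -Level => Z_BEST_SPEED
    or die "gzip failed: $GzipError\n";
gzip \$text => \$small, -Level => Z_BEST_COMPRESSION
    or die "gzip failed: $GzipError\n";

printf "input %d bytes, best speed %d bytes, best compression %d bytes\n",
    length($text), length($fast), length($small);
```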

Complete examples

use v5.12;
use Data::Dumper;
use IO::Compress::Gzip qw(gzip $GzipError :constants);

my $op='per.com.gz';

my $z = new IO::Compress::Gzip $op,-Level=>2 or die "gzip failed: $GzipError\n";

my $str='I am a SA in 33 old,just have 9 years experise and i had an rand no:'.getrand()."\n";
my $cnstr='网站开通于2009年3月,起初是想做一个关于办公自动化(OA)方面的站点,后来因时间及精力方面的问题,对开源软件的介绍及使用更多一些。专注于对linux相关资讯的专题站点,尤其是对此领域内的开源软件的应用,将会花更多的篇幅进行介绍引进,介绍一些基于开源的项目,提供一些基于此的使用经验和解决方案。';
my ($name,$login,$office,$uid,$gid,$home)=('hto','zhen hto','Md China',1000,33,'/home/hto');
my $forstr="Your ass is a piece of shit:45!\n";

$z->print($str);
$z->printf("And i got socre:%.2f\n",getrand());
$z->write($forstr,3,5);
say 'File offset:'.$z->tell();
say 'Gz file is opened:'.($z->opened()?'Yes':'No');
say 'My file descriptor:'.$z->fileno();
$z->syswrite($forstr,6,13);
my $swc=$z->syswrite($forstr);
$z->write($cnstr);
$z->print("\n");

#flush

say 'Before flush,is EOF:'.($z->eof()?'Yes':'No');

$z->flush();

say $GzipError unless($z->close());

say 'After close,is EOF:'.($z->eof()?'Yes':'No');

printf "syswrite wrote %d bytes.\n",$swc;

sub getrand{
 return rand(99);
}

------------------------------------------------------------
use v5.12;
use IO::Compress::Gzip qw(gzip $GzipError);

my $filename='stperl.txt.gz';

open my $fh, '<', 'stperl.txt' or die "Could not open stperl.txt: $!";

my $z = IO::Compress::Gzip->new( $filename ) or die "Could not write to $filename: $GzipError";

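while(<$fh>) {
 print { $z } $_;
}

# close explicitly so that the pending compressed data is written out
close $fh;
$z->close() or die "Could not close $filename: $GzipError";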


------------------------------------------------------------
use v5.12;
use Data::Dumper;
use IO::Compress::Zstd qw(zstd $ZstdError);

my $op='per.mis.zst';

my $z = IO::Compress::Zstd->new($op) or die "IO::Compress::Zstd failed: $ZstdError";

my $str='I am a SA in 37 old,just have 13 years experise and i had an rand no:'.getrand()."\n";
my $cnstr='网站开通于2009年3月,起初是想做一个关于办公自动化(OA)方面的站点,后来因时间及精力方面的问题,对开源软件的介绍及使用更多一些。专注于对linux相关资讯的专题站点,尤其是对此领域内的开源软件的应用,将会花更多的篇幅进行介绍引进,介绍一些基于开源的项目,提供一些基于此的使用经验和解决方案。';
my ($name,$login,$office,$uid,$gid,$home)=('hto','zheng cc','Sc-Cd-Wh 100# China',1000,33,'/home/freeoa');
my $forstr="Your map is a piece AT $office!\n";

$z->print($str);
$z->printf("And i regot socre:%.2f\n",getrand());

$z->write($forstr,3,5);
$z->print("\n");

say 'File offset:'.$z->tell();
say 'Zstd file is opened:'.($z->opened()?'Yes':'No');
say 'My file descriptor:'.$z->fileno();

$z->syswrite($forstr,6,13);
$z->print("\n");

my $swc=$z->syswrite($forstr);
$z->print("\n");

$z->write($cnstr);
$z->print("\n");

#flush

say 'Before flush,is EOF:'.($z->eof()?'Yes':'No');

$z->flush();

say $ZstdError unless($z->close());

say 'After close,is EOF:'.($z->eof()?'Yes':'No');

printf "syswrite wrote %d bytes.\n",$swc;

sub getrand{
 return rand(99);
}

------------------------------------------------------------


References

IO::Compress