Perl IO::Uncompress Module Family - FreeOA

2017-01-19 16:22:54

阿炯

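This article is the counterpart to the one on the Perl IO::Compress module family, but covers decompression. As before, IO::Uncompress::Gunzip is the main subject, because the other modules behave in much the same way. The core modules in this family are: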
IO::Uncompress::AnyInflate - Uncompress zlib-based (zip, gzip) file/buffer
IO::Uncompress::AnyUncompress - Uncompress gzip, zip, bzip2 or lzop file/buffer
IO::Uncompress::Base - Base Class for IO::Uncompress modules
IO::Uncompress::Bunzip2 - Read bzip2 files/buffers
IO::Uncompress::Gunzip - Read RFC 1952 files/buffers
IO::Uncompress::Inflate - Read RFC 1950 files/buffers
IO::Uncompress::RawInflate - Read RFC 1951 files/buffers
IO::Uncompress::Unzip - Read zip files/buffers

This module provides a Perl interface that allows the reading of files/buffers that conform to RFC 1952. For writing RFC 1952 files/buffers, see the companion module IO::Compress::Gzip.

Its usage closely mirrors that of gzip in IO::Compress; for the basics, see the article mentioned above.

Some of the optional parameters differ, however:

MultiStream => 0|1

If the input file/buffer contains multiple compressed data streams, this option will uncompress the whole lot as a single data stream.

Defaults to 0.

In other words, input made up of several compressed streams can be processed in one go; this is off by default.

TrailingData => $scalar

Returns the data, if any, that is present immediately after the compressed data stream once uncompression is complete.

This option can be used when there is useful information immediately following the compressed data stream, and you don't know the length of the compressed data stream.

If the input is a buffer, trailingData will return everything from the end of the compressed data stream to the end of the buffer.

If the input is a filehandle, trailingData will return the data that is left in the filehandle input buffer once the end of the compressed data stream has been reached. You can then use the filehandle to read the rest of the input file.

Don't bother using trailingData if the input is a filename.

If you know the length of the compressed data stream before you start uncompressing, you can avoid having to use trailingData by setting the InputLength option.

The practical use of this option is not yet clear to me.

use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;
gunzip $input_filename_or_reference => $output_filename_or_reference [,OPTS] or die "gunzip failed: $GunzipError\n";

Examples

use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;
my $input = "file1.txt.gz";
my $output = "file1.txt";

gunzip $input => $output or die "gunzip failed: $GunzipError\n";

To uncompress all files in the directory "/my/home" that match "*.txt.gz" and store the uncompressed data in the same directory:

use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;
gunzip '</my/home/*.txt.gz>' => '</my/home/#1.txt>' or die "gunzip failed: $GunzipError\n";

To uncompress the files one at a time instead:

use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;
for my $input ( glob "/my/home/*.txt.gz" ){
    my $output = $input;
    $output =~ s/\.gz$// ;    # strip only the trailing ".gz" extension
    gunzip $input => $output or die "Error uncompressing '$input': $GunzipError\n";
}

Object-oriented interface

my $z = IO::Uncompress::Gunzip->new($input [, OPTS]) or die "IO::Uncompress::Gunzip failed: $GunzipError\n";

API

OPTS is a combination of the following options:

AutoClose => 0|1

This option is only valid when the $input parameter is a filehandle. If specified, and the value is true, it will result in the file being closed once either the close method is called or the IO::Uncompress::Gunzip object is destroyed.

This parameter defaults to 0.

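This applies only when $input is a filehandle; when set to 1, the underlying filehandle is closed once the object's close method is called or the object is destroyed.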

MultiStream => 0|1

Allows multiple concatenated compressed streams to be treated as a single compressed stream. Decompression will stop once either the end of the file/buffer is reached, an error is encountered (premature eof, corrupt compressed data) or the end of a stream is not immediately followed by the start of another stream.

This parameter defaults to 0.

That is, several compressed streams concatenated in one input are processed together.
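As a minimal sketch of what MultiStream changes, the following builds two gzip streams in memory, concatenates them, and uncompresses the result with and without the option. The sample strings and variable names are made up for illustration; IO::Compress::Gzip is used only to produce test input.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build two independent gzip streams in memory and join them,
# much like "cat a.gz b.gz" would.
my ($part1, $part2) = ("first stream\n", "second stream\n");
my ($gz1, $gz2);
gzip \$part1 => \$gz1 or die "gzip failed: $GzipError\n";
gzip \$part2 => \$gz2 or die "gzip failed: $GzipError\n";
my $concatenated = $gz1 . $gz2;

# Default (MultiStream => 0): decompression stops after the first stream.
my $first_only;
gunzip \$concatenated => \$first_only
    or die "gunzip failed: $GunzipError\n";

# MultiStream => 1: both streams come out as one piece of data.
my $everything;
gunzip \$concatenated => \$everything, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

print "default:     $first_only";
print "MultiStream: $everything";
```

With the default setting only the first stream's contents are returned, which is why the option matters for input produced by concatenating .gz files.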

Prime => $string

This option will uncompress the contents of $string before processing the input file/buffer.

This option can be useful when the compressed data is embedded in another file/data structure and it is not possible to work out where the compressed data begins without having to read the first few bytes. If this is the case, the uncompression can be primed with these bytes using this option.

Transparent => 0|1

If this option is set and the input file/buffer is not compressed data, the module will allow reading of it anyway.

In addition, if the input file/buffer does contain compressed data and there is non-compressed data immediately following it, setting this option will make this module treat the whole file/buffer as a single data stream.

This option defaults to 1.
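The effect of Transparent on input that is not compressed at all can be sketched as follows. The sample strings are invented for the example; the behaviour shown is the one described above (pass-through when the option is on, refusal when it is off).

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

my $plain1 = "not compressed at all\n";
my $plain2 = "also plain text\n";

# Default (Transparent => 1): plain data is readable as-is.
my $z = IO::Uncompress::Gunzip->new(\$plain1)
    or die "unexpected failure: $GunzipError\n";
my $line = $z->getline();
$z->close();
print "passed through: $line";

# Transparent => 0: input without a gzip header is rejected,
# so the constructor returns undef and sets $GunzipError.
my $rejected = IO::Uncompress::Gunzip->new(\$plain2, Transparent => 0);
print "rejected: $GunzipError\n" unless defined $rejected;
```

Turning the option off is a reasonable choice when the program should fail loudly on files that only claim to be gzip.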

BlockSize => $num

When reading the compressed input data, IO::Uncompress::Gunzip will read it in blocks of $num bytes.

This option defaults to 4096.

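This is the amount read from the compressed file at a time. The default matches the basic block size of most file systems, presumably chosen for efficiency.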

InputLength => $size

When present this option will limit the number of compressed bytes read from the input file/buffer to $size . This option can be used in the situation where there is useful data directly after the compressed data stream and you know beforehand the exact length of the compressed data stream.

This option is mostly used when reading from a filehandle, in which case the file pointer will be left pointing to the first byte directly after the compressed data stream.

This option defaults to off.

Append => 0|1

This option controls what the read method does with uncompressed data.

If set to 1, all uncompressed data will be appended to the output parameter of the read method.

If set to 0, the contents of the output parameter of the read method will be overwritten by the uncompressed data.

Defaults to 0.
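A small sketch of the difference, assuming a ten-byte sample string and a buffer that already holds the text "start:" (both made up for the example). Each object reads the data in two five-byte steps.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $text = "abcdefghij";
my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";

# Append => 1: every read adds to whatever the buffer already holds.
my $za = IO::Uncompress::Gunzip->new(\$gz, Append => 1)
    or die "gunzip failed: $GunzipError\n";
my $appended = 'start:';
$za->read($appended, 5);
$za->read($appended, 5);
$za->close();

# Append => 0 (the default): every read replaces the buffer contents.
my $zo = IO::Uncompress::Gunzip->new(\$gz)
    or die "gunzip failed: $GunzipError\n";
my $overwritten = 'start:';
$zo->read($overwritten, 5);
$zo->read($overwritten, 5);
$zo->close();

print "Append => 1: $appended\n";      # start:abcdefghij
print "Append => 0: $overwritten\n";   # fghij
```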

Strict => 0|1

This option controls whether the extra checks defined below are used when carrying out the decompression. When Strict is on, the extra tests are carried out; when Strict is off, they are not.

The default for this option is off.

1. If the FHCRC bit is set in the gzip FLG header byte, the CRC16 bytes in the header must match the crc16 value of the gzip header actually read.

2. If the gzip header contains a name field (FNAME) it consists solely of ISO 8859-1 characters.

3. If the gzip header contains a comment field (FCOMMENT) it consists solely of ISO 8859-1 characters plus line-feed.

4. If the gzip FEXTRA header field is present it must conform to the sub-field structure as defined in RFC 1952.

5. The CRC32 and ISIZE trailer fields must be present.

6. The value of the CRC32 field read must match the crc32 value of the uncompressed data actually contained in the gzip file.

7. The value of the ISIZE fields read must match the length of the uncompressed data actually read from the file.

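Strict is off by default. If you turn it on, there are many points to watch out for, since every check listed above is then enforced.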

ParseExtra => 0|1

If the gzip FEXTRA header field is present and this option is set, it will force the module to check that it conforms to the sub-field structure as defined in RFC 1952.

If Strict is on, this option is enabled automatically.

Defaults to 0.

read

$status = $z->read($buffer, $length)
$status = $z->read($buffer, $length, $offset)
$status = read($z, $buffer, $length)
$status = read($z, $buffer, $length, $offset)

Attempt to read $length bytes of uncompressed data into $buffer .

Unlike the single-argument form read($buffer), which uncompresses one block of input per call, this form attempts to return exactly $length bytes. It returns fewer only if end-of-file or an I/O error is encountered.

Returns the number of uncompressed bytes written to $buffer , zero if eof or a negative number on error.

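In short: read tries to place the requested number of uncompressed bytes into $buffer. The return value is the number of bytes written, 0 at end of file, or a negative number on error.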
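A typical way to use the length form is a loop that reads fixed-size chunks until read returns 0. The sketch below assumes a 30-byte sample string and a chunk size of 8, both chosen only for illustration.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $text = "0123456789" x 3;    # 30 bytes of sample data
my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::Gunzip->new(\$gz)
    or die "gunzip failed: $GunzipError\n";

# Read 8 uncompressed bytes at a time; the last chunk may be shorter.
my @chunks;
my ($buf, $status);
while (($status = $z->read($buf, 8)) > 0) {
    push @chunks, $buf;
}
die "read error: $GunzipError\n" if $status < 0;
$z->close();

print join('|', @chunks), "\n";
# prints 01234567|89012345|67890123|456789
```

Checking for a negative status after the loop separates a clean end of file from a read error.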

getline

$line = $z->getline()
$line = <$z>

Reads a single line.

This method fully supports the use of the variable $/ (or $INPUT_RECORD_SEPARATOR or $RS when English is in use) to determine what constitutes an end of line. Paragraph mode, record mode and file slurp mode are all supported.

In short: getline reads one line from the compressed file and honours $/, so paragraph mode, record mode and slurp mode all work.
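The interaction with $/ can be sketched like this, using a made-up five-line sample. Each getline call is wrapped in a block so that the change to $/ stays local.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $text = "alpha\nbeta\n\ngamma\ndelta\n";
my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::Gunzip->new(\$gz)
    or die "gunzip failed: $GunzipError\n";

# Default $/ ("\n"): one ordinary line.
my $first = $z->getline();

# Paragraph mode ($/ = ""): read up to and including the blank line.
my $paragraph = do { local $/ = ""; $z->getline() };

# Slurp mode ($/ = undef): everything that is left.
my $rest = do { local $/ = undef; $z->getline() };

$z->close();
print "line:      $first";
print "paragraph: $paragraph";
print "rest:      $rest";
```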

getc

$char = $z->getc()

Reads a single character from the uncompressed data.

ungetc

$char = $z->ungetc($string)

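The purpose of this method is not yet clear to me; its name suggests that it pushes $string back onto the input so that later reads return it first.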

inflateSync

$status = $z->inflateSync()

getHeaderInfo

$hdr = $z->getHeaderInfo();
@hdrs = $z->getHeaderInfo();

This method returns either a hash reference (in scalar context) or a list of hash references (in array context) that contains information about each of the header fields in the compressed data stream(s).

Name
The contents of the Name header field, if present. If no name is present, the value will be undef. Note this is different from a zero length name, which will return an empty string.

Comment
The contents of the Comment header field, if present. If no comment is present, the value will be undef. Note this is different from a zero length comment, which will return an empty string.

In short: it returns a set of attributes taken from the header of the compressed file.
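To see the Name and Comment fields in practice, the sketch below writes them with IO::Compress::Gzip and reads them back. The file name 'hello.txt' and the comment text are placeholders chosen for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Create a gzip stream whose header carries a name and a comment.
my $text = "hello\n";
my $gz;
gzip \$text => \$gz, Name => 'hello.txt', Comment => 'demo file'
    or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::Gunzip->new(\$gz)
    or die "gunzip failed: $GunzipError\n";

# Scalar context: a single hash reference describing the header.
my $hdr = $z->getHeaderInfo();
printf "Name:    %s\n", $hdr->{Name}    // '(none)';
printf "Comment: %s\n", $hdr->{Comment} // '(none)';
$z->close();
```

The `//` fallback covers the case described above where a field is absent and the value is undef.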

tell

$z->tell()
tell $z

Returns the uncompressed file offset.

That is, the current position within the uncompressed data.

eof

$z->eof();
eof($z);

Returns true if the end of the compressed input stream has been reached.

Use it to test whether the end of the file has been reached.

seek

$z->seek($position, $whence);
seek($z, $position, $whence);

Provides a sub-set of the seek functionality, with the restriction that it is only legal to seek forward in the input file/buffer. It is a fatal error to attempt to seek backward.

Note that the implementation of seek in this module does not provide true random access to a compressed file/buffer. It works by uncompressing data from the current offset in the file/buffer until it reaches the uncompressed offset specified in the parameters to seek. For very small files this may be acceptable behaviour. For large files it may cause an unacceptable delay.

The $whence parameter takes one of the usual values, namely SEEK_SET, SEEK_CUR or SEEK_END.

Returns 1 on success, 0 on failure.

In short: seek moves the read position, and only in the forward direction.
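A short sketch of forward seeking together with tell, on a made-up 16-byte string. The SEEK_SET constant is imported from the core Fcntl module, and the backward seek is wrapped in eval because it is a fatal error.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $text = "0123456789abcdef";
my $gz;
gzip \$text => \$gz or die "gzip failed: $GzipError\n";

my $z = IO::Uncompress::Gunzip->new(\$gz)
    or die "gunzip failed: $GunzipError\n";

# Skip the first 10 uncompressed bytes, then read the next 3.
$z->seek(10, SEEK_SET) or die "seek failed: $GunzipError\n";
my $pos = $z->tell();
my $buf;
$z->read($buf, 3);
print "offset $pos, next bytes: $buf\n";   # offset 10, next bytes: abc

# Seeking backward dies, so trap it with eval.
my $backward_error = '';
eval { $z->seek(0, SEEK_SET); 1 } or $backward_error = $@;
print "backward seek refused\n" if $backward_error;
$z->close();
```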

binmode

$z->binmode
binmode $z ;

This is a no-op provided for completeness; calling it does not switch any mode.

opened

$z->opened()

Returns true if the object currently refers to an opened file/buffer.

Use it to check whether the file is open.

autoflush

my $prev = $z->autoflush()
my $prev = $z->autoflush(EXPR)

If the $z object is associated with a file or a filehandle, this method returns the current autoflush setting for the underlying filehandle. If EXPR is present, and is non-zero, it will enable flushing after every write/print operation.

If $z is associated with a buffer, this method has no effect and always returns undef.

Note that the special variable $| cannot be used to set or retrieve the autoflush setting.

In short: this controls the autoflush setting; the special variable $| has no effect here.

input_line_number

$z->input_line_number()
$z->input_line_number(EXPR)

Returns the current uncompressed line number. If EXPR is present it has the effect of setting the line number. Note that setting the line number does not change the current position within the file/buffer being read.

The contents of $/ are used to determine what constitutes a line terminator.

In short: it returns the line the read position is currently on, and $/ is honoured.

fileno

$z->fileno()
fileno($z)

If the $z object is associated with a file or a filehandle, fileno will return the underlying file descriptor. Once the close method is called fileno will return undef.

If the $z object is associated with a buffer, this method will return undef.

In short: it returns the file descriptor, so call it before calling close.

close

$z->close() ;
close $z ;

Closes the input file/buffer. Returns true on success, otherwise 0.

If the AutoClose option has been enabled when the IO::Uncompress::Gunzip object was created, and the object is associated with a file, the underlying file will also be closed.

Use it to close the file handle.

nextStream

my $status = $z->nextStream();

Skips to the next compressed data stream in the input file/buffer. If a new compressed data stream is found, the eof marker will be cleared and $. will be reset to 0.

Returns 1 if a new stream was found, 0 if none was found, and -1 if an error was encountered.

In short: it skips whatever is left of the current stream and moves on to the next compressed stream.
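When the streams in a concatenated input should be handled one by one rather than merged, nextStream can drive a loop such as the following sketch. The two sample strings are invented, and MultiStream is left at its default of 0.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Two gzip streams back to back in one buffer.
my ($part1, $part2) = ("first\n", "second\n");
my ($gz1, $gz2);
gzip \$part1 => \$gz1 or die "gzip failed: $GzipError\n";
gzip \$part2 => \$gz2 or die "gzip failed: $GzipError\n";
my $concatenated = $gz1 . $gz2;

my $z = IO::Uncompress::Gunzip->new(\$concatenated)
    or die "gunzip failed: $GunzipError\n";

my @streams;
my $status;
do {
    # Drain the current stream into its own string.
    my ($content, $buf) = ('');
    while (($status = $z->read($buf)) > 0) {
        $content .= $buf;
    }
    die "read error: $GunzipError\n" if $status < 0;
    push @streams, $content;
} while (($status = $z->nextStream()) == 1);
die "nextStream error: $GunzipError\n" if $status < 0;
$z->close();

printf "stream %d: %s", $_ + 1, $streams[$_] for 0 .. $#streams;
```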

trailingData

my $data = $z->trailingData();

Returns the data, if any, that is present immediately after the compressed data stream once uncompression is complete. It only makes sense to call this method once the end of the compressed data stream has been encountered.

This method can be used when there is useful information immediately following the compressed data stream, and you don't know the length of the compressed data stream. If you know the length of the compressed data stream before you start uncompressing, you can avoid having to use trailingData by setting the InputLength option in the constructor.
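A minimal sketch of trailingData on buffer input: a gzip stream is followed by some uncompressed bytes, and the method returns those bytes once the stream has been read to its end. The payload and trailer strings are placeholders.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

my $payload = "compressed part";
my $trailer = "TRAILER: plain bytes after the gzip stream";
my $gz;
gzip \$payload => \$gz or die "gzip failed: $GzipError\n";

# The input buffer holds the gzip stream followed by unrelated data.
my $input = $gz . $trailer;

my $z = IO::Uncompress::Gunzip->new(\$input)
    or die "gunzip failed: $GunzipError\n";

# Read to the end of the compressed stream first.
my ($out, $buf) = ('');
while ($z->read($buf) > 0) {
    $out .= $buf;
}

# Only now does it make sense to ask for the trailing data.
my $trailing = $z->trailingData();
$z->close();

print "uncompressed: $out\n";
print "trailing:     $trailing\n";
```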

Importing

No symbolic constants are required by this IO::Uncompress::Gunzip at present.

:all

Imports gunzip and $GunzipError. Same as doing this:
use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;

Complete examples

use v5.12;
use Data::Dumper;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError) ;

my $infile = 'per.com.gz';
my ($length, $buffer) = (8);

my $z = IO::Uncompress::Gunzip->new($infile) or die "IO::Uncompress::Gunzip failed: $GunzipError\n";

# Read the first $length uncompressed bytes into $buffer.
my $status = $z->read($buffer, $length);

say Dumper($status, $buffer);

{
    # Temporarily use the string 'as' as the record separator.
    local $/ = 'as';
    say 'One line here:' . $z->getline();
}

say 'One single char:' . $z->getc();

say 'HeadInfo:' . Dumper($z->getHeaderInfo());

say 'In line no:' . $z->input_line_number();

say 'Another line:' . $z->getline();

# Close unconditionally so the file is released even before eof.
$z->close();

-------------------------------

use v5.12;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $filename='per.com.gz';

my $z = IO::Uncompress::Gunzip->new( $filename ) or die "Could not read from $filename: $GunzipError";

while( <$z> ) {
    print;
}

-------------------------------

#!/usr/bin/perl
use v5.12;
use IO::Uncompress::Gunzip qw($GunzipError);

my $z = IO::Uncompress::Gunzip->new( *STDIN, MultiStream => 1 ) or die "Could not make uncompress object: $GunzipError";

while( <$z> ) {
    print;
}

The MultiStream option in IO::Uncompress::Gunzip allows the decompressor to reset itself when it thinks it has detected a new stream and continue to provide output:

cat *.gz | freeoa.pl

References

IO::Uncompress