An Overview of Perl's IO Family of File-Handling Modules
2013-04-27 17:25:42

Perl traditionally does IO using filehandles, but these have a number of problems, not least that they cannot be treated like normal variables and that passing them to functions can be difficult. Fortunately, this being Perl, there is an alternative: IO::Handle objects.

Compared with traditional I/O operations, the IO family of modules looks considerably more uniform and consistent. It also ships with core Perl, so it is recommended for real-world use. Note: loading the IO module itself is no longer recommended; use the specific modules beneath it instead (IO::Handle, IO::Seekable, IO::File, IO::Pipe, IO::Socket, IO::Dir, IO::Select, IO::Poll).

Replacing Filehandles with IO::Handles

Normally when you open a file for reading you would do something similar to:
open FILE, "<$filename" or die "could not open $filename: $!\n";
while (my $line = <FILE>) {
 print $line;
}
close FILE;

We can replace that with IO::File:
use IO::File;
my $fh = new IO::File($filename, "r") or die "could not open $filename: $!\n";
while (my $line = $fh->getline()) {
 print $line;
}
$fh->close();

This simple example doesn't really show you why you'd want to use IO::Handle based IO.

Use Perl IO::File to Open a File Handle

IO::File is a core module shipped with Perl that provides an object-oriented way to open a file handle.

use IO::File;
my $read_fh = IO::File->new("/tmp/msg", 'r') or die "could not open /tmp/msg: $!\n";
read_text($read_fh);

sub read_text {
 my $fh = shift;      # the handle arrives as an ordinary scalar
 my @lines = <$fh>;   # list context reads every remaining line
 print @lines;
}

The following snippet opens a file for writing with IO::File.
$write_fh = IO::File->new("/tmp/msg",'w');

To open the file handle in append mode, do the following.
$fh = IO::File->new("/tmp/msg",O_WRONLY|O_APPEND);
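
As a small sketch of how the write and append modes combine, the following writes one line, reopens the same file in append mode, and reads everything back. The temporary file from File::Temp and the variable names are assumptions made for illustration; the examples above use /tmp/msg.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Hypothetical scratch file; the examples above use /tmp/msg.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "could not close $path: $!\n";

# "w" truncates the file (or creates it) before writing.
my $out = IO::File->new($path, "w") or die "could not open $path: $!\n";
$out->print("first line\n");
$out->close;

# O_WRONLY|O_APPEND adds to the end. The file already exists here,
# so O_CREAT is not needed.
my $app = IO::File->new($path, O_WRONLY|O_APPEND)
    or die "could not append to $path: $!\n";
$app->print("second line\n");
$app->close;

# Read the result back to confirm both lines are present.
my $in = IO::File->new($path, "r") or die "could not read $path: $!\n";
my @lines = $in->getlines;
$in->close;
print @lines;
```

With a numeric mode IO::File goes through sysopen, so a file that might not exist yet would also need O_CREAT; the string mode "a" creates the file when necessary.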

Normal Object

The biggest win over normal file handles is that IO::Handles are normal perl objects and can be passed to functions like you'd pass any ordinary object. Doing the same with file handles is cumbersome at best and a nightmare the rest of the time.
sub get_contents {
 my $in = shift;
 my $ret = "";
 while (<$in>) {
  $ret .= $_;
 }
 return $ret;
}

open FILE, "<$filename";
my $contents = get_contents(*FILE{IO});

Using several tricks we can get the above code, which is nearly ordinary Perl, but notice the odd parameter passed to the function. This is because you can't pass a bareword file handle directly, so you have to pass the typeglob, which covers everything named FILE, hence the *. The {IO} then restricts this to just the file handle. More information is available from stonehenge.com.

It turns out that, thanks to the fact that you can use IO::Handles between <> brackets, we don't need to rewrite the get_contents function. We can just write:
my $fh = new IO::File($filename, "r");
my $contents = get_contents($fh);

I don't know about you, but I find this is much nicer.
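
For comparison, here is a self-contained sketch that runs both calling styles against the same file and shows that get_contents returns the same text either way. The sample file and its contents are assumptions made up for this sketch.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);

# Read everything from whatever handle-like value we are given.
sub get_contents {
    my $in  = shift;
    my $ret = "";
    while (my $line = <$in>) {
        $ret .= $line;
    }
    return $ret;
}

# Hypothetical sample file created just for this sketch.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print {$tmp_fh} "alpha\nbeta\n";
close $tmp_fh or die "could not close $filename: $!\n";

# Old style: bareword handle, passed via the IO slot of its typeglob.
open FILE, "<", $filename or die "could not open $filename: $!\n";
my $from_glob = get_contents(*FILE{IO});
close FILE;

# IO::File style: the object is passed like any other scalar.
my $fh = IO::File->new($filename, "r") or die "could not open $filename: $!\n";
my $from_object = get_contents($fh);
$fh->close;

print $from_object;
```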

Handle Agnostic

All handles inherit from IO::Handle and will implement all the functions that it defines. This means you can use the same code regardless of whether you have a socket, a file or a pipe.

As an example, let's write a program which outputs 100 random letters to a socket, a pipe or a file depending on the arguments. We start with the same 3 lines all programs should start with.
#!/usr/bin/perl
use strict;
use warnings;

Next we want to import the IO modules we will use:
use IO::File;
use IO::Socket::INET;
use IO::Pipe;

Next, let's write a function which outputs 100 random characters to a file handle that is passed as a parameter.
sub output_random_letters {
 my $out = shift;
 my $line = "";
 for (1..100) {                          # exactly 100 iterations
  $line .= chr(int(rand(26))+65);        # 65 is "A", so this yields A..Z
 }
 $out->print($line);
 $out->flush();
}

So now, let's create our file handle:
my $fh;

if ($ARGV[0] eq "file") {
   $fh = new IO::File($ARGV[1], "w");
} elsif ($ARGV[0] eq "pipe") {
   $fh = new IO::Pipe;
   $fh->writer($ARGV[1]);
} elsif ($ARGV[0] eq "socket") {
   $fh = IO::Socket::INET->new(PeerAddr => $ARGV[1],
    PeerPort => $ARGV[2],
    Proto => 'tcp');
} else {
   die "Unknown filehandle type: $ARGV[0]\n";
}

Now we can call our function and clean up.
output_random_letters($fh);
$fh->close();
exit 0;

Now, let's test this.

File

perl /tmp/io.pl file /tmp/output && cat /tmp/output

Pipe
perl /tmp/io.pl pipe less

Socket

nc -l -p 12345 &

[1] 12201

perl /tmp/io.pl socket localhost 12345

[1]+  Done                    nc -l -p 12345

Other Useful IO modules
IO::AtomicFile

This class is like IO::File, except it doesn't overwrite the file you are writing to until you close the handle. This gives you atomic file writes so you don't have to worry about your program dying in the middle of writing the file. It does this by actually writing to a temporary file and then calling rename() when you close the file handle. You can throw away your updates by calling $filehandle->delete().
use IO::AtomicFile;

# instead of:
# my $passwd = IO::File->new("/etc/passwd", "w");
# write:
my $passwd = IO::AtomicFile->new("/etc/passwd", "w");

IO::Digest

This module hooks into the PerlIO layer to let you compute hash digests of a file as you read and write it, saving you from having to reread the file to calculate them. This is particularly useful when you pass your handle to a module for parsing and you don't want to, or can't, reread the data (say, when you are reading from the network). Basically, you create an IO::Digest object, register an IO::Handle object with it and then use that handle. When you are done, you call hexdigest() on the IO::Digest object to get the hash.
use IO::File;
use IO::Digest;

my $fh = new IO::File($filename, "r");
my $iod = new IO::Digest($fh, 'MD5');
read_and_parse($fh);
print $iod->hexdigest;

IO::Zlib

This module gives you an IO::File-style API to read and write gzipped files. You create your handle and then use it as if it were a normal file.
use IO::Zlib;

$fh = new IO::Zlib("file.gz", "rb") or die "Could not open file.gz: $!\n";
while (my $line = $fh->getline()) {
   print $line;
}
$fh->close;

You may want to investigate IO::Uncompress::AnyUncompress too.
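
As a rough sketch of what that looks like, the following compresses a short string in memory with IO::Compress::Gzip and reads it back line by line through IO::Uncompress::AnyUncompress. The sample text and variable names are assumptions; with a real file you would pass the file name instead of a scalar reference.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::AnyUncompress qw($AnyUncompressError);

# Build a small gzip stream in memory so the sketch needs no files.
my $plain = "line one\nline two\n";
my $gz;
gzip(\$plain => \$gz) or die "gzip failed: $GzipError\n";

# AnyUncompress inspects the data and picks a suitable decompressor.
my $z = IO::Uncompress::AnyUncompress->new(\$gz)
    or die "could not read compressed data: $AnyUncompressError\n";

# getline returns undef once the stream is exhausted.
my @lines;
while (defined(my $line = $z->getline())) {
    push @lines, $line;
}
$z->close();
print @lines;
```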


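The sys* functions (sysopen, sysread, sysseek) are low-level system functions which, unlike open and read, are unbuffered. sysopen takes its mode as bareword flag constants and also supports non-blocking I/O and a permissions mask. The IO:: modules decide from the arguments they are called with whether to wrap open or sysopen on the back end. The official documentation also warns against mixing the two families, because the results are unpredictable. Each family has features the other lacks, so choose according to the situation. The functions without the sys prefix are Perl's standard buffered interface, and in most cases they are the ones you will use more often.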

The sysopen function takes three or four arguments: filehandle, filename, mode, and an optional permissions value. The mode is a number constructed from constants provided by the Fcntl module:
use Fcntl;

sysopen(SOURCE, $path, O_RDONLY) or die "Couldn't open $path for reading: $!\n";
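
The four-argument form can be sketched as follows: the file is created exclusively with an explicit permissions value, written with syswrite, and read back with sysread. The temporary directory, file name and message are hypothetical values chosen for the sketch.

```perl
use strict;
use warnings;
use Fcntl;                       # exports the O_* constants
use File::Temp qw(tempdir);

# Hypothetical path inside a throwaway directory.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sysopen-demo.txt";

# Four-argument form: O_EXCL makes the call fail if the file exists,
# and 0600 is the requested permissions value (still subject to umask).
sysopen(my $out, $path, O_WRONLY | O_CREAT | O_EXCL, 0600)
    or die "Couldn't create $path: $!\n";
my $written = syswrite($out, "hello, sysopen\n");
defined $written or die "Couldn't write to $path: $!\n";
close $out or die "Couldn't close $path: $!\n";

# Read it back with the unbuffered counterpart, sysread.
sysopen(my $in, $path, O_RDONLY) or die "Couldn't open $path for reading: $!\n";
my $buffer = "";
my $got = sysread($in, $buffer, 1024);
defined $got or die "Couldn't read $path: $!\n";
close $in;
print $buffer;
```

Only sys* calls are used on each handle here, in line with the advice not to mix the buffered and unbuffered families.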

The IO::File module’s new method accepts both open and sysopen style arguments and returns an anonymous filehandle. The new method also accepts a mode in the style of fopen(3):
use IO::File;

# like Perl's open
$fh = IO::File->new("> $filename") or die "Couldn't open $filename for writing: $!\n";

# like Perl's sysopen
$fh = IO::File->new($filename, O_WRONLY|O_CREAT) or die "Couldn't open $filename for writing: $!\n";

# like stdio's fopen(3)
$fh = IO::File->new($filename, "r+") or die "Couldn't open $filename for read and write: $!\n";


File handle position: FileHandle::getpos/setpos

The FileHandle documentation describes it as a front end to the IO::* classes.

If the C functions fgetpos and fsetpos are available, then FileHandle::getpos returns an opaque value that represents the current position of the FileHandle, and FileHandle::setpos uses that value to return to a previously visited position.

Opaque means you shouldn't pay attention to the value: use it only as a parameter in future requests to the module.
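
A minimal sketch of that usage pattern, assuming a small sample file invented for the purpose: remember a position with getpos, read ahead, then hand the saved value back to setpos and read the same line again.

```perl
use strict;
use warnings;
use FileHandle;
use File::Temp qw(tempfile);

# Hypothetical sample file for the sketch.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "first\nsecond\nthird\n";
close $tmp_fh or die "could not close $path: $!\n";

my $fh = FileHandle->new($path, "r") or die "could not open $path: $!\n";
my $first = $fh->getline;

# Remember where we are; treat $pos purely as a token.
my $pos = $fh->getpos;
defined $pos or die "getpos is not available: $!\n";

# Read ahead past the saved position.
my $second = $fh->getline;
my $third  = $fh->getline;

# Jump back to the remembered position and read the same line again.
$fh->setpos($pos) or die "setpos failed: $!\n";
my $again = $fh->getline;
$fh->close;

print "re-read: $again";
```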

freeoa.pl:
use v5.12;
my $fn=shift @ARGV;
die 'File:'.$fn.' Not Found...' unless (-e $fn);
open my $fh, "<", $fn or die "$fn: open: $!";
seek $fh,5,0;
print tell($fh),"\n";
print getc($fh),"\n";

END{
    close($fh) if defined($fh);
}

file: z3.txt
text139

$ perl freeoa.pl z3.txt
5
3

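The FileHandle documentation points out that the value returned by getpos is opaque, which means you generally cannot assume it has any meaning. Its only use is to be passed back to setpos. This matches the underlying C library calls used to implement the methods (fgetpos and fsetpos), which in C work with an opaque fpos_t. Where available, the seek and tell methods work with file positions expressed as integers that you can manipulate. For more, see IO::Seekable.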
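
By way of contrast, here is a sketch of the integer-based seek and tell methods, reusing the text139 sample content from the example above. The temporary file and variable names are assumptions for illustration.

```perl
use strict;
use warnings;
use IO::File;
use IO::Seekable;                # exports SEEK_SET, SEEK_CUR, SEEK_END
use File::Temp qw(tempfile);

# Hypothetical file holding the same sample text as z3.txt.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "text139";
close $tmp_fh or die "could not close $path: $!\n";

my $fh = IO::File->new($path, "r") or die "could not open $path: $!\n";

# Move to byte offset 5 and confirm it with tell.
$fh->seek(5, SEEK_SET) or die "seek failed: $!\n";
my $offset = $fh->tell;
my $char   = $fh->getc;

# Positions are plain integers, so arithmetic on them is fine.
$fh->seek($offset - 1, SEEK_SET) or die "seek failed: $!\n";
my $prev = $fh->getc;
$fh->close;

print "$offset $char $prev\n";   # prints "5 3 1"
```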


Reference:
http://www.davidpashley.com/articles/perl-io-objects.html