Perl File Handling Operations and Examples
2015-05-14 16:30:10 阿炯

This article describes the facilities Perl provides for file handling.

Opening files

Opening a file in Perl is straightforward; call the open function directly:
open FILE, "filename.txt" or die $!;

The command above associates the FILE filehandle with the file filename.txt, and you can then use the filehandle to read from the file. If the file doesn't exist, or cannot be read for any other reason, open returns false and the script dies with the error message held in the $! variable.

What if you wanted to modify the file instead of just reading from it? Then you have to specify the appropriate mode, preferably using the three-argument form of open.

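Always call open with three arguments

Since Perl 5.6, open has accepted a three-argument form that keeps the mode separate from the filename, so characters in the filename can never be mistaken for a mode. It is worth making this form a habit from now on.

open my $in, '<', $read_file or die $!;
open my $out, '>>', $append_file or die $!;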

open FILEHANDLE, MODE, EXPR

The available modes are the following:
mode      operand   create   truncate
read      <         x        x
write     >         ✓        ✓
append    >>        ✓        x

Each of the above modes can also be prefixed with the + character to allow simultaneous reading and writing.

mode          operand   create   truncate
read/write    +<        x        x
read/write    +>        ✓        ✓
read/append   +>>       ✓        x
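The difference between the modes is easiest to see on a scratch file. The following is a minimal sketch, assuming a temporary file created with File::Temp; the variable names are illustrative only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; tempfile() creates it empty.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "close failed: $!";

# '>' creates or truncates the file, then writes.
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "first\n";
close $out or die "close failed: $!";

# '>>' keeps the existing content and appends to it.
open my $app, '>>', $path or die "Cannot append to $path: $!";
print {$app} "second\n";
close $app or die "close failed: $!";

# '+<' opens for reading and writing without truncating.
open my $rw, '+<', $path or die "Cannot open $path read/write: $!";
my $first_line = <$rw>;                  # reads "first\n"
seek($rw, 0, 0) or die "seek failed: $!";
print {$rw} "FIRST\n";                   # overwrites the first line in place
close $rw or die "close failed: $!";

# Read the result back.
open my $in, '<', $path or die "Cannot read $path: $!";
my @modes_result = <$in>;
close $in or die "close failed: $!";
```

Opening the same file with '+>' instead of '+<' would have emptied it before the first read.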

The modes in the following list can be used for file handling.

MODE     DESC
<        READ
>        WRITE, CREATE, TRUNCATE
>>       WRITE, APPEND, CREATE
+<       READ, WRITE
+>       READ, WRITE, TRUNCATE, CREATE
+>>      READ, WRITE, CREATE, APPEND

The letter forms in the next table are the fopen-style mode strings accepted by IO::File->new; the built-in open accepts only the symbol forms.

MODE         Definition
< or r       Read Only Access
> or w       Creates, Writes, and Truncates
>> or a      Writes, Appends, and Creates
+< or r+     Reads and Writes
+> or w+     Reads, Writes, Creates, and Truncates
+>> or a+    Reads, Writes, Appends, and Creates

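The following table lists the flags that can be passed to the sysopen() function; their meanings correspond to the modes above.

VALUE         DESC
O_RDWR        Read and write
O_RDONLY      Read only
O_WRONLY      Write only
O_CREAT       Create the file if it does not exist
O_APPEND      Append to the file
O_TRUNC       Truncate the file
O_EXCL        Fail if the file already exists (used with O_CREAT)
O_NONBLOCK    Non-blocking I/O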

Notice that both +< and +> open the file in read/write mode, but only the latter creates the file if it doesn't exist and truncates (empties) an existing one. If all you want is to open a file for writing, creating it if it doesn't exist and truncating it first if it does, the plain > mode is enough:

open FILE, ">", "filename.txt" or die $!;

This operation might fail if, for example, you don't have the appropriate permissions. In this case $! is set to the corresponding error message.

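The mode and the filename can also be combined into a single string, which is the older two-argument form, so the above can also be written as:
open FILE, ">filename.txt" or die $!;
Prefer the three-argument form when the filename comes from a variable, since leading characters such as > or | in the name would otherwise be read as part of the mode.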

As you might have guessed already, if you just want read access you can omit the mode in the two-argument form, just as we did in the very first example above.

Reading files

If you want to read all the lines of a text file into an array, you can do it as such:
my @lines = <FILE>;

The <FILE> operator, where FILE is a previously opened filehandle, returns all the unread lines of the text file in list context or a single line in scalar context. Hence, if you had a particularly large file and wanted to conserve memory, you could process it line by line:
while (<FILE>) { print $_; }

The $_ variable is automatically set to the contents of the current line. If you wish, you may name your line variable instead:
while (my $line = <FILE>) { print $line; }

This sets the $line variable to the contents of the current line. The newline character at the end of the line is not removed automatically; if you wish to remove it, use the chomp function. After all lines have been read, the <FILE> operator returns undef, which is false and causes the loop to terminate.

There may be cases where you need to read a file a few characters at a time instead of line by line. This is often the case for binary data. To do just that you can use the read function.
open FILE, "picture.jpg" or die $!;
binmode FILE;
my ($buf, $data, $n);
while (($n = read FILE, $data, 4) != 0) {
    print "$n bytes read\n";
    $buf .= $data;
}
close(FILE);

There is a lot going on here, so let's take it step by step. In the first line of the above code fragment a file is opened. As you can guess from the filename, it is a binary file. Binary files need to be treated differently from text files on some operating systems (e.g. Windows). The reason is that on these platforms a newline "character" is actually represented within text files by the two-character sequence \cM\cJ (that's control-M, control-J). When reading a text file Perl converts the \cM\cJ sequence into a single \n newline character, and the converse holds when writing files. Clearly, this behavior is undesirable when reading binary data, and calling binmode on the filehandle makes sure that the conversion is avoided.

my $filename = 'freeoa.txt';
if (open(my $fh, '<:encoding(UTF-8)', $filename)) {
    while (my $row = <$fh>) {
        chomp $row;
        print "$row\n";
    }
    close($fh);
} else {
    warn "Could not open file '$filename' $!";
}

The read function takes either 3 or 4 arguments. The 3-argument form is:
read FILEHANDLE, SCALAR, LENGTH

while the 4-argument form is:
read FILEHANDLE, SCALAR, LENGTH, OFFSET

In the first case, LENGTH characters of data are read from FILEHANDLE into the variable specified by SCALAR. The return value of read is the number of characters actually read, 0 at the end of the file, or undef in the case of an error. Returning to our example above, the read call in the loop condition reads at most 4 characters of data into the $data variable, and the number of characters read is stored in $n. Each read advances the current file position, so the next read starts at the first unread character. Thus the code above reads the contents of the file picture.jpg and stores them in $buf, printing the number of characters read at every iteration. Because binmode was applied to the handle, each character is one byte here.

If OFFSET is specified, the characters read are placed at that position within SCALAR rather than at its start. Taking advantage of this, we could rewrite the loop above so that the data accumulates directly in $data:

my ($data, $n, $offset) = ('', 0, 0);
while (($n = read FILE, $data, 4, $offset) != 0) {
    print "$n bytes read\n";
    $offset += $n;
}

Even though the example above demonstrates binary reading, the read function works just as well on text files; just make sure to use binmode for binary data and to leave it out for text.
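Since read returns undef on error and 0 at end of file, a loop that tests only for 0 cannot tell the two apart. The following is a minimal sketch of a chunked reader that checks both cases; read_all is a hypothetical helper name, not a Perl built-in.

```perl
use strict;
use warnings;

# read_all() is a hypothetical helper: it returns the whole file as a
# byte string, reading $chunk_size bytes at a time.
sub read_all {
    my ($path, $chunk_size) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $buf = '';
    while (1) {
        my $n = read($fh, my $chunk, $chunk_size);
        die "Read error on $path: $!" unless defined $n;   # undef means error
        last if $n == 0;                                    # 0 means end of file
        $buf .= $chunk;
    }
    close $fh or die "close failed: $!";
    return $buf;
}
```

The ':raw' layer in the open call has the same effect as calling binmode on the handle afterwards.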

Writing files

Now that you know how to open and read files, learning how to write to them is straightforward. Take a look at the following code:
open FILE, ">file.txt" or die $!;
print FILE $str;
close FILE;

Not much is new here. The only thing to observe is the form of print: the FILEHANDLE to write to comes first, with no comma after it, followed by the expression to be written. The expression can be a scalar or a list; an array or a hash is flattened into a list of its elements. Appending to a file is done in exactly the same manner, apart from specifying the append mode (>>), of course.

Note that write is not the opposite of read. Unfortunately.

Instead, write is used to write formatted records to a file, a subject outside the scope of this article; see the separate article on using Perl's write function for formatted (Format) report output.

Closing files

Once you are done reading and writing you should close any open filehandles.
open FILE1, "file.txt" or die $!;
open FILE2, "picture.jpg" or die $!;
...
close FILE2;
close FILE1;

If you forget to close a filehandle, Perl will do it for you before your script exits, but it is good practice to close yourself what you have opened.

The close function may also fail and return false, for example if you try to close a filehandle that is already closed. If you want to catch these errors you can check the return value of close and report the appropriate error message stored in $!, as in the following example:
close FILE or die $!;

Summary of Perl file handling

The open, close, print and read functions let you perform the most common file operations. However, much more is possible. Apart from opening files, you may open pipes to other commands using the | mode and read from them or write to them using the techniques described. This and more in an article to come.
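As a small taste of the pipe mode mentioned above, here is a hedged sketch that reads the output of a child process line by line. It uses the list form of open with the '-|' mode, and runs the current Perl interpreter ($^X) as the child only so that the example does not depend on any external command.

```perl
use strict;
use warnings;

# Start a child process and read its standard output through a pipe.
# The child here is perl itself; any other command could take its place.
open my $from_cmd, '-|', $^X, '-e', 'print "line $_\n" for 1 .. 3'
    or die "Cannot start child: $!";
my @pipe_lines = <$from_cmd>;

# For a pipe, close also waits for the child and reports its exit status in $?.
close $from_cmd or die "Child failed, status $?";
```

Reading from the handle works exactly like reading from a file opened with '<'.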

Source: http://www.perlfect.com/articles/perlfile.shtml


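Working with files


1. Opening and closing files

The return value of open tells you whether the file was opened successfully: it is true (non-zero) on success and false (undef) on failure, so you can test it like this:
if(open(MYFILE, "myfile")){
    # here is what to do if the file opened successfully
}

When you have finished with the file, close it with close(MYFILE).

Read: open(FILEHANDLE, "<filename") or open(FILEHANDLE, "filename"). The file must already exist; otherwise open returns false and the error message is in $!.
Write: open(FILEHANDLE, ">filename"). The file is created if it does not exist; if it exists, its contents are discarded and its length is truncated to 0. On failure the error message is in $!.
Append: open(FILEHANDLE, ">>filename"). Much the same as write, except that the existing contents are kept and new data is added after them.
Read/write: open(FILEHANDLE, "+<filename"). With the "+<" mode you can both read and write the file. Use tell() to find the current position in the file and seek() to move to another one. The file must already exist, since "+<" does not create it, and the existing data is not cleared.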

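2. Working with file contents

The statement $line = <MYFILE>; reads one line from the file into the scalar variable $line and advances the file position to the start of the next line.
The statement @array = <MYFILE>; reads the whole file into the array @array, with each line (including its line terminator) as one element.

Three functions are used most often:

tell

tell - get current seekpointer on a filehandle
tell returns the current position in the file as a byte offset from the start of the file. Immediately after the first record has been read, that position is the byte just past the record, so the value returned also equals the length of the record in bytes.
$length = tell(FILE);

For example, if a record is 3 bytes long it occupies bytes 0, 1 and 2, so the current position after reading it is 3, which is the length of one record.

tell - http://perldoc.perl.org/functions/tell.html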

seek

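seek sets the current position of a file, which lets you start reading a very large file from a chosen point. If the record length is known, seek can be used to locate any record in the file.

seek FILEHANDLE,POSITION,WHENCE

It returns true on success and false on failure.

seek takes three arguments: FILEHANDLE, POSITION and WHENCE.

WHENCE says where the offset is measured from:
0 means from the start of the file, so the new position is POSITION;
1 means from the current position, so the new position is the current position plus POSITION;
2 means from the end of the file, so the new position is the end of the file plus POSITION (POSITION is usually negative in this case).

POSITION is the number of bytes to move when setting the new current position. It can usually be computed as the number of records to skip multiplied by the record length in bytes.

tell returns the current position of a filehandle on a regular file, and that value can later be passed to seek to return to the same place in the file.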

For example, read file.txt starting at byte offset 12 and print the rest:
open (FILEHANDLE,"<file.txt") or die "can not open file.txt";
seek FILEHANDLE,12,0;
while(<FILEHANDLE>){
    print;
}
close (FILEHANDLE);

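seek - reposition file pointer for random-access I/O

seek positions the file pointer of a filehandle. The pointer can be placed at the start, somewhere in the middle, or at the end of the file.

For example: seek(FILE, 5 * $length, 0);

This call uses the offset 5 * $length with a WHENCE of 0 to skip the first 5 records of the file, making the sixth record the current position (provided every record is exactly $length bytes long).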
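The record arithmetic can be checked with a small self-contained sketch. It assumes a scratch file of ten 8-byte records created with File::Temp; the record layout and variable names are made up for illustration. The Fcntl ":seek" tag supplies SEEK_SET as a readable name for WHENCE 0.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

my $record_len = 8;                            # assumed fixed record size in bytes
my ($rec_fh, $rec_path) = tempfile(UNLINK => 1);
binmode $rec_fh;
printf {$rec_fh} "rec%04d\n", $_ for 0 .. 9;   # ten 8-byte records
close $rec_fh or die "close failed: $!";

open my $fh, '<:raw', $rec_path or die "Cannot open $rec_path: $!";

# Skip the first five records; the sixth becomes the current position.
seek($fh, 5 * $record_len, SEEK_SET) or die "seek failed: $!";
my $pos_before = tell($fh);

my $got = read($fh, my $sixth, $record_len);
die "short read" unless defined $got && $got == $record_len;
my $pos_after = tell($fh);                     # one record further on
close $fh or die "close failed: $!";
```

SEEK_CUR and SEEK_END from the same tag correspond to WHENCE values 1 and 2.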

seek - http://perldoc.perl.org/functions/seek.html


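seek and tell are usually used together. The example below uses both and shows, from another angle, how the position of a filehandle behaves.

tell reports the current position of the filehandle. seek takes three arguments:
1. the filehandle;
2. an offset, interpreted relative to the place named by the third argument;
3. the reference point, one of:
0): the start of the file. Seeking to offset 0 from here starts over, much like closing $fileh and opening it again.
1): the current position.
2): the end of the file.

Consider the following example, which scans a text file for lines containing keyword.
t2.txt contains: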
1postgres
2stats
3collector
4sshd proces
5the keyword here
6the big fan
7psit
8end here?
9perl seektell.

Perl script:
use v5.20;
open(TEST,"<","t2.txt") or die "Cannot open t2.txt: $!";
while(<TEST>){
    if(index($_,"keyword")>-1){
        my $pos=tell(TEST);
        my $kwd_line = $_;
        say "Ins:Pos:$pos";
        my $line_1 = <TEST>;
        my $line_2 = <TEST>;
        my $line_3 = <TEST>;
        #$pos=tell(TEST);
        #say $pos;
        say "$kwd_line $line_1 $line_2 $line_3";
        unless($line_1){say " __End Of File1__"; last;}
        unless($line_2){say " __End Of File2__"; last;}
        unless($line_3){say " __End Of File3__"; last;}
        seek(TEST,$pos,0);
    }
    my $pos=tell(TEST);
    print "Out:$pos:$_";
}
close (TEST);

Output:
Out:10:1postgres
Out:17:2stats
Out:28:3collector
Out:41:4sshd proces
Ins:Pos:59
5the keyword here
 6the big fan
 7psit
 8end here?

Out:59:5the keyword here
Out:72:6the big fan
Out:79:7psit
Out:89:8end here
Out:105:9perl seektell.


truncate

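truncate takes two arguments, a filehandle (or a filename) and a length in bytes, and removes everything from that byte position to the end of the file.

use v5.20;
use Fcntl qw(:DEFAULT :flock);
use Data::Dumper;

my ($fn)=('trun.ex3.txt');
my $fs=(stat($fn))[7];
say "File:$fn Orig-Size:$fs";

# flock needs a filehandle, so open the file read/write first
open(my $fh, '+<', $fn) or die "Cannot open $fn: $!";
flock($fh,LOCK_EX) or die "Cannot lock $fn: $!";
my $rx=truncate $fh,100;
flock($fh,LOCK_UN);
close($fh);

say Dumper($rx);

END{
    $fs=(stat($fn))[7];
    say "File:$fn New-Size:$fs";
}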

-------------------------------

Perl File Handle Open, Read, and Write File Examples

1. Typical Way of Opening a Perl File Handle


The Perl example below opens a file with a bareword filehandle. This is a typical Perl file open scenario.
open FH,"</tmp/msg";

Read operation with a bareword filehandle:
open FH,"</tmp/msg";
$line  = <FH>;
print $line;

Write operation with a bareword filehandle:
open FH,">/tmp/msg";
print FH "Perl - Practical Extraction and Report Language\n";

If you want to pass this handle to a Perl function, you would use a typeglob as shown below.
open FH,"</tmp/msg";
read_text(*FH);
sub read_text{
 local *FH = shift;
 my @lines;
 @lines = <FH>;
 print @lines;
}

2. Storing a Perl File Handle Reference in a Scalar Variable

You can use a scalar variable to store the filehandle reference as shown below.

# $log_fh declared to store the file handle.
my $log_fh;
open $log_fh,"</tmp/msg";
read_text($log_fh);

sub read_text{
 my $fh = shift;   # a lexical copy; local cannot be applied to a my variable
 my @lines;
 @lines = <$fh>;
 print @lines;
}

3. Use Perl IO::File to Open a File Handle

IO::File is a core module shipped with Perl that opens filehandles through an object-oriented interface and also accepts fopen-style mode strings such as 'r' and 'w'.

use IO::File;
my $read_fh = IO::File->new("/tmp/msg",'r');
read_text($read_fh);
sub read_text{
 my $fh = shift;
 my @lines;
 @lines = <$fh>;
 print @lines;
}

The following Perl code snippet opens a file for writing with the IO::File module.
my $write_fh = IO::File->new("/tmp/msg",'w');

To open the filehandle in append mode, do the following.
my $fh = IO::File->new("/tmp/msg",O_WRONLY|O_APPEND);
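The three IO::File modes can be exercised together on a scratch file. This is a minimal sketch, assuming a temporary file from File::Temp; it uses only the documented new, print, getlines and close methods, and the variable names are illustrative.

```perl
use strict;
use warnings;
use IO::File;
use Fcntl;                       # O_WRONLY, O_APPEND (IO::File exports them too)
use File::Temp qw(tempfile);

my ($tmp, $io_path) = tempfile(UNLINK => 1);   # hypothetical scratch file
close $tmp or die "close failed: $!";

# 'w' truncates or creates the file.
my $writer = IO::File->new($io_path, 'w') or die "Cannot write $io_path: $!";
$writer->print("alpha\n");
$writer->close;

# Numeric flags are passed through to sysopen.
my $appender = IO::File->new($io_path, O_WRONLY | O_APPEND)
    or die "Cannot append to $io_path: $!";
$appender->print("beta\n");
$appender->close;

# 'r' opens for reading; getlines returns all remaining lines.
my $reader = IO::File->new($io_path, 'r') or die "Cannot read $io_path: $!";
my @io_lines = $reader->getlines;
$reader->close;
```

The objects returned by IO::File->new can also be used with the built-in operators, for example <$reader>.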

4. Open a Perl File Handle in Both Read and Write Mode

Perl lets you open a file for reading and writing at the same time. The mode symbols below open the filehandle in the respective modes.
MODE     DESCRIPTION
+<       READ,WRITE
+>       READ,WRITE,TRUNCATE,CREATE
+>>      READ,WRITE,CREATE,APPEND


Let us write an example Perl program that opens a sample text file in both read and write mode.
$ cat /tmp/text
one
two
three
four
five

The code below reads the first line from the /tmp/text file and immediately performs a write operation.
open(FH,"+</tmp/text") or die "Could not open /tmp/text: $!";
read_line(*FH);
seek(FH, tell(FH), 0);   # some systems need a seek when switching from reading to writing
write_line(*FH,"222\n");

sub read_line{
 local *FH = shift;
 my $line;
 $line = <FH>;
 print $line;
}

sub write_line{
 local *FH = shift;
 print FH @_;
}

close(FH);

The output of the above code is shown below.

$ perl ./read_and_write.pl
one

$ cat /tmp/text
one
222
three
four
five

5. Open the Standard Input and Standard Output

Perl allows you to open the standard input and standard output under other filehandle names.

Perl standard output example:
open(OUT,">-");
print OUT "STDOUT opened with the name as OUT";

Perl standard input example:
open(IN,"-");
print "STDIN opened with the name as IN";
$input = <IN>;

6. Use sysopen() to Open the File

The sysopen() function requires three arguments: the filehandle, the filename and the mode. The O_* mode constants come from the Fcntl module, so load it with use Fcntl; first.

Read Operation Example:
use Fcntl;
sysopen(FH,"/tmp/text",O_RDONLY);
$line = <FH>;
print $line;

Write Operation Example:
sysopen(FH,"/tmp/text",O_WRONLY);
print FH "write operation";

The available mode flags are shown in the table below.
MODE          DESCRIPTION
O_RDONLY      READ
O_WRONLY      WRITE
O_RDWR        READ and WRITE
O_CREAT       CREATE
O_APPEND      APPEND
O_TRUNC       TRUNCATE
O_NONBLOCK    NON BLOCK MODE
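Flags are combined with the bitwise-or operator, and an optional fourth argument gives the permissions for a newly created file. The sketch below assumes a scratch directory from File::Temp and a made-up file name; it shows O_CREAT together with O_EXCL, a combination the plain open modes cannot express.

```perl
use strict;
use warnings;
use Fcntl;                          # exports O_WRONLY, O_CREAT, O_EXCL, ...
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $lock = "$dir/example.lock";     # hypothetical file name

# Create the file only if it does not exist yet; 0644 is the requested mode.
sysopen(my $fh, $lock, O_WRONLY | O_CREAT | O_EXCL, 0644)
    or die "Cannot create $lock: $!";
print {$fh} "$$\n";                 # record the process id
close $fh or die "close failed: $!";

# A second exclusive create fails because the file now exists.
my $second_ok = sysopen(my $fh2, $lock, O_WRONLY | O_CREAT | O_EXCL, 0644);
```

After the second call $second_ok is false and $! holds the reason, typically "File exists".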

Note: Make a habit of validating every filehandle you open. The most common way of handling a failed open is the die function, as shown below.
open(FH,">/tmp/text") or die "Could not open /tmp/text file : $!\n";

If the above code is unable to open the file "/tmp/text", open returns false and die is executed. The built-in variable "$!" contains the reason for the failure.

7. Use a string as a filehandle in Perl

Open a filehandle on a reference to a string:
use v5.12;
use autodie;

my $foo = "abc\ndef\n";
open my $fh, "<", \$foo;
while (<$fh>) {
  print "line $.: $_";
}

For an object-oriented style, use the IO::String package (a CPAN module, not part of the core distribution).

use v5.12;
use IO::String;
my $s="dfasdfasdfafd....\nabc";
my $io = IO::String->new($s);
while (my $line = $io->getline()) {
   print $line;
}
print "\nTHE END\n";
# write new line
$io->print("\nappend new line");

# back to the start
$io->seek(0, 0);

while ($io->sysread(my $line, 512)) {
   print $line;
}


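In tests so far, only the open function supports using a string as a filehandle; sysopen and the IO::File module do not support it.

Perl Cookbook recipe 8.24 (ch08_24) describes using a string as a filehandle. The original text follows: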

1. Problem
You have data in string, but would like to treat it as a file. For example, you have a subroutine that expects a filehandle as an argument, but you would like that subroutine to work directly on the data in your string instead. Additionally, you don't want to write the data to a temporary file.

2. Solution
Use the scalar I/O in Perl v5.8:
open($fh, "+<", \$string); # read and write contents of $string

3. Discussion
Perl's I/O layers include support for input and output from a scalar. When you read a record with <$fh>, you are reading the next line from $string. When you write a record with print, you change $string. You can pass $fh to a function that expects a filehandle, and that subroutine need never know that it's really working with data in a string.

Perl respects the various access modes in open for strings, so you can specify that the strings be opened as read-only, with truncation, in append mode, and so on:
open($fh, "<", \$string); # read only
open($fh, ">", \$string); # write only, discard original contents
open($fh, "+>", \$string); # read and write, discard original contents
open($fh, "+<", \$string); # read and write, preserve original contents

These handles behave in all respects like regular filehandles, so all I/O functions work, such as seek, truncate, sysread, and friends.
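The recipe is easiest to appreciate with a routine that only knows how to write to a filehandle. In the hedged sketch below, print_report is a hypothetical subroutine invented for illustration; an in-memory handle opened on \$captured collects its output in a string.

```perl
use strict;
use warnings;

# print_report() is a hypothetical routine that writes to whatever
# filehandle it is given.
sub print_report {
    my ($out, @items) = @_;
    printf {$out} "%s=%d\n", $_, length($_) for @items;
}

my $captured = '';
open my $mem, '>', \$captured or die "Cannot open in-memory handle: $!";
print_report($mem, qw(open close));
close $mem or die "close failed: $!";

# $captured now holds everything the routine printed.
```

The same call works unchanged with STDOUT or a handle on a real file.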

In a Windows 10 CMD window the default encoding is CP936 (GBK).
###gbk-cmd-terminal-ok
use v5.20;
use Encode;
#euc-cn behaves the same as gbk here
binmode(STDOUT, ':encoding(gbk)');
my ($text,$cnt) = ('炯' x 100,0);  #UTF-8 encoded bytes (no "use utf8")
if(open my $fh, '<:encoding(UTF-8)', \$text){
    while (read $fh, my $chr, 1){
        my $enc = $chr; #decoded character
        #no explicit encode() is needed: the :encoding(gbk) layer on STDOUT encodes on output
        print $enc,'(',$cnt++,');';
    }
}

###cp936-cmd-terminal-ok
use v5.20;
use Encode;
binmode(STDOUT, ':encoding(euc-cn)');
my ($text,$cnt) = ('Perl象C一样强大,象Awk、Sed等脚本描述语言一样方便。是一种折衷但流行的脚本语言,它借用了C语言与Shell脚本语言和许多其他地方的语法和命令。广泛的命令和功能以及添加扩展的能力使其非常适合快速原型设计、系统实用程序、软件工具、系统管理任务、数据库访问、网络和Web编程等任务。' ,0);
if(open my $fh, '<:encoding(UTF-8)', \$text){
    while (my $chr=getc($fh)){
        my $enc = $chr;    #decoded character
        #no explicit encode() is needed: the :encoding(euc-cn) layer on STDOUT encodes on output
        print $enc,'-',$cnt++,';';
    }
}


###sysread fails on a handle that carries a UTF-8 encoding layer and reports:
#sysread() isn't allowed on :utf8 handles at ...
use v5.20;
use Encode;
use Fcntl qw(:DEFAULT :flock);
#use open qw/:std :utf8/;
binmode(STDOUT, ':encoding(euc-cn)');
...


The documentation for sysread explains this:
It uses read(2). It bypasses buffered IO, so mixing this with other kinds of reads, print, write, seek, tell, or eof can cause confusion because the perlio or stdio layers usually buffer data.

Note that if the filehandle has been marked as :utf8 Unicode characters are read instead of bytes (the LENGTH, OFFSET, and the return value of sysread() are in Unicode characters).

read(2)

The documentation for syswrite explains this:
It uses write(2). It bypasses buffered IO, so mixing this with reads (other than sysread()), print, write, seek, tell, or eof may cause confusion because the perlio and stdio layers usually buffer data. Returns the number of bytes actually written, or undef if there was an error (in this case the errno variable $! is also set).

WARNING: If the filehandle is marked :utf8 , Unicode characters encoded in UTF-8 are written instead of bytes, and the LENGTH, OFFSET, and return value of syswrite() are in (UTF8-encoded Unicode) characters. The :encoding(...) layer implicitly introduces the :utf8 layer. Alternately, if the handle is not marked with an encoding but you attempt to write characters with code points over 255, raises an exception. See binmode, open, and the open pragma, open.

write(2)

The documentation for sysseek explains this:
Sets FILEHANDLE's system position in bytes using lseek(2). FILEHANDLE may be an expression whose value gives the name of the filehandle.

Note the in bytes: even if the filehandle has been set to operate on characters (for example by using the :encoding(utf8) I/O layer), tell() will return byte offsets, not character offsets (because implementing that would render sysseek() unacceptably slow).

sysseek() bypasses normal buffered IO, so mixing it with reads other than sysread (for example <> or read()) print, write, seek, tell, or eof may cause confusion.

lseek(2)

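perlopentut puts it this way: if you want the convenience of the shell, Perl's open is definitely the way to go. If instead you want finer precision than C's simplistic fopen(3) provides, look to Perl's sysopen, which is a direct hook into the open(2) system call. That means it is a bit more involved, but that is the price of precision.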

fopen(3):https://man7.org/linux/man-pages/man3/fopen.3.html

sysopen is a thin wrapper around the open(2) kernel system call (the arguments correspond directly), whereas open is a higher-level wrapper which enables you to do redirections, piping, etc.

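Since Perl is written in C, both functions most likely end up in the open(2) system call. The difference is that Perl's open() builds in conveniences that make modes, pipes and redirection very easy, at the cost of some flexibility. It does not offer the Fcntl flags that sysopen() accepts, nor the permission-mask argument, but these are minor issues.

In most cases open() is still the function to use.

sysopen does not let you specify an encoding at open time. If you replace sysopen with an open call that carries an encoding layer, sysread then fails with the error mentioned above:
sysread() isn't allowed on :utf8 handles at ...

When reading single characters with getc, the handle must be explicitly declared as UTF-8 encoded for values to be returned character by character according to the file's contents. The file position still advances by the number of bytes each character occupies (a Chinese character in a UTF-8 encoded file generally takes 3 bytes).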
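The character/byte distinction can be seen directly on an in-memory handle. The sketch below is illustrative only: it spells out the three UTF-8 bytes of U+70AF as escapes so that it does not depend on the encoding of the source file, reads one character at a time through an :encoding(UTF-8) layer, and records how many bytes each character occupies when encoded again.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Three copies of U+70AF, written out as raw UTF-8 bytes (3 bytes each).
my $bytes = "\xE7\x82\xAF" x 3;

open my $fh, '<:encoding(UTF-8)', \$bytes or die "Cannot open: $!";
my @char_sizes;
while (read($fh, my $chr, 1)) {            # LENGTH counts characters here
    # Pair: length in characters, length in UTF-8 bytes.
    push @char_sizes, [ length($chr), length(encode('UTF-8', $chr)) ];
}
close $fh or die "close failed: $!";
```

Without the encoding layer the same loop would return nine one-byte strings instead of three characters.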

This problem was already being raised back in 2015: UTF-8 and systemIO are not friends anymore


--------------------------------------------------------------

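Other common file handling techniques

A few facts about filehandles


Perl reserves six special filehandles: STDIN, STDOUT, STDERR, DATA, ARGV and ARGVOUT.

Reading a file with a specified encoding
open CONFG, '<:encoding(UTF-8)','dino';
The difference between encoding(UTF-8) and the shorthand :utf8 is that the shorthand does not check whether the input or output data really is valid UTF-8. The encoding() form also lets you specify other encodings. The following command prints every character encoding that Perl understands and can process:
perl -MEncode -le "print for Encode->encodings(':all')"

If you want every line of the output file to end in CR-LF, open the file with a special layer:
open BEDROCK,'>:crlf', $file_name;

Note, however, that if the data already uses CR-LF line endings, the conversion produces an extra carriage return on each line.

The same layer can be used when reading DOS-style files:
open BEDROCK,'<:crlf', $file_name;

While reading the file, Perl converts every CR-LF into a Unix-style newline.

Detecting fatal errors automatically
Starting with Perl 5.10.1, the well-regarded autodie pragma is part of the standard library.
use autodie;

Output written to a filehandle is buffered by default. Setting the special variable $| to 1 makes the currently selected default filehandle flush its buffer after every output operation.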

select LOG;
$|=1 ; # do not keep LOG output in the buffer
select STDOUT;
print LOG "This gets written to the LOG at once!\n";
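With a lexical filehandle the same effect is available as a method call, which avoids changing the selected default handle. A minimal sketch, assuming a scratch file from File::Temp; autoflush is the method documented in IO::Handle.

```perl
use strict;
use warnings;
use IO::Handle;                  # provides the autoflush method
use File::Temp qw(tempfile);

my ($log, $log_path) = tempfile(UNLINK => 1);   # hypothetical log file
$log->autoflush(1);              # like select + $| = 1, but only for $log

print {$log} "written at once\n";

# The data is already on disk although the handle is still open.
my $size_before_close = -s $log_path;
close $log or die "close failed: $!";
```

Unbuffered output costs one system call per print, so it is best kept for logs and interactive output.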

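Filehandles in scalar variables
Since Perl 5.6 a filehandle can be stored in a scalar variable instead of a bareword. The difference looks small but brings real benefits: as a scalar, the filehandle can be passed to a subroutine as an argument, or stored in an array or a hash. Many scripts are short, throwaway programs, though, and there a bareword is quicker and a variable is not strictly necessary. By convention, the variable name ends in _fh to show that it holds a filehandle:
open my $rock_fh , '<' , 'freeoa.txt';

Read file as var (read a file into a scalar variable)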
sub slurp{
 my $file = shift;
 local *F;
 open F, "< $file" or die "Error opening '$file' for read: $!";
 if(not wantarray){
  local $/ = undef;
  my $string = <F>;
  close F;
  return $string;
 }
 local $/ = "";
 my @a = <F>;
 close F;
 return @a;
}

Read the contents into an array
Each row will be stored in an array element

open FILE, "<file.txt";
@lines = <FILE>;

Read the contents into a scalar

The whole file is stored in a single scalar variable. To do this, the special variable $/ must be undefined while the file is read.

Here's one way to do it:
open FILE, "<file.txt";
$file_contents = do { local $/; <FILE> };

Just reading from standard input
open my $io, "<-" or die "NO STDIN: $!" ;
while (<$io>) { ... }
close $io;

Subroutine-based

Subroutine to open a file for reading and read it.
sub read_file{
 my ($f ) = @_;
 open F, "< $f" or die "Can't open $f : $!";
 my @f = <F>;
 close F;
 return wantarray ? @f : \@f;
}

Subroutine to open a file for writing and write into it.
sub write_file{
 my ($f, @data ) = @_;
 @data = () unless @data;
 open F, "> $f" or die "Can't open $f : $!";
 print F @data;
 close F;
}

Directory-based operations

Load all files in a directory into the @files array

my $dirToRead = "/usr/bin";
opendir(DIR, $dirToRead) or die "can't open dir $dirToRead: $!";
my @files = readdir(DIR);
closedir(DIR);
 
# print an unsorted list from the @files array:
foreach my $file (@files) {
 next if ($file eq "." or $file eq "..");
 print "$file\n";
}

Another example

my $dirToRead = "/usr/bin" ;

opendir(MYDIR, $dirToRead) or die "can't open dir $dirToRead: $!" ;
#my @dirContents = grep(!/^\.\.?$/, readdir(MYDIR)) ;
my @dirContents = grep {!/^\.\.?$/ && /p$/} readdir(MYDIR) ;

print "Files in $dirToRead:\n";
foreach my $file (@dirContents){
 next if $file =~ /^\.\.?$/ ;  # another way to skip . and ..
 print "$file \n" if -T "$dirToRead/$file" ; # if the file is a {T}ext file.
}
closedir(MYDIR) ;

Using a subroutine

use DirHandle ;
sub plainFiles {
 my $dir = shift;
 my $dh = DirHandle->new($dir)  or die "can't opendir $dir: $!" ;
 return sort # sort pathnames
  grep {   -f      }  # chose only "plain" files
  map  { "$dir/$_" } # create full paths
  grep {  !/^\./   } # filter out dot files
  $dh->read(); # read all entries
}


Fcntl module

This module is borrowed from C's fcntl.h header and makes its constants available in Perl, so code that uses it has a strong C flavor.

use Fcntl; # Import standard fcntl.h constants.
use Fcntl ":flock"; # Import LOCK_* constants.
use Fcntl ":seek";# Import SEEK_CUR, SEEK_SET, SEEK_END.
use Fcntl ":mode"; # Import S_* stat checking constants.
use Fcntl ":Fcompat"; # Import F* constants.

O_ flags included in the Fcntl module

Value         Definition
O_RDWR        Read and write
O_RDONLY      Read only
O_WRONLY      Write only
O_CREAT       Create the file if it does not exist
O_APPEND      Append to the file
O_TRUNC       Truncate the file
O_EXCL        Fail if the file already exists (used with O_CREAT)
O_NONBLOCK    Non-blocking I/O


-------------------------------
Some notes collected from practice follow

-------------------------------
Notes from experience:
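In Perl's file-reading functions such as read and sysread, the OFFSET argument is a position within the SCALAR buffer where the data is stored, not a position in the file, so it cannot be used to skip ahead in the file. To start reading from a given file position, use seek (or sysseek together with sysread).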
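A quick way to see what OFFSET does in read is to read from an in-memory handle into a buffer that already holds data. The sketch below uses made-up sample strings; it shows OFFSET selecting a place in the buffer while the file position simply advances by the amount read.

```perl
use strict;
use warnings;

my $source = "abcdefgh";
open my $fh, '<', \$source or die "Cannot open: $!";

my $buf = "XY";
# OFFSET 2 places the new data after the two characters already in $buf.
my $n = read($fh, $buf, 3, 2);
my $file_pos = tell($fh);        # position in the data source, not in $buf
close $fh or die "close failed: $!";
```

To begin reading at a particular place in the file, call seek before read.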

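Perl's truncate function accepts either a filehandle or a filename. On most systems a filehandle must have been opened for writing (for example with '+<'), otherwise the call fails; a filename works even while the file is being edited or is open elsewhere in the script. In both cases the beginning of the file is kept and everything after the given length is discarded.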
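The filehandle form of truncate can be checked on a scratch file. This is a minimal sketch, assuming a 200-byte temporary file created with File::Temp; the handle is opened with '+<' so that it is writable without discarding the existing content.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $trunc_path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "0123456789" x 20;          # 200 bytes
close $tmp or die "close failed: $!";

# The handle must be writable; '+<' keeps the existing content.
open my $fh, '+<', $trunc_path or die "Cannot open $trunc_path: $!";
truncate($fh, 100) or die "truncate failed: $!";
close $fh or die "close failed: $!";

my $new_size = -s $trunc_path;           # size after truncation
```

Passing $trunc_path instead of $fh to truncate would shorten the file in the same way.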

-------------------------------
perl file read

Where are your fseek(), fwrite() and ftruncate() functions defined? Perl doesn't have those functions. You should be using seek(), print() (or syswrite()) and truncate(). We can't really help you if you're using functions that we know nothing about. You also don't need (and probably don't want) that explicit call to unlock the file or the call to close the file. The filehandle will be closed and unlocked as soon as your $file variable goes out of scope.

# Rewind from the end of the file until count eol's
seek FILE,0, 2; #go to EOF
seek FILE,-2048,2; #get last 2k bytes

# example for files with max line lengths < 400, but it's adjustable usage tailz filename numberoflines
use v5.20;
die "Usage: $0 file numlines\n" unless @ARGV == 2;
my ($filename, $numlines) = @ARGV;
my $chunk = 400 * $numlines; #assume a <= 400 char line(generous)

# Open the file in read mode
open FILE, "<$filename" or die "Couldn't open $filename: $!";
my $filesize = -s FILE;
if($chunk >= $filesize){$chunk = $filesize}

seek FILE,-$chunk,2; #get last chunk of bytes
my @tail = <FILE>;
if($numlines >= $#tail +1){$numlines = $#tail +1}
splice @tail, 0, @tail - $numlines;

print "@tail\n";
exit;

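Get the file size (bytes), last-modified time and inode change time (ctime, which is not a creation time on Unix) from the most recent stat:
my ($filesize,$modified,$changed) = (stat(_))[7,9,10];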

Move the filehandle position to the end of the file
seek $f, 0, 2;  # seek to the end of the file

perldoc -f read:
read FILEHANDLE,SCALAR,LENGTH,OFFSET
read FILEHANDLE,SCALAR,LENGTH

read( $fh, $top, $offset);

Your $offset is actually a length. Decide how many characters you need to read. read does not respect line endings; it reads the number of characters specified (bytes on a handle without an encoding layer).

If you want to read a line, then don't use read, use:
seek($fh, $offset, 0);
$top = <$fh>;

-------------------------------
perl split delimiter from file line by line

Your loop while(<FILE>){ ... } reads a single line at a time from the file handle and puts it into $_.

my @record = split(/\|/, $_) splits that line on pipe characters |, so since the first line is "a|30|40\n", @record will now be 'a', '30', "40\n". The newline read from the file remains, and you should use chomp to remove it if you don't want it there.

my @data = map { chomp; [ split /\|/ ] } <$fh>;

trying to create a 2d array, whereby each element contains all the pipe delimited items from each line of your input:
my @record;
while(<DATA>){
    chomp;
    my @split = split(/\|/);
    push @record, [@split];
}
print "@{$record[0]}\n";

-------------------------------
How to read or parse binary file

The good news, though, is that parsing binary data with Perl is easy using the unpack function. Open the file normally, then call binmode:
open my $fh, '<', $filename or die;
binmode $fh;

Or set the :raw layer during the open call.
open my $fh, '<:raw', $filename or die;

The binmode approach has been around for a very long time, but it still works.

On Windows systems these both change the filehandle to be in binary mode. On Unix, Linux, and OSX the binmode call or the :raw layer have no effect as those are the default anyway.

1). Open a binary filehandle

Start things off right by opening a filehandle to binary file:
use autodie;
open my $fh, '<:raw', '/usr/share/zoneinfo/America/New_York';

This is a suitably Modern Perlish beginning. I start by importing autodie which ensures the code will die if any function call fails. This avoids repetitive ... or die "IO failed" type coding constructs.

Next I use the :raw IO layer to open a filehandle to a binary file. This will avoid newline translation issues. No need for binmode here. The file I'm opening is a history of New York timezone changes, from the tz database.

2). Read a few bytes

All binary files have a specific format that they follow. In the case of the zoneinfo files, the first 44 bytes/octets are the header, so I'll grab that:
use autodie;
open my $fh, '<:raw', '/usr/share/zoneinfo/America/New_York';

my $bytes_read = read $fh, my $bytes, 44;
die "Got $bytes_read but expected 44" unless $bytes_read == 44;

Here I use read to read in 44 bytes of data into the variable $bytes. The read function returns the number of bytes read; it's good practice to check this as read may not return the expected number of bytes if it reaches the end of the file. In this case, if the file ends before the header does, we know we've got bad data and bail out.
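
The short-read case is easy to reproduce. In this sketch a made-up 10-byte string, read through an in-memory filehandle, stands in for a truncated file; read then returns 10 rather than 44, and 0 once the end of the data is reached.

```perl
use strict;
use warnings;

# Hypothetical truncated file: only 10 bytes, although 44 are requested.
my $data = 'TZif2' . ("\0" x 5);
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode $fh;

my $bytes_read = read $fh, my $bytes, 44;
die "read failed: $!" unless defined $bytes_read;   # undef signals an error
print "requested 44, got $bytes_read\n";

my $at_eof = read $fh, my $more, 44;                # nothing left to read
print "second read returned $at_eof\n";
```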

3). Unpack bytes into variables

Now comes the fun part. I've got to split out the data in $bytes into separate Perl variables. The tzfile man page defines the header format:
Timezone information files begin with the magic characters "TZif" to identify them as timezone information files, followed by a character identifying the version of the file's format (as of 2005, either an ASCII NUL ('\0') or a '2') followed by fifteen bytes containing zeros reserved for future use, followed by six four-byte values of type long

Tzfile manual

The unpack function takes a template of the binary data to read (this is defined in the pack documentation) and returns Perl variables. I'm going to match up the header description with the template codes to design the template.
Description     Example     Type     Length     Template Code
Magic chars     TZif     String     4     a4
Version     2     String     1     a
Reserved     0     Ignore     15     x15
Numbers     244     Long     4 each     N N N N N N

The header begins with the magic chars "TZif", which is 4 bytes. The template code a4 matches this. Next is the version, a single ASCII character matched by a (a returns the bytes exactly as they are, whereas A would also strip trailing spaces and nulls). The next 15 bytes are reserved and can be ignored, so I use x15 to skip over them. Finally there are 6 numbers of type long. Each one is a separate value, so I write N six times; the repeat count form N6 would unpack the same six values.

use autodie;
open my $fh, '<:raw', '/usr/share/zoneinfo/America/New_York';

my $bytes_read = read $fh, my $bytes, 44;
die "Got $bytes_read but expected 44" unless $bytes_read == 44;

my ($magic, $version, @numbers) = unpack 'a4 a x15 N N N N N N', $bytes;

This code passes my template to unpack, which returns the variables we asked for. Now that they're in Perl variables, the hard part is done. In the case of a tzfile, the header defines the length of the body of the file, so I can use these variables to calculate how much more data to read from the file.
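
The template can be checked without the zoneinfo file by packing a synthetic header first. The sketch below makes several assumptions: the six counts are the values visible in the New York hexdump in the Troubleshooting section, the field names are taken from the tzfile man page rather than from this article, and the length formula covers only the version 1 data block as that man page describes it.

```perl
use strict;
use warnings;

# Synthetic 44-byte header; in pack, N6 consumes six values in a row.
my $bytes = pack 'a4 a x15 N6', 'TZif', '2', 5, 5, 0, 236, 5, 20;

my ($magic, $version, @numbers) = unpack 'a4 a x15 N N N N N N', $bytes;

# Field names follow the tzfile man page (an assumption, not from the article).
my ($ttisgmtcnt, $ttisstdcnt, $leapcnt, $timecnt, $typecnt, $charcnt) = @numbers;

# Size of the version 1 data block that follows the header.
my $body_len = $timecnt * 4    # transition times, 4 bytes each
             + $timecnt        # one type index per transition
             + $typecnt * 6    # local time type records
             + $charcnt        # abbreviation characters
             + $leapcnt * 8    # leap second pairs, two 4-byte values each
             + $ttisstdcnt     # standard/wall indicators
             + $ttisgmtcnt;    # UT/local indicators

print "$magic v$version: next block is $body_len bytes\n";
```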

If you're interested in how to parse the rest of a tzfile, check out the source code of my module Time::Tzfile.

Troubleshooting

Sometimes you'll unpack some binary data and get garbage. This happens when the template passed to unpack doesn't match the binary data. The first thing you can do is print the binary data to the terminal with hexdump. Here are the first 44 bytes of the New York tzfile:
$ hexdump -c -n 44 /usr/share/zoneinfo/America/New_York
0000000   T   Z   i   f   2  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0000010  \0  \0  \0  \0  \0  \0  \0 005  \0  \0  \0 005  \0  \0  \0  \0
0000020  \0  \0  \0 354  \0  \0  \0 005  \0  \0  \0 024

This gives you a chance to inspect the data byte by byte and see if it matches your template. To create a template to match binary data, take it one value at a time. Consider the type of value you're trying to match. Get the right bit length and for numbers, be sure to know if it is signed or unsigned.
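
The same inspection can be done from inside the script. This is a minimal sketch using unpack with the (H2)* template, applied to eight illustrative bytes rather than to the real file.

```perl
use strict;
use warnings;

# Sample bytes for illustration: the start of a tzfile header.
my $bytes = 'TZif2' . ("\0" x 3);

# (H2)* yields one two-digit hex string per byte.
my $hex = join ' ', unpack '(H2)*', $bytes;
print "$hex\n";
```

Printing the bytes that were actually read, right before the unpack call, helps rule out a wrong offset or a short read as the cause of the garbage.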

The other thing to be aware of is the endianness of the data. Often man pages will say a variable is in "standard" or "network" order. This means big endian. Tzfiles have several 32 bit signed integers in big endian order. No single unpack template letter matches that type (N is unsigned), so to match it I need to use l>. The l matches signed 32 bit integers and the > is a modifier which tells Perl the value is big endian.
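
The difference between these codes shows up as soon as the high bit is set. The sketch below unpacks the same four bytes, chosen purely for illustration, as unsigned big endian, signed big endian, and signed little endian. The > and < modifiers need Perl 5.10 or later.

```perl
use strict;
use warnings;

# Four bytes chosen for illustration: 0xFFFFFFFE in big endian order.
my $bytes = "\xFF\xFF\xFF\xFE";

my ($unsigned) = unpack 'N',  $bytes;   # unsigned 32 bit, big endian
my ($signed)   = unpack 'l>', $bytes;   # signed 32 bit, big endian
my ($little)   = unpack 'l<', $bytes;   # same bytes read as little endian

print "N: $unsigned, l>: $signed, l<: $little\n";
```

Read with N the value is 4294967294, while l> gives -2, so picking the unsigned code for a signed field silently turns small negative numbers into very large positive ones.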

Between Perl's built-in template types and the modifiers, you can match almost any binary data.

