Getting Detailed File Information in Perl
2013-04-27 12:18:03 阿炯

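Use the 'File::stat' module, or the 'stat' and 'lstat' (for symbolic links) functions. stat and lstat are Perl built-ins; File::stat exports replacement versions of them that return objects instead of lists. Either way you get a file's system attributes, much like the 'stat' system command, although the raw return values need some further processing to become readable.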

Usage:
use File::stat;
my $st = stat($file) or die "No $file: $!";   # File::stat object
my @array = CORE::stat($filehandle);          # plain 13-element list from the core built-in

Calling the built-in stat($fh) returns a 13-element list with the following information about the file or filehandle passed in (from the perlfunc man page for stat):
  0 dev      device number of filesystem
  1 ino      inode number
  2 mode     file mode  (type and permissions)
  3 nlink    number of (hard) links to the file
  4 uid      numeric user ID of file's owner
  5 gid      numeric group ID of file's owner
  6 rdev     the device identifier (special files only)
  7 size     total size of file, in bytes
  8 atime    last access time since the epoch
  9 mtime    last modify time since the epoch
 10 ctime    inode change time (NOT creation time!) since the epoch
 11 blksize  preferred block size for file system I/O
 12 blocks   actual number of blocks allocated
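A minimal sketch of unpacking that list into named variables, assuming File::stat is not loaded (so stat is the core built-in). The temporary file from the core File::Temp module exists only to give the example something to stat; the variable names simply mirror the table.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scaffolding: a temporary file with known content (6 bytes).
my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "hello\n";
close $fh;

# Core stat returns a 13-element list (empty list on failure), in the order of the table.
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
    $size, $atime, $mtime, $ctime, $blksize, $blocks) = stat($file)
    or die "Cannot stat $file: $!";

printf "size=%d bytes, mode=%#o, links=%d, owner uid=%d, mtime=%d\n",
    $size, $mode, $nlink, $uid, $mtime;
```

Named variables make later code self-describing, at the cost of declaring all thirteen even when you need only one or two.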


The element at index 9 (mtime) gives you the last modification time in seconds since the epoch (00:00 January 1, 1970 GMT). From that you can determine the local time:
my $epoch_timestamp = (stat($fh))[9];
my $timestamp=localtime($epoch_timestamp);
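If the fixed format of scalar localtime doesn't suit you, POSIX::strftime (from the core POSIX module) formats the same epoch value with any pattern. A minimal sketch; the temporary file, the pinned timestamp and the pattern are only for illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX qw(strftime);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

# Pin mtime to a known value (2021-01-01 00:00:00 UTC) so the result is predictable.
my $epoch = 1609459200;
utime $epoch, $epoch, $file or die "utime failed: $!";

my $mtime = (stat $file)[9];                               # index 9 = mtime
my $text  = localtime($mtime);                             # e.g. "Fri Jan  1 00:00:00 2021" when the local zone is UTC
my $iso   = strftime('%Y-%m-%d %H:%M:%S', gmtime $mtime);  # UTC, custom format
print "$text\n$iso\n";
```

Using gmtime keeps the formatted string independent of the machine's time zone; pass localtime instead to get local wall-clock time.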

Of course, Perl also has other ways to do this:
my $last_mod_time = -M $file;

but -M returns the file's age in days, measured relative to when the program started ($^T), not an absolute timestamp. This is useful for things like sorting, but you probably want the first version.
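To make "relative to when the program started" concrete, here is a minimal sketch that backdates a temporary file by exactly two days from $^T (the program's start time) and compares -M with the same value computed by hand from stat:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

# Make the file's mtime exactly 2 days before the program's start time ($^T).
my $t = $^T - 2 * 86400;
utime $t, $t, $file or die "utime failed: $!";

my $age_days = -M $file;                          # (start time - mtime) / 86400
my $by_hand  = ($^T - (stat $file)[9]) / 86400;   # the same computation from stat
printf "-M: %.4f days, by hand: %.4f days\n", $age_days, $by_hand;
```

Because the reference point is $^T rather than the current time, a long-running program sees -M values that do not grow while it runs.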

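Use -C to compare how old two files are. Like -M, -C returns an age in days relative to program start, so a larger value means older. Note that -C measures the inode change time; use -M if you mean the modification time:
if (-C "file1.txt" > -C "file2.txt") {
    # file1.txt's inode changed longer ago than file2.txt's; update it here
}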
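The same comparison wrapped in a small helper, as a sketch: newer_file is a hypothetical name, and it uses -M (modification age) rather than -C. The two temporary files and the one-hour backdating exist only to make the result predictable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return whichever of two paths was modified more recently.
# -M is an age in days, so the SMALLER value belongs to the newer file.
sub newer_file {
    my ($x, $y) = @_;
    return (-M $x) < (-M $y) ? $x : $y;
}

my $dir = tempdir(CLEANUP => 1);
my ($old, $new) = ("$dir/file1.txt", "$dir/file2.txt");
for my $f ($old, $new) {
    open my $out, '>', $f or die "$f: $!";
    close $out;
}
# file1.txt: modified one hour before the program started.
utime $^T - 3600, $^T - 3600, $old or die "utime: $!";

print "newer: ", newer_file($old, $new), "\n";
```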

To avoid the magic index 9, you can additionally use Time::localtime, another core module (included since Perl 5.004). This requires some (arguably) more legible code:
use File::stat;
use Time::localtime;
my $timestamp = ctime(stat($fh)->mtime);
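Time::localtime also replaces localtime() with a version that returns an object, so individual date parts can be read by name. A minimal sketch under that assumption (the temporary file is scaffolding); ctime() from the same module gives the familiar one-line string:

```perl
use strict;
use warnings;
use File::stat;          # stat() now returns a File::stat object
use Time::localtime;     # localtime() now returns a Time::tm object; ctime() is exported too
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

my $st = stat($file) or die "No $file: $!";
my $tm = localtime($st->mtime);                  # object with sec/min/hour/mday/mon/year methods
printf "mtime: %04d-%02d-%02d %02d:%02d\n",
    $tm->year + 1900, $tm->mon + 1, $tm->mday, $tm->hour, $tm->min;
print "ctime(): ", ctime($st->mtime), "\n";      # same text as scalar CORE::localtime
```

As with the core list, year is offset from 1900 and mon counts from 0.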


By default, File::stat exports functions that override the core stat() and lstat() functions, replacing them with versions that return "File::stat" objects.

This object has methods that return the similarly named structure field name from the stat(2) function; namely, dev, ino, mode, nlink, uid, gid, rdev, size, atime, mtime, ctime, blksize, and blocks.

As of version 1.02 (provided with perl 5.12) the object provides "-X" overloading, so you can call filetest operators (-f, -x, and so on) on it. It also provides a ->cando method, called like
 $st->cando( ACCESS, EFFECTIVE )

where ACCESS is one of S_IRUSR, S_IWUSR or S_IXUSR from the Fcntl module, and EFFECTIVE indicates whether to use effective (true) or real (false) ids. The method interprets the mode, uid and gid fields, and returns whether or not the current process would be allowed the specified access.
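A minimal sketch of both features, assuming File::stat 1.02 or later as stated above. The temporary file is made owner read/write only (0600), so read should be allowed and execute denied:

```perl
use strict;
use warnings;
use File::stat;                          # -X overloading and ->cando need File::stat 1.02+ (perl 5.12+)
use Fcntl qw(S_IRUSR S_IWUSR S_IXUSR);
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
chmod 0600, $file or die "chmod: $!";    # rw for the owner only

my $st = stat($file) or die "No $file: $!";
print "regular file\n" if -f $st;        # filetest operator applied to the object

for my $perm ([read => S_IRUSR], [write => S_IWUSR], [execute => S_IXUSR]) {
    my ($name, $bit) = @$perm;
    # 1 = use the effective uid/gid of this process
    printf "%-7s %s\n", $name, $st->cando($bit, 1) ? 'allowed' : 'denied';
}
```

cando only interprets the mode, uid and gid fields; it does not open the file.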

If you don't want to use the objects, you may import the ->cando method into your namespace as a regular function called stat_cando. This takes an arrayref containing the return values of stat or lstat as its first argument, and interprets it for you.
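A sketch of stat_cando as described above: with an explicit import list that names only stat_cando, the core stat is not overridden, so it still returns a plain list that can be passed by reference. The temporary file and its 0644 mode are illustrative.

```perl
use strict;
use warnings;
use File::stat qw(stat_cando);   # explicit list: core stat/lstat are NOT overridden
use Fcntl qw(S_IRUSR S_IXUSR);
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;
chmod 0644, $file or die "chmod: $!";

my @raw = stat($file) or die "No $file: $!";   # plain 13-element list from the core built-in
print 'readable:   ', (stat_cando(\@raw, S_IRUSR, 1) ? 'yes' : 'no'), "\n";
print 'executable: ', (stat_cando(\@raw, S_IXUSR, 1) ? 'yes' : 'no'), "\n";
```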

You may also import all the structure fields directly into your namespace as regular variables using the ':FIELDS' import tag. (Note that this still overrides your stat() and lstat() functions.) Access these fields as variables named by prefixing st_ to their method names. Thus, $stat_obj->dev() corresponds to $st_dev if you import the fields.
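A minimal sketch of the ':FIELDS' style: each successful stat() call fills in the imported $st_* variables, so no object or index is needed afterwards. The 5-byte temporary file is scaffolding.

```perl
use strict;
use warnings;
use File::stat qw(:FIELDS);   # imports $st_dev ... $st_blocks (and still overrides stat/lstat)
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "12345";
close $fh;

stat($file) or die "No $file: $!";   # a successful call populates the $st_* variables
printf "size=%d mode=%#o mtime=%d\n", $st_size, $st_mode, $st_mtime;
```

These are package globals, so every later stat() or lstat() call overwrites them.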

To access this functionality without the core overrides, give use an empty import list (use File::stat ();) and then call the functions by their fully qualified names (File::stat::stat, File::stat::lstat). Conversely, the built-ins are always available via the CORE:: pseudo-package (CORE::stat, CORE::lstat).
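A sketch of the empty-import form: both flavours coexist in one file, the core list from plain stat and the object from the fully qualified File::stat::stat. The 3-byte temporary file is scaffolding.

```perl
use strict;
use warnings;
use File::stat ();            # load the module but import nothing: stat/lstat stay the built-ins
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
print $fh "abc";
close $fh;

my @core = stat($file) or die "No $file: $!";              # core built-in: a list
my $obj  = File::stat::stat($file) or die "No $file: $!";  # fully qualified: an object
printf "core size=%d, object size=%d\n", $core[7], $obj->size;
```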


Using stat vs. lstat

The real file:
-rw-r--r-- 1 2795174 /tmp/131.108.t11

Create a (symbolic) link to it, ns108.txt:
lrwxrwxrwx 16 ns108.txt -> /tmp/131.108.t11

The link's size is 16 bytes, which is exactly the length of the target path /tmp/131.108.t11.

use v5.32;
use File::stat;
use Data::Dumper;

my $file=shift;
my $st=stat($file) or die "No $file: $!";

END{
    say Dumper($st);
}

Output:
$VAR1 = bless( [
  2049,
  19265046,
  33188,
  1,
  1000,
  1000,
  0,
  2795174,
  1637115998,
  1622087073,
  1637116040,
  4096,
  5464
], 'File::stat' );


use v5.32;
use File::stat;
use Data::Dumper;

my $file=shift;
my $st=lstat($file) or die "No $file: $!";

END{
    say Dumper($st);
}


Output:
$VAR1 = bless( [
  2064,
  572557,
  41471,
  1,
  1000,
  1000,
  0,
  16,
  1637120993,
  1637120991,
  1637120991,
  4096,
  0
], 'File::stat' );


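The difference is clear:
stat follows the symbolic link to the real file and returns that file's attributes.

lstat examines the link itself rather than the file it points to.

In the lstat output the 13th field (index 12, blocks) is 0: no data blocks are allocated for the link itself (on ext2/3/4, for example, a short link target is stored directly in the inode). The mode, 41471, is 0120777 in octal, i.e. a symbolic link with rwxrwxrwx, matching the lrwxrwxrwx shown by ls.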
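A self-contained sketch of the same experiment, assuming a Unix-like system where symlinks can be created (on Linux the link's mode shows as 0120777). It builds a 100-byte file and a link to it in a temporary directory, then compares what stat and lstat report:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir    = tempdir(CLEANUP => 1);
my $target = "$dir/real.txt";
my $link   = "$dir/link.txt";

open my $out, '>', $target or die "$target: $!";
print $out 'x' x 100;                              # the real file holds 100 bytes
close $out;
symlink $target, $link or die "symlink: $!";

my @via_stat  = stat  $link or die "stat: $!";     # follows the link to real.txt
my @via_lstat = lstat $link or die "lstat: $!";    # describes the link itself

printf "stat : size=%d mode=%#o\n", $via_stat[7], $via_stat[2];
printf "lstat: size=%d mode=%#o (target path length=%d)\n",
    $via_lstat[7], $via_lstat[2], length $target;
print "-l says it is a link\n" if -l $link;
```

The lstat size equals the length of the target path string, which is where the 16 bytes above come from.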

What is most interesting here is the third field (index 2), described above as: file mode (type and permissions)

So how do you turn this number into the familiar permission bits? The official documentation (perldoc -f stat) covers it:

$file_mode = (stat($filename))[2];

To get the numeric permissions, AND the mode bitwise with 07777. 33188 is 0100644 in octal, and ANDing it with 07777 gives 0644:
printf "Permissions are %04o\n", $st->mode & 07777;

Permissions are 0644

An alternative form (note: stat here is the core built-in, not the File::stat function of the same name):
$mode = sprintf '%04o', (stat $file)[2] & 07777;


Similarly, fields 9 to 11 (indices 8 to 10: atime, mtime, ctime) are Unix timestamps (seconds since the epoch) and need to be converted to a readable date and time:
say 'Atime:' . localtime($st->atime);
Atime:Wed Nov 17 14:58:06 2021
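All three timestamps can be converted the same way using File::stat's named methods instead of indices 8 to 10. A minimal sketch (the temporary file is scaffolding):

```perl
use strict;
use warnings;
use File::stat;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

my $st = stat($file) or die "No $file: $!";
# Call the methods by name; scalar localtime turns each epoch value into a readable string.
my %when = map { $_ => scalar localtime($st->$_) } qw(atime mtime ctime);
print "$_: $when{$_}\n" for qw(atime mtime ctime);
```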


How much information do a file's attribute bits actually hold?

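The question is somewhat complicated and may also depend on the filesystem type (local, or network such as NFS). For example, for attributes set with chattr, lsattr can show 23 attribute columns: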
----------------------

On top of that there are extended ACLs, SELinux attributes, and more.

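The stat fields commonly used from user space already hold a lot of information, and the Unix developer manual page explains the story behind these attributes: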
man 2 stat:
S_IFMT     0170000   bit mask for the file type bit fields
S_IFSOCK   0140000   socket
S_IFLNK    0120000   symbolic link
S_IFREG    0100000   regular file
S_IFBLK    0060000   block device
S_IFDIR    0040000   directory
S_IFCHR    0020000   character device
S_IFIFO    0010000   FIFO
S_ISUID    0004000   set UID bit
S_ISGID    0002000   set-group-ID bit (see below)
S_ISVTX    0001000   sticky bit (see below)
S_IRWXU    00700     mask for file owner permissions
S_IRUSR    00400     owner has read permission
S_IWUSR    00200     owner has write permission
S_IXUSR    00100     owner has execute permission
S_IRWXG    00070     mask for group permissions
S_IRGRP    00040     group has read permission
S_IWGRP    00020     group has write permission
S_IXGRP    00010     group has execute permission
S_IRWXO    00007     mask for permissions for others (not in group)
S_IROTH    00004     others have read permission
S_IWOTH    00002     others have write permission
S_IXOTH    00001     others have execute permission

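This is the third stat field, file mode (type and permissions). It carries two kinds of information, the file type and the file permissions, and the permissions are what we use most. Here is what Perl's own documentation for stat says (these names come from the Fcntl module's :mode tag):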


File permissions:
# Permissions: read, write, execute, for user, group, others.

S_IRWXU S_IRUSR S_IWUSR S_IXUSR
S_IRWXG S_IRGRP S_IWGRP S_IXGRP
S_IRWXO S_IROTH S_IWOTH S_IXOTH

# Setuid/Setgid/Stickiness/SaveText.
# Note that the exact meaning of these is system-dependent.

S_ISUID S_ISGID S_ISVTX S_ISTXT

# File types.  Not all are necessarily available on your system.

S_IFREG S_IFDIR S_IFLNK S_IFBLK S_IFCHR S_IFIFO S_IFSOCK S_IFWHT S_ENFMT


S_IMODE($mode)    the part of $mode containing the permission bits and the setuid/setgid/sticky bits

S_IFMT($mode)     the part of $mode containing the file type which can be bit-anded with (for example) S_IFREG or with the following functions

# The operators -f, -d, -l, -b, -c, -p, and -S.

S_ISREG($mode) S_ISDIR($mode) S_ISLNK($mode) S_ISBLK($mode) S_ISCHR($mode) S_ISFIFO($mode) S_ISSOCK($mode)

# No direct -X operator counterpart, but for the first one the -g operator is often equivalent.  The ENFMT stands for record flocking enforcement, a platform-dependent feature.

S_ISENFMT($mode) S_ISWHT($mode)
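A sketch that puts the S_IS*() test functions and S_IMODE()/S_IFMT() to work on the mode values printed earlier in this article (33188 for the real file, 41471 for the link, 34284 and 49590 from the examples further on). file_type is a hypothetical helper name:

```perl
use strict;
use warnings;
use Fcntl ':mode';   # S_IF* constants plus the S_IMODE/S_IFMT/S_IS* functions

# Hypothetical helper: name the file type using the S_IS*() test functions.
sub file_type {
    my ($mode) = @_;
    return 'regular file'     if S_ISREG($mode);
    return 'directory'        if S_ISDIR($mode);
    return 'symbolic link'    if S_ISLNK($mode);
    return 'block device'     if S_ISBLK($mode);
    return 'character device' if S_ISCHR($mode);
    return 'FIFO'             if S_ISFIFO($mode);
    return 'socket'           if S_ISSOCK($mode);
    return 'unknown';
}

# Mode values taken from outputs shown in this article.
for my $mode (33188, 41471, 34284, 49590) {
    printf "%-6d %#07o  %-14s perms %04o\n",
        $mode, S_IFMT($mode), file_type($mode), S_IMODE($mode);
}
```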


Note: the numbers in the second column of the man-page table, which begin with 0, are octal.

Suppose the mode number is 33204:
printf("%#o\n", $mode);

In octal this is 0100664.
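The conversions involved, as a small sketch: sprintf renders the integer in octal (with or without the leading 0), and oct() parses an octal string back into the same integer.

```perl
use strict;
use warnings;

my $mode = 33204;                          # decimal value as returned by stat
my $oct  = sprintf '%o', $mode;            # "100664": octal digits, no prefix
my $octp = sprintf '%#o', $mode;           # "0100664": with the leading 0
my $back = oct($octp);                     # oct() parses the octal string back to 33204
my $perm = sprintf '%04o', $mode & 07777;  # "0664": permission bits only
print "$oct $octp $back $perm\n";          # 100664 0100664 33204 0664
```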

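The last three digits, 664, are easy to understand (the familiar numeric permissions, rw-rw-r--). But where do the leading digits (the 0100 in 0100664) come from? Is it this line from the file attribute table above?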
S_IFREG    0100000   regular file

Back in the man page above, there are seven important fields:
S_IFMT   file type
S_ISUID  set UID bit
S_ISGID  set-group-ID bit
S_ISVTX  sticky bit
S_IRWXU  owner permissions
S_IRWXG  group permissions
S_IRWXO  other permissions

Viewing the mode as its fields rather than as a number (0x81B4 = 33204 = 0100664 = 0b1000000110110100), we get:
S_IFMT:  S_IFREG (regular file)
S_ISUID: 0 (no set UID bit)
S_ISGID: 0 (no set-group-ID bit)
S_ISVTX: 0 (no sticky bit)
S_IRWXU: S_IRUSR | S_IWUSR (user has rw)
S_IRWXG: S_IRGRP | S_IWGRP (group has rw)
S_IRWXO: S_IROTH (other has r)

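ANDing with 07777, i.e. & (S_ISUID | S_ISGID | S_ISVTX | S_IRWXU | S_IRWXG | S_IRWXO), yields the permission bits together with the setuid/setgid/sticky bits; ANDing with 0777, i.e. & (S_IRWXU | S_IRWXG | S_IRWXO), keeps only the rwx bits. This is the classic bit-mask technique from C.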
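The seven-field breakdown above can be reproduced with Fcntl's mask constants, as a sketch for the example mode 33204. Each AND expression is parenthesized so the constant never ends up followed by a bare comma:

```perl
use strict;
use warnings;
use Fcntl ':mode';

my $mode = 33204;   # 0100664, the example above

my %field = (
    S_IFMT  => S_IFMT($mode),        # file type bits (0170000 mask)
    S_ISUID => ($mode & S_ISUID),    # set-user-ID
    S_ISGID => ($mode & S_ISGID),    # set-group-ID
    S_ISVTX => ($mode & S_ISVTX),    # sticky
    S_IRWXU => ($mode & S_IRWXU),    # owner rwx
    S_IRWXG => ($mode & S_IRWXG),    # group rwx
    S_IRWXO => ($mode & S_IRWXO),    # others rwx
);
printf "%-8s %07o\n", $_, $field{$_} for sort keys %field;
printf "& 07777 -> %04o, & 0777 -> %03o\n", $mode & 07777, $mode & 0777;
```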


Take a file as an example:
$ chmod 754 file4stat.txt

$ ls -lh
-rwxr-xr-- ... file4stat.txt

To make it easy to test the suid/sgid/sticky bits, also add a setgid bit:
$ chmod 2754 file4stat.txt

$ ls -lh
-rwxr-sr-- ... file4stat.txt
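The same chmod can be done from Perl, with one pitfall: Perl's chmod takes a number, so the mode must be an octal literal (0754, or 02754 to add setgid) or a string converted with oct(). A sketch on a temporary file; it uses 0754 rather than 02754 because some systems silently drop setgid when the file's group is not one of yours:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

# chmod takes a NUMBER: write it as an octal literal (leading 0) ...
chmod 0754, $file or die "chmod: $!";
printf "after chmod 0754: %04o\n", (stat $file)[2] & 07777;

# ... or convert a string of octal digits with oct(); a plain "754" would mean decimal 754.
my $from_string = oct('754');
chmod $from_string, $file or die "chmod: $!";
printf "after chmod oct('754'): %04o\n", (stat $file)[2] & 07777;
```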

The code (filestat3.pl):
use v5.32;            # also enables strict
use warnings;
use Fcntl ':mode';    # S_IRWXU, S_ISGID, S_IMODE(), S_IFMT(), S_ISREG() ...

my $file = shift;
# Core stat (File::stat is not loaded here): a plain list; index 2 is the mode.
my @st = stat($file) or die "No $file: $!";
my $mode = $st[2];

say "orig-dec-mode:$mode";
printf "orig-oct-mode:%#o\n", $mode;
printf "orig-bin-mode:%#b\n", $mode;

# Owner permission bits only, then shifted down to a single octal digit.
my $up = ($mode & S_IRWXU);
printf "Just & S_IRWXU oct:%#o\n", $up;
printf "Just & S_IRWXU By mv6 oct:%#o\n", $up >> 6;
# S_IMODE(): permission bits including setuid/setgid/sticky.
printf "Permissions are %#o\n", S_IMODE($mode);
# File type: S_IFMT($mode) is equivalent to ($mode & S_IFMT).
printf "File Type By IFMT oct1:%#7o,oct2:%#o\n", S_IFMT($mode), ($mode & S_IFMT);
# The low 12 bits (the same value as S_IMODE above).
printf "File Type By MODE oct2:%#o,bin2:%#b\n", ($mode & 07777), ($mode & 07777);
say 'Reg File.' if S_ISREG($mode);
# Shift off the 9 rwx bits, leaving the setuid/setgid/sticky digit.
printf "File Sugt by oct3:%#o,bin3:%#b\n", ($mode & 07777) >> 9, ($mode & 07777) >> 9;
printf "Sugt:%#o\n", ($mode & S_ISGID);
say 'Has Sgid.' if $mode & S_ISGID;

say('#' x 16);


$ perl filestat3.pl file4stat.txt
orig-dec-mode:34284
orig-oct-mode:0102754
orig-bin-mode:0b1000010111101100
Just & S_IRWXU oct:0700
Just & S_IRWXU By mv6 oct:07
Permissions are 02754
File Type By IFMT oct1:0100000,oct2:0100000
File Type By MODE oct2:02754,bin2:0b10111101100
Reg File.
File Sugt by oct3:02,bin3:0b10
Sugt:02000
Has Sgid.
################


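The permission bits match the settings made above. Also note that shift operators always work on the binary representation of an integer. Decimal, octal and hex literals are just different ways of writing the same integer, so no conversion between them is involved; only a string operand is converted first, and it is read as a decimal number.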
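A small sketch of that point: four spellings of one integer behave identically under shifts, while a string such as "0754" is numified as decimal 754 unless you pass it through oct() first.

```perl
use strict;
use warnings;

# 0754, 492, 0x1EC and 0b111101100 are four spellings of the same integer.
my @same = (0754, 492, 0x1EC, 0b111101100);
print "all equal\n" unless grep { $_ != 492 } @same;

printf "%#o >> 6 = %#o\n", 0754, 0754 >> 6;              # owner digit: 07
printf "%#o >> 3 & 07 = %#o\n", 0754, (0754 >> 3) & 07;  # group digit: 05

# Pitfall: a STRING is numified as decimal, so '0754' means 754, not octal 0754.
printf "'0754' >> 6 = %d, oct('0754') >> 6 = %d\n", '0754' >> 6, oct('0754') >> 6;
```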

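Let's walk through the example above in detail:
The permission bits of 0102754 are 2754. 754 should look familiar (rwxr-xr--), and the 2 is the setgid bit (shown as s in the group execute position). Now let's see how we work out that the file is S_IFREG (0100000).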

Take all of its bits:
0b1000_0101_1110_1100

AND them with 07777 (padded with leading zeros; for readability every 4 bits are separated by '_'):
0b0000_1111_1111_1111

The result is the permission bits (02754 = 0b0101_1110_1100):
0b0000_0101_1110_1100

Clearing all the permission bits (2754 = 0b0101_1110_1100) instead, and padding with zeros as above, leaves:
0b1000_0000_0000_0000

which in octal is (32768 in decimal):
0100000

That corresponds to a regular file. For other file types, look in /dev, /run or /tmp, which contain files of various types, and pick one to verify.
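To check many entries at once, a sketch that lists a directory and classifies each entry by looking up S_IFMT($mode) in a table of Fcntl's S_IF* constants. The %type_name table, the type_of helper and the /dev default are illustrative choices; lstat is used so that symlinks are reported as links rather than followed.

```perl
use strict;
use warnings;
use Fcntl ':mode';

# Hypothetical lookup table: S_IFMT value => human-readable type name.
my %type_name = (
    S_IFREG()  => 'regular file',
    S_IFDIR()  => 'directory',
    S_IFLNK()  => 'symbolic link',
    S_IFBLK()  => 'block device',
    S_IFCHR()  => 'character device',
    S_IFIFO()  => 'FIFO',
    S_IFSOCK() => 'socket',
);

# Hypothetical helper: classify a mode by its S_IFMT bits.
sub type_of {
    my ($mode) = @_;
    my $type = S_IFMT($mode);
    return $type_name{$type} // sprintf('unknown (%#o)', $type);
}

my $dir = shift // '/dev';    # directory to scan; /dev is only a default
opendir my $dh, $dir or die "Cannot open $dir: $!";
my @names = grep { !/^\.\.?$/ } readdir $dh;
closedir $dh;

for my $name (sort @names) {
    my @st = lstat "$dir/$name" or next;   # lstat: report symlinks as links
    printf "%-8s %-16s %s\n", sprintf('%#o', S_IFMT($st[2])), type_of($st[2]), $name;
}
```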


# ls -l /tmp/
srw-rw-rw- 1 root root  ... ngh80.sock

$ perl filestat3.pl /tmp/ngh80.sock
orig-dec-mode:49590
orig-oct-mode:0140666
orig-bin-mode:0b1100000110110110
Just & S_IRWXU oct:0600
Just & S_IRWXU By mv6 oct:06
Permissions are 0666
File Type By IFMT oct1:0140000,oct2:0140000
File Type By MODE oct2:0666,bin2:0b110110110
File Sugt by oct3:0,bin3:0
Sugt:0

Its bits are:
0b1100000110110110
With the permission bits cleared:
0b1100000000000000

In octal: perl -e 'print sprintf("%#o\n",0b1100000000000000)'
0140000

Looking this up in the table above shows that it is a socket.
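The whole walkthrough (AND with 07777 to keep the permission bits, clear those bits to expose the type) fits in a few lines. A sketch using plain masks rather than Fcntl, applied to the two example modes from this article:

```perl
use strict;
use warnings;

my %split;   # mode => [type bits, permission bits]
for my $mode (0102754, 0140666) {
    my $perms = $mode & 07777;     # keep the low 12 bits
    my $type  = $mode & ~07777;    # clear them, leaving the S_IFMT bits
    $split{$mode} = [$type, $perms];
    printf "%#o: bits=%016b  perms=%#o  type=%#o\n", $mode, $mode, $perms, $type;
}
```

The type values 0100000 and 0140000 are S_IFREG and S_IFSOCK in the man-page table.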


References

About inodes
Using permission bits on Linux

perl-fun-stat
linux-dev-stat
linux-dev-inode