When Perl's special code blocks run
2023-01-23 15:59:54 阿炯

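Perl has five special code blocks: BEGIN, UNITCHECK, CHECK, INIT and END. The key to understanding them is knowing at which point in a program's life each one runs:
(1). while the program is being compiled
(2). while the program is being executed
(3). after the program has finished executing but before the interpreter exits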

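These five kinds of blocks run in the following order:
BEGIN: runs as soon as it has been parsed; that is, the moment the compiler meets it, before the rest of the file is compiled.

UNITCHECK: runs as soon as the compilation unit that defines it has finished compiling. The main program file and each module it loads are compilation units, as are string evals, code compiled at run time through the (?{ }) and (??{ }) regex constructs, calls to do FILE and require FILE, and the code following the -e switch on the command line. For initialization code you may prefer UNITCHECK over INIT, because INIT blocks in code that is loaded after the main compilation phase never run.

CHECK: runs after compilation has finished but before the program starts running. (CHECK can be read as "checkpoint", "double-check", or even "stop".)

INIT: runs just before the main flow of the program begins.

END: runs after the program has finished.

If any of these blocks is declared more than once, even in different modules, all BEGIN blocks run before all CHECK blocks, all CHECK blocks run before all INIT blocks, and all INIT blocks run before all END blocks; the END blocks run dead last, after the main program has finished. Multiple BEGIN and INIT blocks run in declaration order (FIFO), while CHECK and END blocks run in reverse declaration order (LIFO). UNITCHECK blocks are also LIFO within each compilation unit.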
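To see all of these ordering rules at once, here is a minimal sketch that records each block in a package array instead of printing directly. The array name @log is only an illustration. It is declared with our and deliberately not initialized, so the entries pushed at compile time are still there at run time (an assignment such as my @log = () would reset it when the program starts running).

```perl
use strict;
use warnings;

our @log;    # filled in by the special blocks below; no initializer on purpose

END       { print "END\n" }               # runs after everything else
INIT      { push @log, "INIT 1" }
CHECK     { push @log, "CHECK 1" }
UNITCHECK { push @log, "UNITCHECK 1" }
BEGIN     { push @log, "BEGIN 1" }
INIT      { push @log, "INIT 2" }
CHECK     { push @log, "CHECK 2" }
UNITCHECK { push @log, "UNITCHECK 2" }
BEGIN     { push @log, "BEGIN 2" }

push @log, "RUN";    # ordinary code, executed at run time
print "$_\n" for @log;
# BEGIN 1, BEGIN 2        (FIFO, while compiling)
# UNITCHECK 2, UNITCHECK 1 (LIFO, after this file is compiled)
# CHECK 2, CHECK 1        (LIFO, after all compilation)
# INIT 1, INIT 2          (FIFO, just before run time)
# RUN, then END
```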

---------------------------------------------------------------
BEGIN blocks

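A BEGIN block runs while the program is being compiled, i.e. during step (1) above. That is why a BEGIN block still runs even if the program contains a syntax error, provided the block appears before the error; once perl has seen a compilation error, a later BEGIN (or use) aborts compilation with "BEGIN not safe after errors--compilation aborted". Multiple BEGIN blocks run FIFO (first in, first out), i.e. from top to bottom.

Use BEGIN for anything that must happen before the rest of the program is compiled, for example assigning a special variable in advance, checking that a file exists, or checking that the operating system meets the program's requirements.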
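A minimal sketch of such preparatory checks follows. The environment variable MYAPP_CONFIG, the path /etc/myapp.conf and the list of unsupported systems are made-up values for illustration, not anything the original program uses. The whole BEGIN block is parsed before it runs, so it sticks to syntax that older perls also understand.

```perl
use strict;
use warnings;

our $CONFIG_FILE;    # hypothetical setting, assigned at compile time below

BEGIN {
    # Stop before the rest of the file is compiled if perl is too old.
    # $] holds the interpreter version as a number, e.g. 5.036000.
    die "This program needs Perl 5.10 or newer\n" if $] < 5.010;

    # Refuse to continue on operating systems this (hypothetical) program
    # does not support. $^O holds the OS name, e.g. "linux" or "MSWin32".
    my %unsupported = map { $_ => 1 } qw(VMS os2);
    die "Unsupported operating system: $^O\n" if $unsupported{$^O};

    # Give a variable its value before the rest of the file is compiled,
    # and check whether the file it names exists.
    $CONFIG_FILE = $ENV{MYAPP_CONFIG} || '/etc/myapp.conf';
    warn "Config file $CONFIG_FILE does not exist\n" unless -e $CONFIG_FILE;
}

print "Using config file $CONFIG_FILE\n";
```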

package Foo;
use strict;
use warnings;
BEGIN {
    print "This is the first BEGIN block\n";
}
print "The program is running\n";
BEGIN {
    print "This is the second BEGIN block\n";
}

Because BEGIN blocks run at compile time while the ordinary print statement runs at run time, the code above prints:
This is the first BEGIN block
This is the second BEGIN block
The program is running

The following program contains a syntax error, but its BEGIN blocks still run:
BEGIN {
    print "This is the first BEGIN block\n";
}
print "The program is running\n";
BEGIN {
    print "This is the second BEGIN block\n";
}
my $x =;

Output:
syntax error at some_program.pl line 8, near "=;"
Execution of some_program.pl aborted due to compilation errors.
This is the first BEGIN block
This is the second BEGIN block

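The error message is not necessarily printed first, however: STDOUT and STDERR are separate filehandles, and STDOUT is usually buffered while STDERR is not, so the order in which their output appears is not guaranteed.

Incidentally, a use statement with an empty import list is equivalent to a require inside a BEGIN block: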
use File::Find ();
# is equivalent to
BEGIN {
    require File::Find;
}
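The same idea extends to a use statement with an import list, which perlfunc describes as a require followed by a call to the module's import method, both inside BEGIN. The sketch below writes both forms out by hand, using the core modules File::Basename and List::Util purely as examples.

```perl
use strict;
use warnings;

# Same as: use File::Basename ();
# Loaded at compile time, nothing imported.
BEGIN { require File::Basename; }

# Same as: use List::Util qw(sum);
# Loaded at compile time, then sum() is imported into this package.
BEGIN { require List::Util; List::Util->import('sum'); }

my $total = sum(1, 2, 3);                              # imported: callable unqualified
my $name  = File::Basename::basename('/tmp/a.txt');    # not imported: fully qualified
print "$total $name\n";                                # 6 a.txt
```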

---------------------------------------------------------------
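END blocks

An END block runs after the program has finished executing but before the interpreter exits, i.e. during step (3) above.

END blocks also run when the program is terminated by die, but not when it is killed by a signal you have not trapped, or when it replaces itself with another program via exec. Multiple END blocks run LIFO (last in, first out), i.e. from bottom to top. END blocks are typically used for cleanup.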

END {
    print "This is the first END block\n";
}
END {
    print "This is the second END block\n";
}

Output (note that the second END block prints before the first one, for the reason given above):
This is the second END block
This is the first END block
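The claim that END blocks still run after die can be checked by running a small program in a child process and reading its output. This is a sketch that assumes a Unix-like system, since it relies on the list form of a piped open; the child program itself is made up for the demonstration.

```perl
use strict;
use warnings;

# Child program: two END blocks, some output, then a fatal error.
my $child = <<'CHILD';
END { print "first END\n" }
END { print "second END\n" }
print "main code\n";
die "fatal error\n";
print "never reached\n";
CHILD

# Run the child with the current interpreter ($^X) and read its STDOUT.
# Its die message goes to STDERR, which is not captured here.
open(my $fh, '-|', $^X, '-e', $child) or die "Cannot start child: $!";
my @lines = <$fh>;
close($fh);    # returns false: the child exited with a non-zero status
chomp @lines;

print "$_\n" for @lines;
# main code
# second END
# first END
```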

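Inside an END block, $? holds the value the program is about to pass to exit(); assigning to $? inside the END block changes the program's exit value.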
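The sketch below runs four tiny child programs to show how $? behaves inside END. run_child is a hypothetical helper written for this example. The third case is the accident perlmod warns about: system() inside END overwrites $? with the status of the command it ran, while localizing $? keeps the original exit value.

```perl
use strict;
use warnings;

# Hypothetical helper: run a Perl snippet in a child process using the
# current interpreter ($^X) and return the child's exit code.
sub run_child {
    my ($code) = @_;
    system($^X, '-e', $code);
    return $? >> 8;    # the exit code sits in the high byte of $?
}

# 1. Nothing touches $?: the exit value 3 passes through unchanged.
my $unchanged  = run_child(q{ END { print "cleanup\n" } exit 3; });

# 2. END deliberately replaces the pending exit value.
my $overridden = run_child(q{ END { $? = 0 if $? == 3 } exit 3; });

# 3. Accidental change: system() inside END sets $? to the status of
#    the command it ran (0 here), which becomes the exit value.
my $clobbered  = run_child(q{ END { system($^X, '-e', '1') } exit 3; });

# 4. local $? restores the original value when the END block returns.
my $protected  = run_child(q{ END { local $?; system($^X, '-e', '1') } exit 3; });

print "unchanged=$unchanged overridden=$overridden ",
      "clobbered=$clobbered protected=$protected\n";
# unchanged=3 overridden=0 clobbered=0 protected=3
```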

---------------------------------------------------------------
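INIT, CHECK and UNITCHECK blocks

CHECK and INIT blocks take effect after the main program has been compiled and before it starts executing; UNITCHECK blocks run after each compilation unit is compiled, as described below. If compilation fails with a syntax error, the program never reaches its run phase, so its INIT blocks are never triggered.

CHECK blocks run as soon as compilation ends, i.e. immediately after step (1), in LIFO order.

INIT blocks run right after the CHECK blocks, still before step (2), in FIFO order.

UNITCHECK was introduced in Perl 5.9.5 (first available in a stable release with 5.10). It solves the problem that files loaded during execution with require, do or string eval never trigger CHECK and INIT: those files are compiled at run time, after the single transition from the main program's compilation to its execution at which CHECK and INIT run. A UNITCHECK block runs immediately after the file (compilation unit) containing it has been compiled, before that code is executed.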

UNITCHECK, CHECK and INIT blocks are useful to catch the transition between compilation and execution phase of the main program and to perform some checks or initialisation, after compilation and before execution.

INIT {
    print "This is the first INIT block\n";
}
CHECK {
    print "This is the first CHECK block\n";
}
INIT {
    print "This is the second INIT block\n";
}
CHECK {
    print "This is the second CHECK block\n";
}

Output:
This is the second CHECK block
This is the first CHECK block
This is the first INIT block
This is the second INIT block
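The difference between UNITCHECK and CHECK/INIT for code compiled at run time can be seen with a string eval, which is its own compilation unit. In this sketch the eval'd code (made up for the example) defines all three kinds of block; only UNITCHECK runs. With warnings enabled, perl will typically also report "Too late to run CHECK block" and "Too late to run INIT block" on STDERR.

```perl
use strict;
use warnings;

our @log;

# This string is compiled while the main program is already running,
# i.e. after the main program's CHECK and INIT phases are over.
my $ok = eval q{
    UNITCHECK { push @log, "UNITCHECK" }   # runs: the eval string is a compilation unit
    CHECK     { push @log, "CHECK" }       # too late: never runs
    INIT      { push @log, "INIT" }        # too late: never runs
    push @log, "eval body";
    1;
};
die $@ unless $ok;

print join(', ', @log), "\n";    # UNITCHECK, eval body
```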


The description and example of these five blocks from the Perl 5.32 manual page (perlmod)
---------------------------------------------------------------
BEGIN, UNITCHECK, CHECK, INIT and END

Five specially named code blocks are executed at the beginning and at the end of a running Perl program. These are the BEGIN, UNITCHECK, CHECK, INIT, and END blocks.

These code blocks can be prefixed with sub to give the appearance of a subroutine (although this is not considered good style). One should note that these code blocks don't really exist as named subroutines (despite their appearance). The thing that gives this away is the fact that you can have more than one of these code blocks in a program, and they will get all executed at the appropriate moment. So you can't execute any of these code blocks by name.

A BEGIN code block is executed as soon as possible, that is, the moment it is completely defined, even before the rest of the containing file (or string) is parsed. You may have multiple BEGIN blocks within a file (or eval'ed string); they will execute in order of definition. Because a BEGIN code block executes immediately, it can pull in definitions of subroutines and such from other files in time to be visible to the rest of the compile and run time. Once a BEGIN has run, it is immediately undefined and any code it used is returned to Perl's memory pool.

An END code block is executed as late as possible, that is, after perl has finished running the program and just before the interpreter is being exited, even if it is exiting as a result of a die() function. (But not if it's morphing into another program via exec, or being blown out of the water by a signal--you have to trap that yourself (if you can).) You may have multiple END blocks within a file--they will execute in reverse order of definition; that is: last in, first out (LIFO). END blocks are not executed when you run perl with the -c switch, or if compilation fails.

Note that END code blocks are not executed at the end of a string eval(): if any END code blocks are created in a string eval(), they will be executed just as any other END code block of that package in LIFO order just before the interpreter is being exited.

Inside an END code block, $? contains the value that the program is going to pass to exit(). You can modify $? to change the exit value of the program. Beware of changing $? by accident (e.g. by running something via system).

Inside of a END block, the value of ${^GLOBAL_PHASE} will be "END".

UNITCHECK, CHECK and INIT code blocks are useful to catch the transition between the compilation phase and the execution phase of the main program.

UNITCHECK blocks are run just after the unit which defined them has been compiled. The main program file and each module it loads are compilation units, as are string evals, run-time code compiled using the (?{ }) construct in a regex, calls to do FILE, require FILE, and code after the -e switch on the command line.

BEGIN and UNITCHECK blocks are not directly related to the phase of the interpreter. They can be created and executed during any phase.

CHECK code blocks are run just after the initial Perl compile phase ends and before the run time begins, in LIFO order. CHECK code blocks are used in the Perl compiler suite to save the compiled state of the program.

Inside of a CHECK block, the value of ${^GLOBAL_PHASE} will be "CHECK".

INIT blocks are run just before the Perl runtime begins execution, in "first in, first out" (FIFO) order.

Inside of an INIT block, the value of ${^GLOBAL_PHASE} will be "INIT".

The CHECK and INIT blocks in code compiled by require, string do, or string eval will not be executed if they occur after the end of the main compilation phase; that can be a problem in mod_perl and other persistent environments which use those functions to load code at runtime.

When you use the -n and -p switches to Perl, BEGIN and END work just as they do in awk, as a degenerate case. Both BEGIN and CHECK blocks are run when you use the -c switch for a compile-only syntax check, although your main code is not.

The begincheck program makes it all clear, eventually:

#!/usr/bin/perl
# begincheck
print "10. Ordinary code runs at runtime.\n";

END { print "16. So this is the end of the tale.\n" }
INIT { print " 7. INIT blocks run FIFO just before runtime.\n" }
UNITCHECK {
  print " 4. And therefore before any CHECK blocks.\n"
}
CHECK { print " 6. So this is the sixth line.\n" }

print "11. It runs in order, of course.\n";

BEGIN { print " 1. BEGIN blocks run FIFO during compilation.\n" }
END { print "15. Read perlmod for the rest of the story.\n" }
CHECK { print " 5. CHECK blocks run LIFO after all compilation.\n" }
INIT { print  " 8. Run this again, using Perl's -c switch.\n" }

print "12. This is anti-obfuscated code.\n";

END { print "14. END blocks run LIFO at quitting time.\n" }
BEGIN { print " 2. So this line comes out second.\n" }
UNITCHECK {
 print " 3. UNITCHECK blocks run LIFO after each file is compiled.\n"
}
INIT { print  " 9. You'll see the difference right away.\n" }

print "13. It only _looks_ like it should be confusing.\n";

__END__
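The perlmod excerpt mentions that ${^GLOBAL_PHASE} reports "CHECK", "INIT" or "END" inside the corresponding blocks. A short sketch (Perl 5.14 or newer, where the variable was introduced) shows what each kind of block sees; note that BEGIN blocks of the main program report "START" rather than "BEGIN".

```perl
use strict;
use warnings;

our @phases;    # what ${^GLOBAL_PHASE} holds in each kind of block

BEGIN { push @phases, "BEGIN sees " . ${^GLOBAL_PHASE} }
CHECK { push @phases, "CHECK sees " . ${^GLOBAL_PHASE} }
INIT  { push @phases, "INIT sees "  . ${^GLOBAL_PHASE} }
END   { print "END sees " . ${^GLOBAL_PHASE} . "\n" }

push @phases, "main code sees " . ${^GLOBAL_PHASE};
print "$_\n" for @phases;
# BEGIN sees START
# CHECK sees CHECK
# INIT sees INIT
# main code sees RUN
# END sees END
```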

---------------------------------------------------------------