Perl 5.12
2010-04-16 21:32:48 阿炯

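The Perl development team has released Perl 5.12.0. Perl 5.12.0 mainly improves Unicode support, adds several new APIs, fixes the Y2038 problem, introduces the yada yada operator, and more. Perl is a scripting language. It was originally designed by Larry Wall and first released on December 18, 1987. Perl borrows features from C, sed, awk, shell scripting, and many other programming languages. Its most important features are built-in regular expression support and CPAN, a huge repository of third-party code.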

NAME
perl5120delta - what is new for perl v5.12.0

DESCRIPTION
This document describes differences between the 5.10.0 release and the 5.12.0 release.
Many of the bug fixes in 5.12.0 are already included in the 5.10.1 maintenance release.
You can see the list of those changes in the 5.10.1 release notes (perl5101delta).

Core Enhancements
New package NAME VERSION syntax
This new syntax allows a module author to set the $VERSION of a namespace when the namespace is declared with 'package'. It eliminates the need for our $VERSION = ... and similar constructs. E.g.

package Foo::Bar 1.23;
# $Foo::Bar::VERSION == 1.23

There are several advantages to this:
* $VERSION is parsed in exactly the same way as use NAME VERSION
* $VERSION is set at compile time
* $VERSION is a version object that provides proper overloading of comparison operators so comparing $VERSION to decimal (1.23) or dotted-decimal (v1.2.3) version numbers works correctly.
* Eliminates $VERSION = ... and eval $VERSION clutter
* As it requires VERSION to be a numeric literal or v-string literal, it can be statically parsed by toolchain modules without eval the way MM->parse_version does for $VERSION = ...

It does not break old code with only package NAME, but code that uses package NAME VERSION will need to be restricted to perl 5.12.0 or newer. This is analogous to the change to open from two-args to three-args. Users requiring the latest Perl will benefit, and perhaps after several years, it will become standard practice.

However, package NAME VERSION requires a new, 'strict' version number format. See "Version number formats" for details.
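A minimal sketch of the syntax described above. My::Widget is a hypothetical package name chosen for illustration, and the enclosing bare block is only there to keep the package declaration from leaking into the rest of the file.

```perl
use strict;
use warnings;

{
    # My::Widget is a made-up name; the version is set at compile time.
    package My::Widget 1.23;
    sub new { return bless {}, shift }
}

# UNIVERSAL::VERSION with no argument reports the declared version.
my $declared = My::Widget->VERSION;

# With an argument it dies if the package is older than requested.
my $new_enough = eval { My::Widget->VERSION('1.20'); 1 } ? 1 : 0;
my $too_old    = eval { My::Widget->VERSION('2.00'); 1 } ? 0 : 1;

print "My::Widget is version $declared\n";
```

No our $VERSION assignment appears anywhere, yet the usual version checks still work.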

The ... operator
A new operator, ..., nicknamed the Yada Yada operator, has been added. It is intended to mark placeholder code that is not yet implemented. See "Yada Yada Operator" in perlop.
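As a small illustration, the sketch below stubs out a subroutine with the new operator. The name parse_config and its argument are hypothetical. The stub compiles, and calling it raises an exception.

```perl
use strict;
use warnings;

# parse_config is a hypothetical stub that has not been written yet.
sub parse_config { ... }

# The program compiles fine; the stub only fails when it is called.
my $ran   = eval { parse_config('app.conf'); 1 } ? 1 : 0;
my $error = $@;

print "stub raised: $error";
```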

Implicit strictures
Using the use VERSION syntax with a version number greater than or equal to 5.11.0 will lexically enable strictures just like use strict would do (in addition to enabling features). The following:

use 5.12.0;
means:
use strict;
use feature ':5.12';
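The effect can be observed with a string eval, as in this sketch. The variable name $undeclared_total is an arbitrary placeholder. The surrounding file deliberately contains no use strict of its own, so any strictness seen inside the eval comes from use 5.12.0.

```perl
use warnings;

# Compile a snippet that requests 5.12.0 and then uses an undeclared
# variable. Under the implicit strictures this is a compile-time error.
my $compiled = eval q{
    use 5.12.0;
    $undeclared_total = 1;
    1;
};
my $error = $@;

print defined $compiled ? "compiled\n" : "rejected: $error";
```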

Unicode improvements
Perl 5.12 comes with Unicode 5.2, the latest version available to us at the time of release. This version of Unicode was released in October 2009. See http://www.unicode.org/versions/Unicode5.2.0 for further details about what's changed in this version of the standard. See perlunicode for instructions on installing and using other versions of Unicode.

Additionally, Perl's developers have significantly improved Perl's Unicode implementation. For full details, see "Unicode overhaul" below.

Y2038 compliance
Perl's core time-related functions are now Y2038 compliant. (It may not mean much to you, but your kids will love it!)
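A quick way to see this is to format a timestamp past January 2038, as in this sketch. The value 2**32 is simply a convenient number of seconds beyond the range of a signed 32-bit time_t.

```perl
use strict;
use warnings;

# 2**32 seconds after the epoch lies well beyond the January 2038 limit
# of a signed 32-bit time_t.
my @t = gmtime(2**32);

# gmtime returns (sec, min, hour, mday, mon, year, ...).
my $stamp = sprintf '%04d-%02d-%02d %02d:%02d:%02d',
    $t[5] + 1900, $t[4] + 1, @t[3, 2, 1, 0];

print "$stamp UTC\n";
```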

qr overloading
It is now possible to overload the qr// operator, that is, conversion to regexp, just as it was already possible to overload conversion to boolean, string or number of objects. It is invoked when an object appears on the right hand side of the =~ operator or when it is interpolated into a regexp. See overload.
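The following sketch shows one way an object might supply its own regexp. Loose::Word is a hypothetical class invented for this example. The only requirement assumed here is that the 'qr' handler returns a compiled regexp.

```perl
use strict;
use warnings;

{
    package Loose::Word;    # hypothetical class, for illustration only

    # The 'qr' handler must return a compiled regexp.
    use overload 'qr' => sub { $_[0]{re} };

    sub new {
        my ($class, $word) = @_;
        return bless { re => qr/\Q$word\E/i }, $class;
    }
}

my $word = Loose::Word->new('perl');

# Used directly on the right hand side of =~ ...
my $rhs = ('I like Perl' =~ $word) ? 1 : 0;

# ... and interpolated into a larger pattern.
my $interpolated = ('PERL 5.12' =~ /^$word \d/) ? 1 : 0;

print "rhs=$rhs interpolated=$interpolated\n";
```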

Pluggable keywords
Extension modules can now cleanly hook into the Perl parser to define new kinds of keyword-headed expression and compound statement. The syntax following the keyword is defined entirely by the extension. This allows a completely non-Perl sublanguage to be parsed inline, with the correct ops cleanly generated.

See "PL_keyword_plugin" in perlapi for the mechanism. The Perl core source distribution also includes a new module XS::APItest::KeywordRPN, which implements reverse Polish notation arithmetic via pluggable keywords. This module is mainly used for test purposes, and is not normally installed, but also serves as an example of how to use the new mechanism.

Perl's developers consider this feature to be experimental. We may remove it or change it in a backwards-incompatible way in Perl 5.14.

APIs for more internals
The lowest layers of the lexer and parts of the pad system now have C APIs available to XS extensions. These are necessary to support proper use of pluggable keywords, but have other uses too. The new APIs are experimental, and only cover a small proportion of what would be necessary to take full advantage of the core's facilities in these areas. It is intended that the Perl 5.13 development cycle will see the addition of a full range of clean, supported interfaces.

Perl's developers consider this feature to be experimental. We may remove it or change it in a backwards-incompatible way in Perl 5.14.

Overridable function lookup
Where an extension module hooks the creation of rv2cv ops to modify the subroutine lookup process, this now works correctly for bareword subroutine calls. This means that prototypes on subroutines referenced this way will be processed correctly. (Previously bareword subroutine names were initially looked up, for parsing purposes, by an unhookable mechanism, so extensions could only properly influence subroutine names that appeared with an & sigil.)

A proper interface for pluggable Method Resolution Orders
As of Perl 5.12.0 there is a new interface for plugging and using method resolution orders other than the default linear depth first search. The C3 method resolution order added in 5.10.0 has been re-implemented as a plugin, without changing its Perl-space interface. See perlmroapi for more information.

\N experimental regex escape
Perl now supports \N, a new regex escape which you can think of as the inverse of \n. It will match any character that is not a newline, independently from the presence or absence of the single line match modifier /s. It is not usable within a character class. \N{3} means to match 3 non-newlines; \N{5,} means to match at least 5. \N{NAME} still means the character or sequence named NAME, but NAME no longer can be things like 3, or 5,.

This will break a custom charnames translator which allows numbers for character names, as \N{3} will now mean to match 3 non-newline characters, and not the character whose name is 3. (No name defined by the Unicode standard is a number, so only custom translators might be affected.)

Perl's developers are somewhat concerned about possible user confusion with the existing \N{...} construct which matches characters by their Unicode name. Consequently, this feature is experimental. We may remove it or change it in a backwards-incompatible way in Perl 5.14.
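The behaviour described above can be sketched as follows. The sample string is arbitrary. The comparison with . under /s shows how \N differs.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma";

# \N+ grabs runs of non-newline characters.
my @runs = $text =~ /(\N+)/g;

# /s lets . match a newline, but \N still stops at one.
my ($with_N)   = $text =~ /^(\N+)/s;
my ($with_dot) = $text =~ /^(.+)/s;

# \N{3} is a quantified \N: exactly three non-newlines.
my ($three) = $text =~ /(\N{3})/;

print join('|', @runs), "\n";
```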

DTrace support
Perl now has some support for DTrace. See "DTrace support" in INSTALL.

Support for configure_requires in CPAN module metadata
Both CPAN and CPANPLUS now support the configure_requires keyword in the META.yml metadata file included in most recent CPAN distributions. This allows distribution authors to specify configuration prerequisites that must be installed before running Makefile.PL or Build.PL.

See the documentation for ExtUtils::MakeMaker or Module::Build for more on how to specify configure_requires when creating a distribution for CPAN.

each is now more flexible
The each function can now operate on arrays.
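For instance, each can walk an array and hand back index and value pairs, as in this brief sketch with made-up data:

```perl
use strict;
use warnings;

my @langs = qw(C sed awk);
my @pairs;

# On an array, each returns (index, element) until the array is exhausted.
while (my ($index, $name) = each @langs) {
    push @pairs, "$index=$name";
}

print "@pairs\n";
```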

when as a statement modifier
when is now allowed to be used as a statement modifier.

$, flexibility
The variable $, may now be tied.

// in when clauses
// now behaves like || in when clauses.

Enabling warnings from your shell environment
You can now set -W from the PERL5OPT environment variable.

delete local
delete local now allows you to locally delete a hash entry.
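A short sketch of the idea, using a hypothetical %config hash and a helper named has_debug: the entry disappears for the duration of the enclosing block and is restored afterwards.

```perl
use strict;
use warnings;

# %config and has_debug are illustrative names only.
our %config = (debug => 1, level => 3);
sub has_debug { return exists $config{debug} ? 1 : 0 }

my ($inside, $after);
{
    # The entry is removed only until the end of this block.
    delete local $config{debug};
    $inside = has_debug();
}
$after = has_debug();

print "inside=$inside after=$after\n";
```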

New support for Abstract namespace sockets
Abstract namespace sockets are a Linux-specific socket type that lives in the AF_UNIX family, slightly abusing it to be able to use arbitrary character arrays as addresses: they start with a nul byte and are delimited not by a terminating nul byte but by the length passed to the socket() system call.

32-bit limit on substr arguments removed
The 32-bit limit on substr arguments has now been removed. The full range of the system's signed and unsigned integers is now available for the pos and len arguments.

Potentially Incompatible Changes
Deprecations warn by default
Over the years, Perl's developers have deprecated a number of language features for a variety of reasons. Perl now defaults to issuing a warning if a deprecated language feature is used. Many of the deprecations Perl now warns you about have been deprecated for many years. You can find a list of what was deprecated in a given release of Perl in the perl5xxdelta.pod file for that release.

To disable this feature in a given lexical scope, use no warnings 'deprecated'. For information about which language features are deprecated and explanations of various deprecation warnings, please see perldiag.pod. See "Deprecations" below for the list of features and modules Perl's developers have deprecated as part of this release.

Version number formats
Acceptable version number formats have been formalized into "strict" and "lax" rules. package NAME VERSION takes a strict version number. UNIVERSAL::VERSION and the version object constructors take lax version numbers. Providing an invalid version will result in a fatal error. The version argument in use NAME VERSION is first parsed as a numeric literal or v-string and then passed to UNIVERSAL::VERSION (and must then pass the "lax" format test).

These formats are documented fully in the version module. To a first approximation, a "strict" version number is a positive decimal number (integer or decimal-fraction) without exponentiation or else a dotted-decimal v-string with a leading 'v' character and at least three components. A "lax" version number allows v-strings with fewer than three components or without a leading 'v'. Under "lax" rules, both decimal and dotted-decimal versions may have a trailing "alpha" component separated by an underscore character after a fractional or dotted-decimal component.

The version module adds version::is_strict and version::is_lax functions to check a scalar against these rules.
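The two rule sets can be compared with the functions just mentioned. In this sketch the candidate strings are arbitrary examples picked to match the approximate rules given above.

```perl
use strict;
use warnings;
use version;

# Sample strings chosen to exercise the rules described above.
my @candidates = ('1.23', 'v1.2.3', '1.2.3', 'v1.2', '1.23_01');

my %verdict;
for my $candidate (@candidates) {
    $verdict{$candidate}
        = version::is_strict($candidate) ? 'strict'
        : version::is_lax($candidate)    ? 'lax only'
        :                                  'invalid';
}

printf "%-8s %s\n", $_, $verdict{$_} for @candidates;
```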

@INC reorganization
In @INC, ARCHLIB and PRIVLIB now occur after the current version's site_perl and vendor_perl. Modules installed into site_perl and vendor_perl will now be loaded in preference to those installed in ARCHLIB and PRIVLIB.

REGEXPs are now first class
Internally, Perl now treats compiled regular expressions (such as those created with qr//) as first class entities. Perl modules which serialize, deserialize or otherwise have deep interaction with Perl's internal data structures need to be updated for this change. Most affected CPAN modules have already been updated as of this writing.

Switch statement changes
The given/when switch statement handles complex statements better than Perl 5.10.0 did. (These enhancements are also available in 5.10.1 and subsequent 5.10 releases.) There are two new cases where when now interprets its argument as a boolean, instead of an expression to be used in a smart match:

flip-flop operators
The .. and ... flip-flop operators are now evaluated in boolean context, following their usual semantics; see "Range Operators" in perlop.

Note that, as in perl 5.10.0, when (1..10) will not work to test whether a given value is an integer between 1 and 10; you should use when ([1..10]) instead (note the array reference).

However, contrary to 5.10.0, evaluating the flip-flop operators in boolean context ensures it can now be useful in a when(), notably for implementing bistable conditions, like in:

when (/^=begin/ .. /^=end/) {
# do something
}

defined-or operator
A compound expression involving the defined-or operator, as in when (expr1 // expr2), will be treated as boolean if the first expression is boolean. (This just extends the existing rule that applies to the regular or operator, as in when (expr1 || expr2).)

Smart match changes
Since Perl 5.10.0, Perl's developers have made a number of changes to the smart match operator. These, of course, also alter the behaviour of the switch statements where smart matching is implicitly used. These changes were also made for the 5.10.1 release, and will remain in subsequent 5.10 releases.

Changes to type-based dispatch
The smart match operator ~~ is no longer commutative. The behaviour of a smart match now depends primarily on the type of its right hand argument. Moreover, its semantics have been adjusted for greater consistency or usefulness in several cases. While the general backwards compatibility is maintained, several changes must be noted:

* Code references with an empty prototype are no longer treated specially. They are passed an argument like the other code references (even if they choose to ignore it).
* %hash ~~ sub {} and @array ~~ sub {} now test that the subroutine returns a true value for each key of the hash (or element of the array), instead of passing the whole hash or array as a reference to the subroutine.
* Due to the commutativity breakage, code references are no longer treated specially when appearing on the left of the ~~ operator, but like any vulgar scalar.
* undef ~~ %hash is always false (since undef can't be a key in a hash). No implicit conversion to "" is done (as was the case in perl 5.10.0).
* $scalar ~~ @array now always distributes the smart match across the elements of the array. It's true if one element in @array verifies $scalar ~~ $element. This is a generalization of the old behaviour that tested whether the array contained the scalar.

The full dispatch table for the smart match operator is given in "Smart matching in detail" in perlsyn.
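A few of the rules listed above, sketched with made-up data. This assumes a perl where ~~ is available. Later Perl releases may emit warnings for the operator, which do not affect the results here.

```perl
use strict;

my @primes = (2, 3, 5, 7);
my %seen   = (perl => 1);

# $scalar ~~ @array distributes the match across the elements.
my $in_array = (5 ~~ @primes) ? 1 : 0;

# @array ~~ sub {} must hold for every element; 2 is even, so this fails.
my $all_odd = (@primes ~~ sub { $_[0] % 2 }) ? 1 : 0;

# %hash ~~ sub {} is called once per key.
my $keys_ok = (%seen ~~ sub { $_[0] =~ /^\w+$/ }) ? 1 : 0;

# undef can never be a hash key, so this is always false.
my $undef_key = (undef ~~ %seen) ? 1 : 0;

print "in_array=$in_array all_odd=$all_odd keys_ok=$keys_ok undef_key=$undef_key\n";
```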

Smart match and overloading
According to the rule of dispatch based on the rightmost argument type, when an object overloading ~~ appears on the right side of the operator, the overload routine will always be called (with a 3rd argument set to a true value, see overload.) However, when the object appears on the left, the overload routine will be called only when the rightmost argument is a simple scalar. This way, distributivity of smart match across arrays is not broken, as well as the other behaviours with complex types (coderefs, hashes, regexes). Thus, writers of overloading routines for smart match mostly need to worry only about comparing against a scalar, and possibly about stringification overloading; the other common cases will be automatically handled consistently.

~~ will now refuse to work on objects that do not overload it (in order to avoid relying on the object's underlying structure). (However, if the object overloads the stringification or the numification operators, and if overload fallback is active, it will be used instead, as usual.)

Other potentially incompatible changes
* The definitions of a number of Unicode properties have changed to match those of the current Unicode standard. These are listed below under "Unicode overhaul". This change may break code that expects the old definitions.
* The boolkeys op has moved to the group of hash ops. This breaks binary compatibility.
* Filehandles are now always blessed into IO::File. The previous behaviour was to bless Filehandles into FileHandle (an empty proxy class) if it was loaded into memory and otherwise to bless them into IO::Handle.
* The semantics of use feature :5.10* have changed slightly. See "Modules and Pragmata" for more information.
* Perl's developers now use git, rather than Perforce. This should be a purely internal change only relevant to people actively working on the core. However, you may see minor differences in perl as a consequence of the change, for example in some of the details of the output of perl -V. See perlrepository for more information.
* As part of the Test::Harness 2.x to 3.x upgrade, the experimental Test::Harness::Straps module has been removed. See "Modules and Pragmata" for more details.
* As part of the ExtUtils::MakeMaker upgrade, the ExtUtils::MakeMaker::bytes and ExtUtils::MakeMaker::vmsish modules have been removed from this distribution.
* Module::CoreList no longer contains the %:patchlevel hash.
* length undef now returns undef.
* Unsupported private C API functions are now declared "static" to prevent leakage to Perl's public API.
* To support the bootstrapping process, miniperl no longer builds with UTF-8 support in the regexp engine. This allows a build to complete with PERL_UNICODE set and a UTF-8 locale. Without this there's a bootstrapping problem, as miniperl can't load the UTF-8 components of the regexp engine, because they're not yet built.
* miniperl's @INC is now restricted to just -I..., the split of $ENV{PERL5LIB}, and "."
* A space or a newline is now required after a "#line XXX" directive.
* Tied filehandles now have an additional method EOF which provides the EOF type.
* To better match all other flow control statements, foreach may no longer be used as an attribute.
* Perl's command-line switch "-P", which was deprecated in version 5.10.0, has now been removed.
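Of the changes in this list, the new behaviour of length undef is the easiest to check directly. A minimal sketch, with $missing as a placeholder variable:

```perl
use strict;
use warnings;

my $missing;                  # deliberately left undefined
my $len = length $missing;    # undef in, undef out

my $report = defined $len ? "length is $len" : 'length is undef';
print "$report\n";

# An empty string is still a defined value with length 0.
my $empty_len = length '';
```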

Deprecations
From time to time, Perl's developers find it necessary to deprecate features or modules we've previously shipped as part of the core distribution. We are well aware of the pain and frustration that a backwards-incompatible change to Perl can cause for developers building or maintaining software in Perl. You can be sure that when we deprecate a functionality or syntax, it isn't a choice we make lightly. Sometimes, we choose to deprecate functionality or syntax because it was found to be poorly designed or implemented. Sometimes, this is because they're holding back other features or causing performance problems. Sometimes, the reasons are more complex. Wherever possible, we try to keep deprecated functionality available to developers in its previous form for at least one major release. So long as a deprecated feature isn't actively disrupting our ability to maintain and extend Perl, we'll try to leave it in place as long as possible.

The following items are now deprecated:

suidperl
suidperl is no longer part of Perl. It used to provide a mechanism to emulate setuid permission bits on systems that don't support it properly.

Use of := to mean an empty attribute list
An accident of Perl's parser meant that these constructions were all equivalent:
my $pi := 4;
my $pi : = 4;
my $pi :  = 4;

with the : being treated as the start of an attribute list, which ends before the =. As whitespace is not significant here, all are parsed as an empty attribute list, hence all the above are equivalent to, and better written as

my $pi = 4;

because no attribute processing is done for an empty list.

As is, this meant that := cannot be used as a new token, without silently changing the meaning of existing code. Hence that particular form is now deprecated, and will become a syntax error. If it is absolutely necessary to have empty attribute lists (for example, because of a code generator) then avoid the warning by adding a space before the =.

UNIVERSAL->import()
The method UNIVERSAL->import() is now deprecated. Attempting to pass import arguments to a use UNIVERSAL statement will result in a deprecation warning.

Use of "goto" to jump into a construct
Using goto to jump from an outer scope into an inner scope is now deprecated. This rare use case was causing problems in the implementation of scopes.

Custom character names in \N{name} that don't look like names
In \N{name}, name can be just about anything. The standard Unicode names have a very limited domain, but a custom name translator could create names that are, for example, made up entirely of punctuation symbols. It is now deprecated to make names that don't begin with an alphabetic character, and aren't alphanumeric or contain other than a very few other characters, namely spaces, dashes, parentheses and colons. Because of the added meaning of \N (see "\N experimental regex escape"), names that look like curly-brace-enclosed quantifiers won't work. For example, \N{3,4} now means to match 3 to 4 non-newlines; before a custom name 3,4 could have been created.

Deprecated Modules
The following modules will be removed from the core distribution in a future release, and should be installed from CPAN instead. Distributions on CPAN which require these should add them to their prerequisites. The core versions of these modules will now issue a deprecation warning.

If you ship a packaged version of Perl, either alone or as part of a larger system, then you should carefully consider the repercussions of core module deprecations. You may want to consider shipping your default build of Perl with packages for some or all deprecated modules which install into vendor or site perl library directories. This will inhibit the deprecation warnings.

Alternatively, you may want to consider patching lib/deprecate.pm to provide deprecation warnings specific to your packaging system or distribution of Perl, consistent with how your packaging system or distribution manages a staged transition from a release where the installation of a single package provides the given functionality, to a later release where the system administrator needs to know to install multiple packages to get that same functionality.

You can silence these deprecation warnings by installing the modules in question from CPAN. To install the latest version of all of them, just install Task::Deprecations::5_12.

Class::ISA
Pod::Plainer
Shell
Switch

Switch is buggy and should be avoided. You may find Perl's new given/when feature a suitable replacement. See "Switch statements" in perlsyn for more information.

Assignment to $[
Use of the attribute :locked on subroutines
Use of "locked" with the attributes pragma
Use of "unique" with the attributes pragma
Perl_pmflag

Perl_pmflag is no longer part of Perl's public API. Calling it now generates a deprecation warning, and it will be removed in a future release. Although listed as part of the API, it was never documented, and only ever used in toke.c, and prior to 5.10, regcomp.c. In core, it has been replaced by a static function.

Numerous Perl 4-era libraries
termcap.pl, tainted.pl, stat.pl, shellwords.pl, pwd.pl, open3.pl, open2.pl, newgetopt.pl, look.pl, find.pl, finddepth.pl, importenv.pl, hostname.pl, getopts.pl, getopt.pl, getcwd.pl, flush.pl, fastcwd.pl, exceptions.pl, ctime.pl, complete.pl, cacheout.pl, bigrat.pl, bigint.pl, bigfloat.pl, assert.pl, abbrev.pl, dotsh.pl, and timelocal.pl are all now deprecated. Earlier, Perl's developers intended to remove these libraries from Perl's core for the 5.14.0 release.

During final testing before the release of 5.12.0, several developers discovered current production code using these ancient libraries, some inside the Perl core itself. Accordingly, the pumpking granted them a stay of execution. They will begin to warn about their deprecation in the 5.14.0 release and will be removed in the 5.16.0 release.

Unicode overhaul
Perl's developers have made a concerted effort to update Perl to be in sync with the latest Unicode standard. Changes for this include:

Perl can now handle every Unicode character property. New documentation, perluniprops, lists all available non-Unihan character properties. By default, perl does not expose Unihan, deprecated or Unicode-internal properties. See below for more details on these; there is also a section in the pod listing them, and explaining why they are not exposed.

Perl now fully supports the Unicode compound-style of using = and : in writing regular expressions: \p{property=value} and \p{property:value} (both of which mean the same thing).

Perl now fully supports the Unicode loose matching rules for text between the braces in \p{...} constructs. In addition, Perl allows underscores between digits of numbers.

Perl now accepts all the Unicode-defined synonyms for properties and property values.

qr/\X/, which matches a Unicode logical character, has been expanded to work better with various Asian languages. It now is defined as an extended grapheme cluster. (See http://www.unicode.org/reports/tr29/). Anything matched previously and that made sense will continue to be accepted. Additionally:

* \X will not break apart a CR LF sequence.
* \X will now match a sequence which includes the ZWJ and ZWNJ characters.
* \X will now always match at least one character, including an initial mark. Marks generally come after a base character, but it is possible in Unicode to have them in isolation, and \X will now handle that case, for example at the beginning of a line, or after a ZWSP. And this is the part where \X doesn't match the things that it used to that don't make sense. Formerly, for example, you could have the nonsensical case of an accented LF.
* \X will now match a (Korean) Hangul syllable sequence, and the Thai and Lao exception cases.

Otherwise, this change should be transparent for the non-affected languages.
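The first of these points, together with the basic handling of combining marks, can be sketched as follows. The sample string is an arbitrary choice: an "e" followed by COMBINING ACUTE ACCENT, then CR LF, then "z".

```perl
use strict;
use warnings;

# e + COMBINING ACUTE ACCENT, then CR LF, then a plain z.
my $text = "e\x{301}\r\nz";

# Each \X match is one extended grapheme cluster.
my @clusters = $text =~ /(\X)/g;

# Number of code points in each cluster.
my @sizes = map { length } @clusters;

print scalar(@clusters), " clusters with sizes @sizes\n";
```

The base letter stays with its mark, and CR LF is kept together as a single cluster.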

\p{...} matches using the Canonical_Combining_Class property were completely broken in previous releases of Perl. They should now work correctly.

Before Perl 5.12, the Unicode Decomposition_Type=Compat property and a Perl extension had the same name, which led to neither matching all the correct values (with more than 100 mistakes in one, and several thousand in the other). The Perl extension has now been renamed to be Decomposition_Type=Noncanonical (short: dt=noncanon). It has the same meaning as was previously intended, namely the union of all the non-canonical Decomposition types, with Unicode Compat being just one of those.

\p{Decomposition_Type=Canonical} now includes the Hangul syllables.

\p{Uppercase} and \p{Lowercase} now work as the Unicode standard says they should. This means they each match a few more characters than they used to.

\p{Cntrl} now matches the same characters as \p{Control}. This means it no longer will match Private Use (gc=co), Surrogates (gc=cs), nor Format (gc=cf) code points. The Format code points represent the biggest possible problem. All but 36 of them are either officially deprecated or strongly discouraged from being used. Of those 36, likely the most widely used are the soft hyphen (U+00AD), and BOM, ZWSP, ZWNJ, WJ, and similar characters, plus bidirectional controls.

\p{Alpha} now matches the same characters as \p{Alphabetic}. Before 5.12, Perl's definition included a number of things that aren't really alpha (all marks) while omitting many that were. The definitions of \p{Alnum} and \p{Word} depend on Alpha's definition and have changed accordingly.

\p{Word} no longer incorrectly matches non-word characters such as fractions.

\p{Print} no longer matches the line control characters: Tab, LF, CR, FF, VT, and NEL. This brings it in line with standards and the documentation.

\p{XDigit} now matches the same characters as \p{Hex_Digit}. This means that in addition to the characters it currently matches, [A-Fa-f0-9], it will also match the 22 fullwidth equivalents, for example U+FF10: FULLWIDTH DIGIT ZERO.

The Numeric type property has been extended to include the Unihan characters.

There is a new Perl extension, the 'Present_In', or simply 'In', property. This is an extension of the Unicode Age property, but \p{In=5.0} matches any code point whose usage has been determined as of Unicode version 5.0. The \p{Age=5.0} only matches code points added in precisely version 5.0.

A number of properties now have the correct values for unassigned code points. The affected properties are Bidi_Class, East_Asian_Width, Joining_Type, Decomposition_Type, Hangul_Syllable_Type, Numeric_Type, and Line_Break.

The Default_Ignorable_Code_Point, ID_Continue, and ID_Start properties are now up to date with current Unicode definitions.

Earlier versions of Perl erroneously exposed certain properties that are supposed to be Unicode internal-only. Use of these in regular expressions will now generate, if enabled, a deprecation warning message. The properties are: Other_Alphabetic, Other_Default_Ignorable_Code_Point, Other_Grapheme_Extend, Other_ID_Continue, Other_ID_Start, Other_Lowercase, Other_Math, and Other_Uppercase.

It is now possible to change which Unicode properties Perl understands on a per-installation basis. As mentioned above, certain properties are turned off by default. These include all the Unihan properties (which should be accessible via the CPAN module Unicode::Unihan) and any deprecated or Unicode internal-only property that Perl has never exposed.

The generated files in the lib/unicore/To directory are now more clearly marked as being stable, directly usable by applications. New hash entries in them give the format of the normal entries, which allows for easier machine parsing. Perl can generate files in this directory for any property, though most are suppressed. You can find instructions for changing which are written in perluniprops.

Modules and Pragmata
New Modules and Pragmata

autodie
autodie is a new lexically-scoped alternative for the Fatal module. The bundled version is 2.06_01. Note that in this release, using a string eval when autodie is in effect can cause the autodie behaviour to leak into the surrounding scope. See "BUGS" in autodie for more details.
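A minimal sketch of the lexical scoping. The file path is a hypothetical one that is assumed not to exist. Inside the block a failed open throws an exception, while outside it open simply returns false as usual.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on this system.
my $path = '/nonexistent/dir/no-such-file.txt';

my ($opened, $error);
{
    use autodie;    # in effect only until the end of this block

    # No "or die" needed: a failed open throws an exception here.
    $opened = eval { open(my $fh, '<', $path); 1 } ? 1 : 0;
    $error  = $@;
}

# Outside the block, open goes back to returning false on failure.
my $plain = open(my $fh, '<', $path) ? 1 : 0;

my $is_autodie_error
    = (ref $error && $error->isa('autodie::exception')) ? 1 : 0;

print "autodie raised an exception object\n" if $is_autodie_error;
```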
--------------
Perl 5.12.2 released
This release fixes a bug in the handling of "no VERSION", fixes several bugs in core modules, and updates the documentation.
--------------

This article was last updated by 阿炯 on 2012-06-11 09:29:50 and is currently in its 3rd revision.