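Perl XML::Simple Module Usage Examples
The two most common ways to parse XML in Perl are XML::DOM and XML::Simple. XML::DOM is heavyweight, and its result is a DOM tree that is awkward to work with, especially for small, uncomplicated XML files. That is where the lightweight XML::Simple comes in. Parsing XML with it is straightforward: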
use v5.12;
use XML::Simple;
use Data::Dumper;
$Data::Dumper::Indent=1;
my $xml = XMLin('freeoa.xml');
say Dumper($xml);
freeoa.xml
<opt>
<user login="geps" fullname="Gary Epstein" />
<user login="stty" fullname="Simon Tyson" >
<session pid="12345"/>
</user>
<text>This is a FreeOA.</text>
</opt>
$VAR1 = {
'text' => 'This is a FreeOA.',
'user' => [
{
'fullname' => 'Gary Epstein',
'login' => 'geps'
},
{
'session' => {
'pid' => '12345'
},
'fullname' => 'Simon Tyson',
'login' => 'stty'
}
]
};
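This turns the XML into a hash with very little effort, and the hash can then be processed with foreach. The structure of the output shows that:
Element tag names are used as hash keys.
The content of a single element becomes the hash value; the contents of repeated elements are collected into an array reference, which becomes the hash value.
Attributes and child elements both appear as key => value pairs inside the element's content.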
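To make these rules concrete, here is a minimal sketch that reads values out of the structure shown in the Dumper output above. The hash is copied by hand as a literal (an assumption, so that the snippet runs without freeoa.xml), and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Hand-copied literal of the structure XMLin produced above
# (assumption: used so the snippet runs without the XML file).
my $xml = {
    text => 'This is a FreeOA.',
    user => [
        { fullname => 'Gary Epstein', login => 'geps' },
        { fullname => 'Simon Tyson',  login => 'stty', session => { pid => '12345' } },
    ],
};

# A single element: the value is a plain scalar.
my $text = $xml->{text};

# Repeated elements: the value is an array reference of hashes.
my @logins = map { $_->{login} } @{ $xml->{user} };

# A child element sits next to the attributes in the same hash.
my $pid = $xml->{user}[1]{session}{pid};

print "text:   $text\n";
print "logins: @logins\n";
print "pid:    $pid\n";
```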
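One problem is that a single element and repeated elements are represented differently, which makes foreach processing awkward because you have to check whether a value is a scalar or an array reference (compare the values of text and user above). The fix is to add the option ForceArray => 1, which forces single elements into an array reference as well.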
$xml = XMLin('freeoa.xml', ForceArray => 1);
say Dumper($xml);
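$VAR1 = {
'text' => [
'This is a FreeOA.'
],
'user' => [
{
'fullname' => 'Gary Epstein',
'login' => 'geps'
},
{
'session' => [
{
'pid' => '12345'
}
],
'fullname' => 'Simon Tyson',
'login' => 'stty'
}
]
};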
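The effect of ForceArray is easiest to see on a document with only one user element. The following is a small sketch, not taken from the original article: it passes the XML to XMLin as a string (XMLin accepts a string containing XML as well as a file name), and the sample data is made up for illustration.

```perl
use strict;
use warnings;
use XML::Simple;

# Inline XML with a single <user> element (illustrative data).
my $doc = '<opt><user login="geps" fullname="Gary Epstein" /></opt>';

# Default: a lone element collapses to a hash reference.
my $plain = XMLin($doc);

# ForceArray => 1: every nested element is wrapped in an array reference.
my $forced = XMLin($doc, ForceArray => 1);

print "default:    ", ref($plain->{user}),  "\n";
print "ForceArray: ", ref($forced->{user}), "\n";

# With ForceArray the same loop works for one user or many.
for my $user (@{ $forced->{user} }) {
    print "$user->{login}: $user->{fullname}\n";
}
```

The price of this uniformity is an extra array level on every element, which is why the option can also be limited to specific element names.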
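Another catch: if the elements carry an id, name, or key attribute, they are no longer placed in an array reference but folded into a hash reference. Take the XML below (still parsed with ForceArray => 1) and note how the result differs from the one above: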
<opt>
<user id="geps" fullname="Gary Epstein" />
<user id="stty" fullname="Simon Tyson">
<session pid="12345"/>
</user>
<text>This is a FreeOA.</text>
</opt>
$VAR1 = {
'text' => [
'This is a FreeOA.'
],
'user' => {
'geps' => {
'fullname' => 'Gary Epstein'
},
'stty' => {
'session' => [
{
'pid' => '12345'
}
],
'fullname' => 'Simon Tyson'
}
}
};
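The value of user is no longer an array reference but a hash reference, and each id value (for example 'geps') has become a hash key. To disable this behavior, specify the option KeyAttr => '' (an empty list, KeyAttr => [], has the same effect). This option determines which attributes are used as hash keys during parsing; the default is ['name', 'key', 'id'].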
my $xml = XMLin('freeoa.xml',ForceArray => 1,KeyAttr => '');
$VAR1 = {
'text' => [
'This is a FreeOA.'
],
'user' => [
{
'fullname' => 'Gary Epstein',
'id' => 'geps'
},
{
'session' => [
{
'pid' => '12345'
}
],
'fullname' => 'Simon Tyson',
'id' => 'stty'
}
]
};
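The two behaviors can be compared side by side. This is a minimal sketch under the assumption that the XML is passed as an inline string rather than read from freeoa.xml; the session element is left out to keep it short.

```perl
use strict;
use warnings;
use XML::Simple;

my $doc = <<'XML';
<opt>
  <user id="geps" fullname="Gary Epstein" />
  <user id="stty" fullname="Simon Tyson" />
</opt>
XML

# Default KeyAttr: repeated elements with an id attribute are folded
# into a hash keyed on that id.
my $folded = XMLin($doc);
print $folded->{user}{stty}{fullname}, "\n";

# KeyAttr => []: folding is off, so document order is kept in an array
# and id stays an ordinary key of each element hash.
my $flat = XMLin($doc, KeyAttr => []);
print join(', ', map { $_->{id} } @{ $flat->{user} }), "\n";
```

Folding is convenient for direct lookup by id, while the unfolded array preserves document order.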
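The XML::Simple documentation describes every option in detail, and KeyAttr and ForceArray are both marked as important, which shows how often they are needed.
Both options can also be limited to particular elements. The call below always treats user as an array and folds the user elements into a hash keyed on their id attribute: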
my $xml = XMLin('freeoa.xml',ForceArray => ["user"],KeyAttr => {"user" => "id"},);
$VAR1 = {
'text' => 'This is a FreeOA.',
'user' => {
'geps' => {
'fullname' => 'Gary Epstein'
},
'stty' => {
'session' => {
'pid' => '12345'
},
'fullname' => 'Simon Tyson'
}
}
};
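Once user is folded on id, records can be looked up directly or iterated by key. The sketch below works on a hand-copied literal of the output above (an assumption, so that it runs without the XML file); the report format is only illustrative.

```perl
use strict;
use warnings;

# Hand-copied literal of the folded structure shown above (assumption).
my $xml = {
    text => 'This is a FreeOA.',
    user => {
        geps => { fullname => 'Gary Epstein' },
        stty => { fullname => 'Simon Tyson', session => { pid => '12345' } },
    },
};

# Direct lookup by id, no loop required.
my $name = $xml->{user}{stty}{fullname};

# Hash keys are unordered, so sort them for stable output.
my @report;
for my $id (sort keys %{ $xml->{user} }) {
    my $user = $xml->{user}{$id};
    my $pid  = exists $user->{session} ? $user->{session}{pid} : 'none';
    push @report, "$id ($user->{fullname}) session=$pid";
}
print "$_\n" for @report;
```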
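The examples above showed how the same XML can be parsed into differently shaped structures. Next, let's look at writing a Perl data structure out as XML.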
use v5.12;
use XML::Simple;
use Data::Dumper;
$Data::Dumper::Indent=1;
my $ds = {
book => [
{
id => 1,
title => [ "Programming Perl" ],
edition => [ 3 ],
},
{
id => 2,
title => [ "Perl and LWP" ],
edition => [ 1,2 ],
},
{
id => 3,
title => [ "Anonymous Perl" ],
edition => [ 1 ],
},
]
};
say XMLout($ds, RootName => "books" );
<books>
<book id="1">
<edition>3</edition>
<title>Programming Perl</title>
</book>
<book id="2">
<edition>1</edition>
<edition>2</edition>
<title>Perl and LWP</title>
</book>
<book id="3">
<edition>1</edition>
<title>Anonymous Perl</title>
</book>
</books>
If RootName is not specified, the root element is named 'opt' instead of 'books'.
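The id entry in each hash became an attribute because XMLout writes scalar values as attributes by default, while array references such as title and edition become nested elements. KeyAttr does not change that; on output it controls how hashes keyed on id, key, or name are unfolded back into repeated elements. To switch that unfolding off, pass an empty list (to write scalars as elements instead of attributes, use NoAttr => 1):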
XMLout($ds, RootName => "books", KeyAttr => [ ]);
The next example renames the id field to bid, adds an author field to two of the books, and passes KeyAttr => { "book" => "author" }:
my $ds = {
book => [
{
bid => 1,
title => [ "Programming Perl" ],
edition => [ 3 ],
author => 'hto'
},
{
bid => 2,
title => [ "Perl and LWP" ],
edition => [ 1,2 ],
},
{
bid => 3,
title => [ "Anonymous Perl" ],
edition => [ 1 ],
author => 'vivia'
},
]
};
say XMLout($ds, RootName => "books", KeyAttr => { "book" => "author" });
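That tells XMLout to treat author as the key attribute for book elements. Because book is already an array here, nothing needs to be unfolded, and both author and bid are still written as attributes, since XMLout emits scalar values as attributes regardless of KeyAttr: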
<books>
<book author="hto" bid="1">
<edition>3</edition>
<title>Programming Perl</title>
</book>
<book bid="2">
<edition>1</edition>
<edition>2</edition>
<title>Perl and LWP</title>
</book>
<book author="vivia" bid="3">
<edition>1</edition>
<title>Anonymous Perl</title>
</book>
</books>
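In the output above, the scalar fields author and bid appear as attributes. If nested elements are preferred, XML::Simple provides the NoAttr option. The following is a small sketch with a cut-down data structure (illustrative, not from the original article) that compares the two forms.

```perl
use strict;
use warnings;
use XML::Simple;

# Cut-down version of the book data used above (illustrative).
my $ds = {
    book => [
        { bid => 1, author => 'hto', title => ['Programming Perl'] },
    ],
};

# Default: scalar values (bid, author) are emitted as attributes.
my $with_attrs = XMLout($ds, RootName => 'books');

# NoAttr => 1: scalar values are emitted as nested elements instead.
my $no_attrs = XMLout($ds, RootName => 'books', NoAttr => 1);

print $with_attrs, "\n", $no_attrs;
```

Which form to choose depends on what the consumer of the XML expects; reading the result back with XMLin yields similar hashes either way, because XMLin treats attributes and child elements alike.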