Hi
I'm doing some XML again after a long time and trying out XML::Twig.
The example code below runs on the XML of a node by haukex.
I was looking for a more generic way than writing a handler for each tag and found the ->simplify method, which looks good enough for that task. (Yeah, I know XML::Simple is evil, but so seems the monastery's output too ;-p )
use strict;
use warnings;
use Data::Dump qw/pp dd/;
use XML::Twig;

my $data = join "", <DATA>;    # slurp the XML below __DATA__
$\ = "\n";                     # append a newline to every print

print "=== HANDLER:\n";
my $twig = XML::Twig->new(
    twig_handlers => {
        # the post body
        'field[@name="doctext"]' => sub {
            print $_->gi, "Post: ", $_->child_text(0);
        },
        # author id (attribute) and name (text content)
        'author' => sub {
            print "ID: ",   $_->att("id");
            print "Name: ", $_->child_trimmed_text(0);
        },
    },
);
$twig->parse($data);

print "=== SIMPLIFIED:\n";
$twig = XML::Twig->new();
print pp $twig->parse($data)->simplify();
__DATA__
<?xml version="1.0" encoding="Windows-1252"?>
<node id="11100665" title="Re^5: What does $_ = qq~&quot;$_&quot;~ do?" created="2019-05-28 16:28:57" updated="2019-05-28 16:28:57">
<type id="11">
note</type>
<author id="830549">
haukex</author>
<data>
<field name="doctext">
&lt;p&gt;More fun facts! I once wrote a script to search a word list for words that make valid regexen which convert one valid word into another.&lt;/p&gt;
&lt;c&gt;
$ perl -le 'print bangs =~s engender'
bands
$ perl -le 'print halved =~s avatar'
halted
$ perl -le 'print stove =~s evener'
stone
&lt;/c&gt;
</field>
<field name="root_node">
11100593</field>
<field name="parent_node">
11100640</field>
<field name="reputation">
21</field>
</data>
</node>
What I don't like are the leading newlines in many content fields, as in content => "\nhaukex". This is the output:
=== HANDLER:
ID: 830549
Name: haukex
fieldPost:
<p>More fun facts! I once wrote a script to search a word list for words that make valid regexen which convert one valid word into another.</p>
<c>
$ perl -le 'print bangs =~s engender'
bands
$ perl -le 'print halved =~s avatar'
halted
$ perl -le 'print stove =~s evener'
stone
</c>
=== SIMPLIFIED:
{
  author => { 830549 => { content => "\nhaukex" } },
  created => "2019-05-28 16:28:57",
  data => {
    field => {
      doctext => {
        content => "\n<p>More fun facts! I once wrote a script to search a word list for words that make valid regexen which convert one valid word into another.</p>\n<c>\n\$ perl -le 'print bangs =~s engender'\nbands\n\$ perl -le 'print halved =~s avatar'\nhalted\n\$ perl -le 'print stove =~s evener'\nstone\n</c>\n",
      },
      parent_node => { content => "\n11100640" },
      reputation => { content => "\n21" },
      root_node => { content => "\n11100593" },
    },
  },
  title => "Re^5: What does \$_ = qq~\"\$_\"~ do?",
  type => { 11 => { content => "\nnote" } },
  updated => "2019-05-28 16:28:57",
}
I couldn't find an option for ->simplify(%options) to trim the content.
I had to use child_trimmed_text(0) when writing handlers....
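One workaround that doesn't depend on any ->simplify option is to trim the structure after the fact. The following is only a sketch: trim_deep is a made-up helper (not part of XML::Twig), the sample hash just mimics part of the dump above, and it assumes that stripping leading and trailing whitespace from every string leaf is acceptable.

```perl
use strict;
use warnings;

# Hypothetical helper, not an XML::Twig method: recursively strip
# leading/trailing whitespace from every plain string in a nested structure.
sub trim_deep {
    my ($node) = @_;
    if (ref $node eq 'HASH') {
        # values() returns aliases, so assigning to $_ updates the hash
        $_ = trim_deep($_) for values %$node;
    }
    elsif (ref $node eq 'ARRAY') {
        $_ = trim_deep($_) for @$node;
    }
    elsif (!ref $node && defined $node) {
        $node =~ s/^\s+|\s+$//g;
    }
    return $node;
}

# Stand-in for the result of $twig->parse($data)->simplify()
my $simplified = {
    author  => { 830549 => { content => "\nhaukex" } },
    data    => { field => { reputation => { content => "\n21" },
                            doctext    => { content => "\n<c>\nbands\n</c>\n" } } },
    updated => "2019-05-28 16:28:57",
};

trim_deep($simplified);
print $simplified->{author}{830549}{content}, "\n";
print $simplified->{data}{field}{reputation}{content}, "\n";
```

Inner newlines (as in the doctext field) are left alone; only the ends of each string are touched.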
Questions:
- Do I have to write a handler to rewrite the contents?
- Do the newlines serve any purpose, or are they a limitation of XML::Fling (no link, couldn't find it on CPAN)?
- ->child_trimmed_text(0) only worked with condition 0! Why?
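
On that last point, a possible explanation, going by the XML::Twig docs: the first argument of child_text / child_trimmed_text is documented as an offset into the list of children (the same as for ->child), with an optional condition as the second argument. So 0 would simply mean "first child". The sketch below assumes XML::Twig is installed; the inline XML is cut down from the node above.

```perl
use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new;
$twig->parse(qq{<node><author id="830549">\nhaukex</author></node>});
my $author = $twig->root->first_child('author');

# <author> has exactly one child (its text node), so only offset 0 exists
my $first  = $author->child_trimmed_text(0);
my $second = $author->child_trimmed_text(1);    # there is no second child

# trimmed_text on the element itself needs no offset at all
my $whole = $author->trimmed_text;

print "offset 0: $first\n";
print "offset 1: ", (defined $second && length $second ? $second : '(nothing)'), "\n";
print "element:  $whole\n";
```

Note that, per the docs, trimmed_text also collapses runs of whitespace inside the text, which may not be wanted for something like the doctext field.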