corfuitl has asked for the wisdom of the Perl Monks concerning the following question:
Hi PerlMonks,
I have a TMX file which looks like this one:
<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><header creationtool="xx" creationtoolversion="1" segtype="sentence" o-tmf="undefined" adminlang="en" srclang="en" datatype="undefined"></header><body>
<tu changedate="20180321T113135Z" creationdate="20180321T113135Z" changeid="user" tuid="1">
<prop type="client"> </prop>
<prop type="project"> </prop>
<prop type="domain"> </prop>
<prop type="subject"> </prop>
<prop type="corrected">no</prop>
<prop type="aligned">no</prop>
<tuv xml:lang="en"><seg>Hello <b>world!</b></seg></tuv>
<tuv xml:lang="fr"><seg>Bonjour <b> monde</b></seg></tuv>
</tu>
<tu changedate="20180321T113135Z" creationdate="20180321T113135Z" changeid="user2" tuid="2">
<prop type="client"> </prop>
<prop type="project">yes</prop>
<prop type="corrected">no</prop>
<prop type="aligned">no</prop>
<tuv xml:lang="en"><seg>Hello <b>world!</b></seg></tuv>
<tuv xml:lang="fr"><seg>Bonjour <b> monde</b></seg></tuv>
</tu>
</body>
</tmx>
and I would like to export all the information for each <tu> on one line (tab-separated).
The following code exports the en and fr segments, but I cannot get it to export the other attributes and props as well:
use XML::LibXML;
my $dom = 'XML::LibXML'->load_xml(IO => *STDIN);
for my $child ( @{ $dom->find('/tmx/body/tu/tuv[@xml:lang=\'en\']/seg | /tmx/body/tu/tuv[@xml:lang=\'fr\']/seg | tmx/body/tu/prop | /tmx/body/tu/@creationdate') } ) {
    ( my $contents = join '', $child->childNodes ) =~ s,\n, <lb/> ,g;
    print $contents, $child->nodeName eq 'source' ? "\t" : "\n";
}
The ideal scenario would be to export whatever props each node has and align them, so that the same prop always ends up in the same column.
Could you please help me improve the code and sort it out?
Thanks
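One way to get aligned columns, sketched here under assumptions rather than as a definitive answer: read the file in two passes. The first pass collects every prop type that occurs anywhere, the second prints one row per <tu> and leaves a cell empty when that <tu> lacks the prop. The helper names flatten and tmx_rows, the column order, the header row, and taking the file name from the command line are all choices made for this illustration and are not part of the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Flatten a node's content to one line: keep inline markup such as <b>,
# and turn tabs and newlines into spaces so each <tu> stays on one TSV row.
sub flatten {
    my ($node) = @_;
    return '' unless defined $node;
    my $text = join '', map { $_->toString } $node->childNodes;
    $text =~ s/[\t\r\n]+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Return a list of rows (array refs); the first row is the header.
# @langs is the list of xml:lang values to export, e.g. qw(en fr).
sub tmx_rows {
    my ($dom, @langs) = @_;
    my @attrs = qw(tuid creationdate changedate changeid);
    my @tus   = $dom->findnodes('/tmx/body/tu');

    # Pass 1: collect every prop type, in order of first appearance.
    my (@types, %seen);
    for my $tu (@tus) {
        for my $prop ($tu->findnodes('prop')) {
            my $type = $prop->getAttribute('type') // '';
            push @types, $type unless $seen{$type}++;
        }
    }

    # Pass 2: one row per <tu>; a missing prop or segment is an empty cell.
    my @rows = ([ @attrs, @types, @langs ]);
    for my $tu (@tus) {
        my %prop;
        $prop{ $_->getAttribute('type') // '' } = flatten($_)
            for $tu->findnodes('prop');
        my @segs = map {
            my ($seg) = $tu->findnodes(qq{tuv[\@xml:lang="$_"]/seg});
            flatten($seg);
        } @langs;
        push @rows, [
            (map { $tu->getAttribute($_) // '' } @attrs),
            (map { $prop{$_} // '' } @types),
            @segs,
        ];
    }
    return @rows;
}

# Assumed usage: perl tmx2tsv.pl input.tmx > output.tsv
if (@ARGV) {
    my $dom = XML::LibXML->load_xml(location => $ARGV[0]);
    binmode STDOUT, ':encoding(UTF-8)';
    print join("\t", @$_), "\n" for tmx_rows($dom, qw(en fr));
}
```

Because the prop columns are fixed before any row is printed, the second <tu> in the sample (which has no "domain" or "subject" prop) would simply get empty cells in those columns instead of shifting the later values to the left.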
Replies are listed 'Best First'.
Re: Strip XML document
by choroba (Cardinal) on Jul 10, 2018 at 16:32 UTC
by corfuitl (Sexton) on Jul 11, 2018 at 13:33 UTC
by corfuitl (Sexton) on Dec 18, 2018 at 12:19 UTC
by poj (Abbot) on Dec 18, 2018 at 16:42 UTC
by corfuitl (Sexton) on Dec 19, 2018 at 13:20 UTC
by poj (Abbot) on Dec 19, 2018 at 13:40 UTC
by corfuitl (Sexton) on Dec 19, 2018 at 18:18 UTC
by corfuitl (Sexton) on Dec 19, 2018 at 16:12 UTC
by poj (Abbot) on Dec 19, 2018 at 17:22 UTC