Dear Monks,
please help me with the following issue.
Given is the following XML file (in fact an excerpt from a huge file, but it shows the problem, IMHO):
<excerpt>
<unit>
<unitnumber>1</unitnumber>
<Name>Entity A</Name>
<Boss>Name</Boss>
<contactinfo>
<Address>
<Street>SomeStreet</Street>
<Building>1</Building>
<zip>00000</zip>
<Ort>Town</Ort>
</Address>
<Telefon>
<code>0123</code>
<telnumber>456</telnumber>
<directcall>78910</directcall>
</Telefon>
<Fax>
<code>0123</code>
<telnumber>456</telnumber>
<directcall>10987</directcall>
</Fax>
<email></email>
<URL></URL>
</contactinfo>
<products>
<article>
<art_code>A3236</art_code>
<quantity>554</quantity>
</article>
<article>
<art_code>B9735</art_code>
<quantity>386</quantity>
</article>
<article>
<art_code>C1299</art_code>
<quantity>322</quantity>
</article>
<article>
<art_code>D1918</art_code>
<quantity_small/>
</article>
<article>
<art_code>E0702</art_code>
<quantity_small/>
</article>
<article>
<art_code>F1290</art_code>
<quantity_small/>
</article>
</products>
</unit>
<unit>
<unitnumber>2</unitnumber>
<Name>Entity B</Name>
<Boss>Name</Boss>
<contactinfo>
<Address>
<Street>SomeOtherStreet</Street>
<Building>2</Building>
<zip>11111</zip>
<Ort>City</Ort>
</Address>
<Telefon>
<code>0999</code>
<telnumber>456</telnumber>
<directcall>78910</directcall>
</Telefon>
<Fax>
<code>0999</code>
<telnumber>456</telnumber>
<directcall>10987</directcall>
</Fax>
<email></email>
<URL></URL>
</contactinfo>
<products>
<article>
<art_code>A1136</art_code>
<quantity>1982</quantity>
</article>
<article>
<art_code>B0765</art_code>
<quantity>988</quantity>
</article>
<article>
<art_code>C8099</art_code>
<quantity>522</quantity>
</article>
<article>
<art_code>D3938</art_code>
<quantity_small/>
</article>
<article>
<art_code>E5722</art_code>
<quantity_small/>
</article>
<article>
<art_code>F3596</art_code>
<quantity_small/>
</article>
</products>
</unit>
</excerpt>
I need (among other things) to get the list of paired values "art_code" and "quantity" per unit - e.g. in the following form:
Entity A;A3236;554
Entity A;B9735;386
Entity A;C1299;322
...
Entity B;A1136;1982
etc.
Since I am a Perl novice, I have only managed the following kludge so far; its output could then be post-processed with a regex (blush).
use strict;
use warnings;
use XML::LibXML;
my $filename = "Test.xml";
my $my_object = XML::LibXML->new();
my $treeobjekt = $my_object->parse_file($filename);
my $root = $treeobjekt->getDocumentElement;
my @units=$treeobjekt->findnodes("//excerpt/unit");
for(my $i=0;$i<@units;$i++) {
    my $unitname=$units[$i]->findvalue('./Name/text()');
    my $art = $units[$i]->findvalue('./products/article');
    my $art_chain = join('---', split(/\n/, $art));
    print "$unitname;$art_chain\n";
}
I have an additional problem here. As you can see, some articles have an exact number in <quantity>, while others have only an empty <quantity_small/> tag.
I would like to get only the articles that have exact numbers. I tried to modify the above script in the following way:
my $art_chain;
if($units[$i]->findvalue('./products/article/quantity')>0) {
    $art_chain = join('---', split(/\n/, $art));
}
but it seems to have no effect on the output.
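A possible explanation, offered as a sketch rather than a confirmed diagnosis: when the XPath passed to findvalue() matches several nodes, the result is a single string made of all the matching text joined together. The test is therefore applied once per unit, not once per article, and it is true for any unit with at least one positive <quantity>. The inline XML below is a hypothetical, cut-down unit, not the real Test.xml.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical three-article unit, cut down from the excerpt above.
my $xml = <<'XML';
<unit>
<products>
<article><art_code>A3236</art_code><quantity>554</quantity></article>
<article><art_code>D1918</art_code><quantity_small/></article>
<article><art_code>B9735</art_code><quantity>386</quantity></article>
</products>
</unit>
XML

my $unit = XML::LibXML->load_xml(string => $xml)->documentElement;

# One call on the whole unit: the text of every matching <quantity>
# is concatenated into a single string.
my $all = $unit->findvalue('./products/article/quantity');
print "unit-level value: '$all'\n";

# One call per article: each article yields its own value,
# or an empty string when it has no <quantity> child.
for my $article ($unit->findnodes('./products/article')) {
    my $code = $article->findvalue('./art_code');
    my $qty  = $article->findvalue('./quantity');
    print "$code => '$qty'\n";
}
```

In this sample the unit-level value is '554386', which is greater than 0, so a unit-level test like the one above would pass regardless of the <quantity_small/> article.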
How could I get the paired values in a better way and filter out the articles without an exact quantity?
Thank you in advance for your help!
VE
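One way the question could be approached, as a minimal sketch and not the only option: loop over the <article> nodes themselves and let an XPath predicate do the filtering. The helper name collect_pairs and the inline sample document are assumptions made for illustration; the element names come from the excerpt above.

```perl
use strict;
use warnings;
use XML::LibXML;

# collect_pairs() is a hypothetical helper: it returns one
# "Name;art_code;quantity" line per article that has a <quantity> child.
sub collect_pairs {
    my ($doc) = @_;
    my @lines;
    for my $unit ($doc->findnodes('/excerpt/unit')) {
        my $name = $unit->findvalue('./Name');
        # The predicate [quantity] selects only articles that contain a
        # <quantity> element, so <quantity_small/> articles are skipped.
        for my $article ($unit->findnodes('./products/article[quantity]')) {
            my $code = $article->findvalue('./art_code');
            my $qty  = $article->findvalue('./quantity');
            push @lines, join(';', $name, $code, $qty);
        }
    }
    return @lines;
}

# Small inline sample so the sketch runs without Test.xml.
my $sample = <<'XML';
<excerpt>
<unit>
<Name>Entity A</Name>
<products>
<article><art_code>A3236</art_code><quantity>554</quantity></article>
<article><art_code>D1918</art_code><quantity_small/></article>
</products>
</unit>
<unit>
<Name>Entity B</Name>
<products>
<article><art_code>A1136</art_code><quantity>1982</quantity></article>
</products>
</unit>
</excerpt>
XML

my $doc = XML::LibXML->load_xml(string => $sample);
print "$_\n" for collect_pairs($doc);
```

For the real file, the document could be loaded with XML::LibXML->load_xml(location => 'Test.xml') instead of the inline string. Because each article is visited on its own, art_code and quantity stay paired and no regex post-processing of the output is needed.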