Re: Parsing a highly nested XML file correctly and efficiently

by Discipulus (Canon)
on Jun 08, 2016 at 07:10 UTC


in reply to Parsing a highly nested XML file correctly and efficiently

I am currently using XML twig to parse this. I will be more than happy to try other options.

There are a few other options, but XML::Simple is not one of them. Avoid it. I suspect that your unhappiness is due not to XML::Twig but to the shaggy beast that XML is, per se.

Anyway, I do not understand your expected output format:

d1|d2|Nest1->Nest2->d5->X|Nest1->Nest2->d5->Y|Nest1->Nest2->d6->X|Nest1->Nest2->d6->Y|Nest1->Nest2->Nest3->Nest4->d7->d9->d10->text.

What does the above mean?

Modifying your program a little (and using strict too):

use strict; #use this!
use warnings;
use XML::Twig;
# use XML::Simple; NOOOO!
# $localfile= "Test_1.xml"; used DATA instead
my $field = "Nest1";
open my $fout1, '>', "testx.csv" or die "Could not open file!";
my $twig = XML::Twig->new(
    twig_roots    => { $field => 1, 'd1' => 1, 'd2' => 1, },
    twig_handlers => { 'DatatoParse' => \&node, 'DatatoParse//*' => \&node1 }
);
$/=''; ## added this
$twig->parse(<DATA>); #modified this

sub node {
    my($twig, $el) = @_;
    $twig->purge;
}

sub node1{
    print $fout1 "\n", if ($_->tag eq "d1");
    print $fout1 $_->text, ",", unless ($_->has_children('#ELT'));
    print $fout1 "\n", if ($_->tag eq "elt");
}
__DATA__
<DatatoParse> <elt> ....
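A side note on the `$/=''` line: an empty string puts Perl into paragraph mode, so `<DATA>` in list context returns one chunk per blank-line-separated paragraph, while `parse` (as far as I know) treats only its first argument as the document. That is fine as long as the XML contains no blank lines. A minimal core-Perl sketch of the difference follows; the sample string and the `read_chunks` helper are invented for illustration and are not part of the program above.

```perl
use strict;
use warnings;

# Hypothetical sample: two "paragraphs" separated by one blank line.
my $sample = "<a>\n<b/>\n\n<c/>\n</a>\n";

# Read everything from an in-memory filehandle with a given $/ setting.
sub read_chunks {
    my ($text, $separator) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @chunks = <$fh>;
    close $fh;
    return @chunks;
}

my @paragraphs = read_chunks($sample, '');       # paragraph mode
my @slurped    = read_chunks($sample, undef);    # slurp mode

printf "paragraph mode: %d chunk(s)\n", scalar @paragraphs;
printf "slurp mode:     %d chunk(s)\n", scalar @slurped;
```

If the real file can contain blank lines, slurp mode (`local $/;`) looks like the safer way to hand the whole document to the parser as one string.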

I get output that makes some sense, with no garbage at all:

#cat testx.csv
TV show 1,Heroes,FULL,Page 65,-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,
TV show 2,Prison Break,FULL,Page 65,-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,
TV show 4,Alias,FULL,Page 65,-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,
-2,-3,5,8,yipppeee,yipppeee,

What is wrong with this? What output do you want?

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^2: Parsing a highly nested XML file correctly and efficiently
by Ppeoc (Beadle) on Jun 09, 2016 at 16:14 UTC
    Thanks for your help. I hope I can make this a little clearer.
    1) I want my output to look like this:
    TV show 1,Heroes,FULL,Page 65,-2,-3,5,8,yipppeee (d1|d2|Nest1->Nest2->d5->X|Nest1->Nest2->d5->Y|Nest1->Nest2->d6->X|Nest1->Nest2->d6->Y|Nest1->Nest2->Nest3->Nest4->d7->d9->d10->text)
    TV show 1,Heroes,FULL,Page 65,-2,-3,5,8,yipppeee (d1|d2|Nest1->Nest2->d5->X|Nest1->Nest2->d5->Y|Nest1->Nest2->d6->X|Nest1->Nest2->d6->Y|Nest1->Nest2->Nest3->Nest4->d7->d9->d10->text)
    TV show 1,Heroes,FULL,Page 65,-2,-3,5,8,yipppeee(d1|d2|Nest1->Nest2->d5->X|Nest1->Nest2->d5->Y|Nest1->Nest2->d6->X|Nest1->Nest2->d6->Y|Nest1->Nest2->Nest3->Nest4->d7->d9->d10->text)
    TV show 1,Heroes,FULL,Page 65,-2,-3,5,8,yipppeee(d1|d2|Nest1->Nest2->d5->X|Nest1->Nest2->d5->Y|Nest1->Nest2->d6->X|Nest1->Nest2->d6->Y|Nest1->Nest2->Nest3->Nest4->d7->d9->d10->text)
    TV show 2,Prison Break,FULL,Page 65,-2,-3,5,8,yipppeee
    TV show 2,Prison Break,FULL,Page 65,-2,-3,5,8,yipppeee
    TV show 2,Prison Break,FULL,Page 65,-2,-3,5,8,yipppeee
    TV show 2,Prison Break,FULL,Page 65,-2,-3,5,8,yipppeee
    2) I do not want the data from parents labelled junk or notrequired. My current code displays those as well.
      Hmm, for me this does not make it any clearer.
      TV show 1,Heroes,FULL,Page 65,-2,-3,5,8,yipppeee
      # does not match (at least for me..) with your description (if I understand it)
      (d1|d2|Nest1->Nest2->d5->X|Nest1->Nest2->d5->Y|Nest1->Nest2->d6->X|Nest1->Nest2->d6->Y|Nest1->Nest2->Nest3->Nest4->d7->d9->d10->text)
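For what it is worth, if the pipe-separated description is read literally as seven paths below each `elt`, one row per `Nest2` could be produced roughly as follows. This is only a sketch under loud assumptions: the sample XML is invented (the real file is not shown here), `X` and `Y` are guessed to be attributes of `d5` and `d6`, and `elt_to_rows` and `att_or_empty` are hypothetical helper names. Only direct `Nest1` children of `elt` are walked, so anything under a `junk` parent never reaches the output.

```perl
use strict;
use warnings;
use XML::Twig;

# Invented sample; the layout and the X/Y attributes are assumptions.
my $xml = <<'XML';
<DatatoParse>
  <elt>
    <d1>TV show 1</d1>
    <d2>Heroes</d2>
    <junk><d5 X="0" Y="0"/></junk>
    <Nest1>
      <Nest2>
        <d5 X="-2" Y="-3"/>
        <d6 X="5" Y="8"/>
        <Nest3><Nest4><d7><d9><d10>yipppeee</d10></d9></d7></Nest4></Nest3>
      </Nest2>
      <Nest2>
        <d5 X="-2" Y="-3"/>
        <d6 X="5" Y="8"/>
        <Nest3><Nest4><d7><d9><d10>yipppeee</d10></d9></d7></Nest4></Nest3>
      </Nest2>
    </Nest1>
  </elt>
</DatatoParse>
XML

my @rows;

# Return an attribute value, or '' when the element or attribute is missing.
sub att_or_empty {
    my ($elt, $name) = @_;
    return '' unless $elt;
    my $value = $elt->att($name);
    return defined $value ? $value : '';
}

# Handler for each <elt>: build one row per Nest1/Nest2 below it.
sub elt_to_rows {
    my ($twig, $elt) = @_;
    my @head = map { $elt->first_child_text($_) } qw(d1 d2);
    for my $nest2 (map { $_->children('Nest2') } $elt->children('Nest1')) {
        my $d5  = $nest2->first_child('d5');
        my $d6  = $nest2->first_child('d6');
        my $d10 = $nest2->first_descendant('d10');
        push @rows, join(',',
            @head,
            att_or_empty($d5, 'X'), att_or_empty($d5, 'Y'),
            att_or_empty($d6, 'X'), att_or_empty($d6, 'Y'),
            $d10 ? $d10->text : '',
        );
    }
    $twig->purge;    # free memory once this <elt> is done
}

XML::Twig->new(twig_handlers => { elt => \&elt_to_rows })->parse($xml);
print "$_\n" for @rows;
```

The plain `join` is naive CSV; if a field can contain a comma or a quote, a real CSV writer such as Text::CSV would be the better tool.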

      L*
