PerlMonks  

Re^2: Easy XML-parser that can handle large file?

by Discipulus (Abbot)
on Sep 11, 2014 at 08:20 UTC ( #1100265=note )


in reply to Re: Easy XML-parser that can handle large file?
in thread Easy XML-parser that can handle large file?

.. i'm so slow in responding...
This is my best with your data (surely can be improved): UPDATE: the code was broken, updated...
    use XML::Twig;

    # $xml holds the XML document posted earlier in the thread
    my $t = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => {
            'product' => sub {
                # $_[1] is the <product> element that has just been parsed
                my @pname = $_[1]->get_xpath('name');
                my @pids  = $_[1]->get_xpath('product_id');
                print $pids[0]->text, " - ", $pname[0]->text, "\n";

                my %h;
                # group id => group name
                my @ids   = $_[1]->get_xpath('attributes/attribute/group/id');
                my @names = $_[1]->get_xpath('attributes/attribute/group/name');
                @h{ map { $_->text } @ids } = map { $_->text } @names;

                # value id => value
                my @vids   = $_[1]->get_xpath('attributes/attribute/value/id');
                my @values = $_[1]->get_xpath('attributes/attribute/value/value');
                @h{ map { $_->text } @vids } = map { $_->text } @values;

                print map { "\t$_ - $h{$_}\n" } keys %h;
                print "\n\n";
            }
        }
    );
    $t->parse($xml);

    ####OUTPUT
    ABC123 - My product - 12.1998
        1561 - Lġngd (i mm)
        1507 - Engines
        1498 - Year model
        12033 - Vehicle equipment
        12019 - Maybe
        1518 - Year model (to)
        301 - Generator


    XYZ789 - My product - 12.1992
        1507 - Engines
        1498 - Year model
        1518 - Year model (to)
        301 - Generator

HtH
L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Re^3: Easy XML-parser that can handle large file?
by mirod (Canon) on Sep 12, 2014 at 05:37 UTC

    Nice.

    In case you, or the reader, don't know: in handlers $_ is aliased to $_[1], so you can write $_->get_xpath(...) instead of $_[1]->get_xpath(...). Beyond saving 3 characters each time, I am used to $_ meaning "the current element" within a handler, and I find it easier to read.
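
    A minimal sketch of that aliasing, for readers who want to try it. The sample XML and the @seen / $same variables are made up for illustration and are not from the thread; refaddr comes from the core Scalar::Util module. The handler checks that $_ and $_[1] are the very same element and then reads the fields through $_. The purge call is an addition here, not part of the code above: it drops what has already been parsed, which may help with the large files this thread is about.

```perl
use strict;
use warnings;
use XML::Twig;
use Scalar::Util qw(refaddr);

# Hypothetical sample data, loosely shaped like the <product> records in this thread.
my $xml = <<'XML';
<products>
  <product><product_id>ABC123</product_id><name>First product</name></product>
  <product><product_id>XYZ789</product_id><name>Second product</name></product>
</products>
XML

my @seen;        # "id - name" strings collected by the handler
my $same = 1;    # stays true as long as $_ and $_[1] are the same element

my $twig = XML::Twig->new(
    twig_handlers => {
        product => sub {
            my ( $t, $elt ) = @_;    # handlers receive the twig and the element

            # $_ is the same object as $_[1]: compare the reference addresses
            $same = 0 unless refaddr($_) == refaddr($elt);

            # read the child elements through $_ instead of $_[1]
            push @seen, join ' - ',
                $_->first_child_text('product_id'),
                $_->first_child_text('name');

            $t->purge;    # discard what has been parsed so far
        },
    },
);
$twig->parse($xml);

print "$_\n" for @seen;
print "same element: ", ( $same ? "yes" : "no" ), "\n";
```

    Copying the text out before calling purge matters, since the element is gone afterwards.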

      a 'nice' from the module author... i'm honored... ;=)

      I never noticed that $_ is set to $_[1] (or rather, I used it unconsciously..)

      Maybe it is worth adding a line to the Synopsis:

          para => sub { $_[1]->set_tag( 'p') }, # change para to p (handlers receive $twig and $element as arguments)
          para => sub { $_->set_tag( 'p') },    # change para to p ($_ is aliased to $_[1] for convenience)

      and in the body of the docs:

          $_ is also set to the element (ie: $_[1]), so it is easy to write inline handlers like


      L*

        The first example in the synopsis uses $_, I added a comment to make it more explicit. It's also mentioned in the XML::Twig 101 section.

        I know that the docs are a bit overwhelming and hard to read.

        The main reason is probably that XML::Twig is too big. This comes from a decision I made, a long time ago, to have just 1 massive module instead of several. The reason at the time was to let users install the module easily, especially on Windows, by simply copying Twig.pm to the proper place. Back then there was no Strawberry Perl, perlbrew or local::lib... If I were to start again I would certainly break up the module into several sub-modules (as with XML::Twig::XPath).
