Re: Processing Two XML Files in Parallel

by mirod (Canon)
on Jul 22, 2011 at 07:08 UTC (#916069)


in reply to Processing Two XML Files in Parallel

One way to do this is to use XML::Twig and Coro: have one thread parse the first input file and another parse the second. Pass control between the two threads after each elem has been parsed:

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Coro;
    use XML::Twig;

    use Test::More;
    use Perl6::Slurp;
    use autodie qw(open);

    my $INPUT_A  = "input_A.xml";  # input file A
    my $INPUT_B  = "input_B.xml";  # input file B
    my $OUTPUT   = "output.xml";
    my $EXPECTED = "expected.xml"; # output file C

    open( my $out, '>', $OUTPUT);

    my $times; # global, maybe Coro has a better way to pass it around but I don't know it

    my $t1= XML::Twig->new( twig_handlers => { elem => \&main_elem }, keep_spaces => 1);
    my $t2= XML::Twig->new( twig_handlers => { elem => \&get_times });

    # to get the numbers first, before the letters, t2 will be parsed in the main loop
    async { $t1->parsefile( $INPUT_A); };
    $t2->parsefile( $INPUT_B);

    print {$out} "\n"; # missing \n for some reason
    $t1->flush( $out);
    print {$out} "\n"; # missing \n for some reason
    close $out;

    is( slurp( $OUTPUT), slurp( $EXPECTED), 'the one test');
    done_testing();

    sub main_elem {
        my( $t, $elem)= @_;
        $elem->set_text( $elem->text x $times);
        $t->flush( $out);
        cede;
    }

    sub get_times {
        my( $t, $elem)= @_;
        $times= $elem->text;
        $t->purge;
        cede;
    }
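Stripped of the XML parsing, the hand-off between the two threads can be sketched like this. It is only a minimal illustration, assuming Coro is installed: the `@log` array, the `$worker` variable and the `A`/`B` labels are made up for the example and do not appear in the script above. The main program plays the role of the parser for file B (it runs first), and the `async` block plays the role of the parser for file A.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use Coro;

my @log;    # hypothetical: records the order in which the two sides run

# Created here, but it does not start running until the main program cedes.
my $worker = async {
    for my $i ( 1 .. 3 ) {
        push @log, "A$i";    # stands in for handling one elem of file A
        cede;                # hand control back to the main program
    }
};

for my $i ( 1 .. 3 ) {
    push @log, "B$i";        # stands in for handling one elem of file B
    cede;                    # let the worker handle its next elem
}

$worker->join;               # wait for the worker to finish its loop

print "@log\n";              # prints: B1 A1 B2 A2 B3 A3
```

Each side does one unit of work and then calls `cede`, so the two loops advance in lockstep, with the main program always one step ahead. That ordering is what lets `get_times` set `$times` before `main_elem` uses it.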

You will need to check that memory is indeed freed after each record. It should be OK, but I don't know exactly how Coro deals with memory: I had never used it before today.

Thank you for asking this and making me look into the problem, and thanks to whoever mentioned Coro yesterday in the CB. This is something I had wanted to do for a long time, but I had always deferred it since I did not really need it for work. Overall it was pretty painless though; the Coro intro is quite well written.

update: also, I should have read Tanktalus's answer, above, since he obviously knows Coro a lot better than I do. I am still happy I answered though; at least I learned something.

