PerlMonks  

Parsing huge XML file

by Gangabass (Priest)
on Sep 03, 2011 at 14:22 UTC (#923992=perlquestion)
Gangabass has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I get an "Out of memory" error while parsing a large (100 MB) XML file:

    use strict;
    use warnings;
    use XML::Twig;

    my $twig = XML::Twig->new();

    # parsefile() builds the whole document tree, and simplify() then
    # copies it into a nested Perl data structure
    my $data = XML::Twig->new->parsefile("divisionhouserooms-v3.xml")->simplify( keyattr => [] );

    my @good_division_numbers = qw( 30 31 32 35 38 );

    foreach my $property ( @{ $data->{DivisionHouseRoom} } ) {

        my $house_code = $property->{HouseCode};
        print $house_code, "\n";

        my $amount_of_bedrooms = 0;

        foreach my $division ( @{ $property->{Divisions}->{Division} } ) {
            next unless grep { $_ eq $division->{DivisionNumber} } @good_division_numbers;
            $amount_of_bedrooms += $division->{DivisionQuantity};
        }

        open my $fh, ">>", "Result.csv" or die $!;
        print $fh join( "\t", $house_code, $amount_of_bedrooms ), "\n";
        close $fh;
    }

What can I do to fix this error?

Re: Parsing huge XML file
by Anonymous Monk on Sep 03, 2011 at 14:43 UTC
Re: Parsing huge XML file
by zentara (Archbishop) on Sep 03, 2011 at 15:53 UTC

      Thank you for pointing me in the right direction. I have rewritten it:

          use strict;
          use warnings;
          use XML::Twig;

          my %bedrooms;    # house code => number of bedrooms
          my @bedrooms;    # house codes in the order they were first seen
          my @good_division_numbers = qw( 30 31 32 35 38 );

          my $xml = XML::Twig->new(
              twig_roots => {
                  DivisionHouseRoom => \&count_bedrooms,
              }
          );

          $xml->parsefile('divisionhouserooms-v3.xml');
          #$xml->parsefile('test.xml');

          print "=" x 40, "\n";

          open my $fh, ">>", "Result.csv" or die $!;
          foreach my $house_code (@bedrooms) {
              print $fh join( "\t", $house_code, $bedrooms{$house_code} ), "\n";
          }
          close $fh;

          sleep 1;

          sub count_bedrooms {
              my ( $twig, $element ) = @_;

              my $house_code = $element->first_child_text('HouseCode');
              print $house_code, "\n";

              unless ( exists $bedrooms{$house_code} ) {
                  push @bedrooms, $house_code;
                  $bedrooms{$house_code} = 0;    # houses with no matching division print 0, not undef
              }

              my ($divisions) = $element->children('Divisions');
              my @divisions = $divisions->children('Division');

              for my $division (@divisions) {
                  next unless grep { $_ eq $division->first_child_text('DivisionNumber') } @good_division_numbers;
                  $bedrooms{$house_code} += $division->first_child_text('DivisionQuantity');
              }

              # free the memory used by everything parsed so far
              $element->purge;
          }
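For trying the streaming approach without the 100 MB file, here is a minimal sketch of the same idea run against a small inline document. The sample XML, the root element name DivisionHouseRooms, and the helper name count_bedrooms_in are assumptions made for illustration; the child-element layout is only inferred from the first_child_text calls in the handler above, since the real file was not posted. A hash lookup stands in for the grep over the list of good division numbers.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample: the layout is inferred from the handler, not from the real file.
my $sample = <<'XML';
<DivisionHouseRooms>
  <DivisionHouseRoom>
    <HouseCode>H1</HouseCode>
    <Divisions>
      <Division><DivisionNumber>15</DivisionNumber><DivisionQuantity>5</DivisionQuantity></Division>
      <Division><DivisionNumber>30</DivisionNumber><DivisionQuantity>2</DivisionQuantity></Division>
    </Divisions>
  </DivisionHouseRoom>
  <DivisionHouseRoom>
    <HouseCode>H2</HouseCode>
    <Divisions>
      <Division><DivisionNumber>30</DivisionNumber><DivisionQuantity>5</DivisionQuantity></Division>
      <Division><DivisionNumber>35</DivisionNumber><DivisionQuantity>7</DivisionQuantity></Division>
    </Divisions>
  </DivisionHouseRoom>
</DivisionHouseRooms>
XML

# Hash lookup instead of grep over a list.
my %is_good = map { $_ => 1 } qw( 30 31 32 35 38 );

# Returns a list of [ house_code, bedrooms ] pairs in first-seen order.
sub count_bedrooms_in {
    my ($xml) = @_;
    my ( %bedrooms, @order );

    my $twig = XML::Twig->new(
        twig_roots => {
            DivisionHouseRoom => sub {
                my ( $t, $house ) = @_;
                my $code = $house->first_child_text('HouseCode');
                push @order, $code unless exists $bedrooms{$code};
                $bedrooms{$code} //= 0;

                for my $division ( $house->descendants('Division') ) {
                    next unless $is_good{ $division->first_child_text('DivisionNumber') };
                    $bedrooms{$code} += $division->first_child_text('DivisionQuantity');
                }

                $t->purge;    # drop everything parsed so far
            },
        },
    );
    $twig->parse($xml);

    return map { [ $_, $bedrooms{$_} ] } @order;
}

print join( "\t", @$_ ), "\n" for count_bedrooms_in($sample);
```

Because only one DivisionHouseRoom subtree is held at a time and then purged, memory use should depend on the size of a single record rather than on the size of the file.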
Re: Parsing huge XML file
by afoken (Parson) on Sep 05, 2011 at 07:02 UTC

    Add more RAM to the machine. That never hurts.

    And try XML::LibXML instead of XML::Twig.
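One way to use XML::LibXML without loading the whole file is its pull parser, XML::LibXML::Reader. The following is only a sketch under assumptions: the sample document, its root element name, and the helper name bedrooms_via_reader are hypothetical, and the child-element layout is guessed from the code posted earlier in the thread. Unlike that code, it emits one row per DivisionHouseRoom record and does not merge repeated house codes.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

my %is_good = map { $_ => 1 } qw( 30 31 32 35 38 );

# Takes the same arguments as XML::LibXML::Reader->new, for example
# ( location => 'divisionhouserooms-v3.xml' ) or ( string => $xml ).
# Returns a list of [ house_code, bedrooms ] pairs in document order.
sub bedrooms_via_reader {
    my %source = @_;
    my $reader = XML::LibXML::Reader->new(%source)
        or die "cannot create XML reader\n";

    my @rows;
    while ( $reader->nextElement('DivisionHouseRoom') == 1 ) {
        # Deep-copy just this record into a small DOM fragment.
        my $house = $reader->copyCurrentNode(1);

        my $count = 0;
        for my $division ( $house->findnodes('Divisions/Division') ) {
            next unless $is_good{ $division->findvalue('DivisionNumber') };
            $count += $division->findvalue('DivisionQuantity');
        }
        push @rows, [ $house->findvalue('HouseCode'), $count ];
    }
    return @rows;
}

# Hypothetical sample input for a quick check.
my $sample = <<'XML';
<DivisionHouseRooms>
  <DivisionHouseRoom>
    <HouseCode>H1</HouseCode>
    <Divisions>
      <Division><DivisionNumber>15</DivisionNumber><DivisionQuantity>5</DivisionQuantity></Division>
      <Division><DivisionNumber>30</DivisionNumber><DivisionQuantity>2</DivisionQuantity></Division>
    </Divisions>
  </DivisionHouseRoom>
</DivisionHouseRooms>
XML

print join( "\t", @$_ ), "\n" for bedrooms_via_reader( string => $sample );
```

For the real file, the call would presumably be bedrooms_via_reader( location => 'divisionhouserooms-v3.xml' ).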

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Parsing huge XML file
by Jenda (Abbot) on Sep 08, 2011 at 09:44 UTC

    You did not show us the XML, so I took a guess:

        use strict;
        use XML::Rules;

        my %good_division_number = map( ($_ => 1), qw(30 31 32 35 38));

        my $parser = XML::Rules->new(
            stripspaces => 7,
            rules => {
                _default => '',
                Division => sub {
                    if ( $good_division_number{ $_[1]->{DivisionNumber} }) {
                        return '+Bedrooms' => $_[1]->{DivisionQuantity};
                    } else {
                        return;
                    }
                },
                Divisions => 'pass',
                DivisionHouseRoom => sub {
                    print "$_[1]->{HouseCode}\t$_[1]->{Bedrooms}\n";
                    return;
                },
                root => 'pass',
            }
        );

        open my $fh, ">", "Result.csv" or die $!;
        my $old = select($fh);
        $parser->parse(\*DATA);
        close $fh;
        select $old;
        print "DONE\n";

        __DATA__
        <root>
          <DivisionHouseRoom HouseCode="1">
            <Divisions>
              <Division DivisionNumber="15" DivisionQuantity="5" />
              <Division DivisionNumber="30" DivisionQuantity="2" />
            </Divisions>
          </DivisionHouseRoom>
          <DivisionHouseRoom HouseCode="2">
            <Divisions>
              <Division DivisionNumber="30" DivisionQuantity="5" />
              <Division DivisionNumber="35" DivisionQuantity="7" />
            </Divisions>
          </DivisionHouseRoom>
        </root>

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Node Type: perlquestion [id://923992]
Approved by toolic