PerlMonks  

Parsing huge XML file

by Gangabass (Priest)
on Sep 03, 2011 at 14:22 UTC ( #923992=perlquestion )
Gangabass has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I get an "Out of memory" error while parsing a large (100 MB) XML file:

use strict;
use warnings;
use XML::Twig;

my $twig = XML::Twig->new();

my $data = XML::Twig->new->parsefile("divisionhouserooms-v3.xml")->simplify( keyattr => [] );

my @good_division_numbers = qw( 30 31 32 35 38 );

foreach my $property ( @{ $data->{DivisionHouseRoom} } ) {

    my $house_code = $property->{HouseCode};
    print $house_code, "\n";

    my $amount_of_bedrooms = 0;

    foreach my $division ( @{ $property->{Divisions}->{Division} } ) {
        next unless grep { $_ eq $division->{DivisionNumber} } @good_division_numbers;
        $amount_of_bedrooms += $division->{DivisionQuantity};
    }

    open my $fh, ">>", "Result.csv" or die $!;
    print $fh join( "\t", $house_code, $amount_of_bedrooms ), "\n";
    close $fh;
}

What can I do to fix this error?

Re: Parsing huge XML file
by Anonymous Monk on Sep 03, 2011 at 14:43 UTC
Re: Parsing huge XML file
by zentara (Archbishop) on Sep 03, 2011 at 15:53 UTC

      Thank you for pointing me in the right direction. I have rewritten it:

      use strict;
      use warnings;
      use XML::Twig;

      my %bedrooms;
      my @bedrooms;

      my @good_division_numbers = qw( 30 31 32 35 38 );

      my $xml = XML::Twig->new(
          twig_roots => {
              DivisionHouseRoom => \&count_bedrooms,
          }
      );

      $xml->parsefile('divisionhouserooms-v3.xml');
      #$xml->parsefile('test.xml');

      print "=" x 40, "\n";

      open my $fh, ">>", "Result.csv" or die $!;

      foreach my $house_code (@bedrooms) {
          print $fh join( "\t", $house_code, $bedrooms{$house_code} ), "\n";
      }

      close $fh;

      sleep 1;

      sub count_bedrooms {
          my ( $twig, $element ) = @_;

          my $house_code = $element->first_child_text('HouseCode');
          print $house_code, "\n";

          unless ( exists $bedrooms{$house_code} ) {
              push @bedrooms, $house_code;
          }

          my ($divisions) = $element->children('Divisions');
          my @divisions   = $divisions->children('Division');

          for my $division (@divisions) {
              next unless grep { $_ eq $division->first_child_text('DivisionNumber') } @good_division_numbers;
              $bedrooms{$house_code} += $division->first_child_text('DivisionQuantity');
          }

          $element->purge;
      }
Re: Parsing huge XML file
by afoken (Parson) on Sep 05, 2011 at 07:02 UTC

    Add more RAM to the machine. That never hurts.

    And try XML::LibXML instead of XML::Twig.
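
    A minimal sketch of how that suggestion might look with the pull parser XML::LibXML::Reader, which reads one record at a time instead of building the whole tree. It assumes the element layout implied by the XML::Twig rewrite above (HouseCode, DivisionNumber and DivisionQuantity as child elements); the real file was never shown, so treat that layout, the helper name count_bedrooms and the sample data as assumptions.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Division numbers that count as bedrooms (from the original question).
my %is_bedroom = map { $_ => 1 } qw( 30 31 32 35 38 );

# Hypothetical helper: streams the document and returns [ house_code, bedrooms ] pairs.
# Takes the same arguments as XML::LibXML::Reader->new, e.g. location => $file.
sub count_bedrooms {
    my %source = @_;
    my $reader = XML::LibXML::Reader->new(%source)
        or die "Cannot open XML source\n";

    my @rows;
    while ( $reader->nextElement('DivisionHouseRoom') ) {
        # Copy only the current record into a small DOM fragment.
        my $record   = $reader->copyCurrentNode(1);
        my $bedrooms = 0;
        for my $division ( $record->findnodes('Divisions/Division') ) {
            next unless $is_bedroom{ $division->findvalue('DivisionNumber') };
            $bedrooms += $division->findvalue('DivisionQuantity');
        }
        push @rows, [ $record->findvalue('HouseCode'), $bedrooms ];
    }
    return @rows;
}

# Tiny made-up sample so the sketch runs without the 100 MB file.
my $sample = q{<root>
  <DivisionHouseRoom>
    <HouseCode>A1</HouseCode>
    <Divisions>
      <Division><DivisionNumber>15</DivisionNumber><DivisionQuantity>5</DivisionQuantity></Division>
      <Division><DivisionNumber>30</DivisionNumber><DivisionQuantity>2</DivisionQuantity></Division>
    </Divisions>
  </DivisionHouseRoom>
</root>};

print join( "\t", @$_ ), "\n" for count_bedrooms( string => $sample );
```

    For the real file, count_bedrooms( location => 'divisionhouserooms-v3.xml' ) should work the same way. If the data uses attributes instead of child elements, the XPath expressions would need an @ prefix, e.g. findvalue('@HouseCode').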

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: Parsing huge XML file
by Jenda (Abbot) on Sep 08, 2011 at 09:44 UTC

    You did not show us the XML, so I took a guess:

    use strict;
    use XML::Rules;

    my %good_division_number = map( ($_ => 1), qw(30 31 32 35 38));

    my $parser = XML::Rules->new(
        stripspaces => 7,
        rules => {
            _default => '',
            Division => sub {
                if ( $good_division_number{ $_[1]->{DivisionNumber} }) {
                    return '+Bedrooms' => $_[1]->{DivisionQuantity};
                } else {
                    return;
                }
            },
            Divisions => 'pass',
            DivisionHouseRoom => sub {
                print "$_[1]->{HouseCode}\t$_[1]->{Bedrooms}\n";
                return;
            },
            root => 'pass',
        }
    );

    open my $fh, ">", "Result.csv" or die $!;
    my $old = select($fh);
    $parser->parse(\*DATA);
    close $fh;
    select $old;
    print "DONE\n";

    __DATA__
    <root>
      <DivisionHouseRoom HouseCode="1">
        <Divisions>
          <Division DivisionNumber="15" DivisionQuantity="5" />
          <Division DivisionNumber="30" DivisionQuantity="2" />
        </Divisions>
      </DivisionHouseRoom>
      <DivisionHouseRoom HouseCode="2">
        <Divisions>
          <Division DivisionNumber="30" DivisionQuantity="5" />
          <Division DivisionNumber="35" DivisionQuantity="7" />
        </Divisions>
      </DivisionHouseRoom>
    </root>

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.
