PerlMonks  

Re: Segmentation coredump issue

by choroba (Cardinal)
on May 04, 2018 at 14:51 UTC (id://1214063)


in reply to Segmentation coredump issue

Does it still crash if you call
$twig->purge;
after having parsed the file?

Re^2: Segmentation coredump issue
by Kal87 (Novice) on May 08, 2018 at 13:31 UTC
    @choroba: yes, it still crashes. I tried adding $twig->purge after the parse as you suggested. I also added the STDERR messages suggested by @bliako, to check whether a bad file was causing the core dump.

    The processing does seem to halt on the same file, but I don't think this is a case of bad data: this one file (about 60 MB) contains multiple XMLs, and the XML at which the twig halts is no different in structure from the previous XML in the same file, which was processed successfully. Let me know if this makes sense.

    My guess is that some memory limit is reached by the time the twig gets to this XML within the file, which results in the core dump. I would love to be proven wrong.

    So @choroba, is there a better way to write the code, e.g. using twig_roots instead of twig_handlers? I'm still getting to grips with XML::Twig, so I would love to know what you think. I also hear that XML::SAX is much more memory-efficient for XML parsing. Are there other modules that would give a better solution for this requirement? Here's the code so far:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Text::CSV_XS;
    use XML::Twig;

    my $csv = 'Text::CSV_XS'->new({ sep_char => '|', });

    # Print the selected EDI_DC40 fields as one '|'-separated line.
    sub process_EDI_DC40 {
        my ($twig, $thingy) = @_;
        my @values = map {
            my $ch = $thingy->first_child( $_ );
            $ch ? $ch->text : ""
        } qw( DOCNUM MESTYP SNDPRN RCVPOR RCVPRN );
        unshift @values, 'XML';
        $csv->say (*STDOUT, \@values);
    }

    # The list file names one XML file per line.
    my $listfile = shift;
    open my $list, '<', $listfile or die $!;
    my $twig = 'XML::Twig'->new(
        twig_handlers => { EDI_DC40 => \&process_EDI_DC40, },
    );
    my $fcount = 1;
    while (my $xmlfile = <$list>) {
        chomp $xmlfile;
        print STDERR "$0 : about to process file # $fcount = '$xmlfile'\n";
        $twig->parsefile($xmlfile);
        print STDERR "$0 : file '$xmlfile' processed OK.\n";
        $fcount++;
        $twig->purge;    # purge after each file, as suggested above
    }
      I tried moving the $twig->purge into the handler sub (right after $csv->say (*STDOUT, \@values);), and this seems to have done the trick, although the script appears to parse more slowly than before.
      I will run a few other tests and post my update in a day. Thanks for all your assistance!
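      Moving the purge into the handler, as described above, can be sketched like this. It is a hedged rewrite of the posted script, not checked against the real data: make_twig and @FIELDS are hypothetical names introduced here, a fresh twig is created for each file instead of reusing one, and $csv->print with eol => "\n" stands in for say so the line ending is explicit.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use Text::CSV_XS;
use XML::Twig;

# EDI_DC40 child elements to export, in output order.
my @FIELDS = qw( DOCNUM MESTYP SNDPRN RCVPOR RCVPRN );

# Hypothetical helper: returns a twig whose EDI_DC40 handler writes one
# '|'-separated line to $out and then purges the tree built so far.
sub make_twig {
    my ($out) = @_;
    my $csv = Text::CSV_XS->new({ sep_char => '|', eol => "\n" })
        or die "" . Text::CSV_XS->error_diag();
    return XML::Twig->new(
        twig_handlers => {
            EDI_DC40 => sub {
                my ($twig, $thingy) = @_;
                my @values = map {
                    my $ch = $thingy->first_child($_);
                    $ch ? $ch->text : ''
                } @FIELDS;
                $csv->print($out, [ 'XML', @values ]);
                # Free every completely parsed element (this EDI_DC40
                # included) so memory does not grow with the file size.
                $twig->purge;
                return 1;
            },
        },
    );
}

# Main program: runs only when a list file (one XML path per line) is given.
if (@ARGV) {
    my $listfile = shift @ARGV;
    open my $list, '<', $listfile or die "Cannot open '$listfile': $!";
    my $fcount = 1;
    while (my $xmlfile = <$list>) {
        chomp $xmlfile;
        print STDERR "$0 : about to process file # $fcount = '$xmlfile'\n";
        make_twig(\*STDOUT)->parsefile($xmlfile);    # fresh twig per file
        print STDERR "$0 : file '$xmlfile' processed OK.\n";
        $fcount++;
    }
    close $list;
}
```

      Run it as perl script.pl filelist.txt. With no argument the main loop is skipped, so make_twig can also be tried on a small XML string through $twig->parse. Purging after every record keeps memory flat but adds some work per element, which may explain part of the slowdown mentioned above.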
