@choroba: yes, it still crashes. I tried adding a $twig->purge after the parse as you suggested. I also incorporated the STDERR code suggested by @bliako to check whether a bad file was causing the core dump.
The processing does seem to halt on the same file, but I don't think this is a case of bad data: this one file (about 60 MB) contains multiple XMLs, and the point in the XML where the twig halts is no different in structure from the previous XML within the same file, which was processed successfully. Let me know if this makes sense.
My guess is that some form of memory limit is being reached by the time the twig reaches this XML within the file, resulting in the core dump. I would love to be proven wrong.
So @choroba, is there a better way to write the code, for example with twig_roots instead of twig_handlers? I'm still getting to grips with XML::Twig, so I would love to know what you think. I also hear that XML::SAX is well optimized for memory when parsing XML. Are there any other modules that would give a better solution for this requirement?
Here's the code so far:
    #!/usr/bin/perl
    use warnings;
    use strict;
    use Text::CSV_XS;
    use XML::Twig;

    # Pipe-separated output
    my $csv = 'Text::CSV_XS'->new({
        sep_char => '|',
    });

    # Called once for every EDI_DC40 element: print its fields as one row
    sub process_EDI_DC40 {
        my ($twig, $thingy) = @_;
        my @values = map {
            my $ch = $thingy->first_child( $_ );
            $ch ? $ch->text : ""
        } qw( DOCNUM MESTYP SNDPRN RCVPOR RCVPRN );
        unshift @values, 'XML';
        $csv->say(*STDOUT, \@values);
    }

    # The first argument is a file listing one XML file name per line
    my $listfile = shift;
    open my $list, '<', $listfile or die $!;

    my $twig = 'XML::Twig'->new(
        twig_handlers => {
            EDI_DC40 => \&process_EDI_DC40,
        },
    );

    my $fcount = 1;
    while (my $xmlfile = <$list>) {
        chomp $xmlfile;
        print STDERR "$0 : about to process file # $fcount = '$xmlfile'\n";
        $twig->parsefile($xmlfile);
        print STDERR "$0 : file '$xmlfile' processed OK.\n";
        $fcount++;
        # Purging here only frees memory after the whole file has been parsed
        $twig->purge;
    }
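For comparison, the twig_roots variant asked about above might look roughly like the sketch below. With twig_roots, XML::Twig builds only the EDI_DC40 subtrees and skips the rest of the document, which should reduce what is held in memory. The sample XML, the batch/IDOC/DATA element names, and the @rows array are assumptions made up for illustration; this has not been run against the real 60 MB file.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

my @rows;    # hypothetical: rows are collected so the sketch is easy to check

# Build a pipe-separated row from one EDI_DC40 element, then free it
sub process_EDI_DC40 {
    my ($twig, $elt) = @_;
    push @rows, join '|', 'XML',
        map { $elt->first_child_text($_) }    # '' when the child is missing
        qw( DOCNUM MESTYP SNDPRN RCVPOR RCVPRN );
    $twig->purge;    # release the subtree that was just handled
}

# twig_roots: only EDI_DC40 subtrees are built, everything else is skipped
my $twig = 'XML::Twig'->new(
    twig_roots => {
        EDI_DC40 => \&process_EDI_DC40,
    },
);

# Hypothetical sample input; the real files are far larger
my $xml = <<'END_XML';
<batch>
  <IDOC><EDI_DC40><DOCNUM>1</DOCNUM><MESTYP>ORDERS</MESTYP></EDI_DC40><DATA>x</DATA></IDOC>
  <IDOC><EDI_DC40><DOCNUM>2</DOCNUM><MESTYP>INVOIC</MESTYP></EDI_DC40><DATA>y</DATA></IDOC>
</batch>
END_XML

$twig->parse($xml);
print "$_\n" for @rows;
```

For real input, `$twig->parse($xml)` would be replaced by `$twig->parsefile($xmlfile)` inside the loop over the list file, and the row would go through `$csv->say` as in the script above.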
I tried moving the $twig->purge into the handler subroutine (right after $csv->say (*STDOUT, \@values);), and this seems to have done the trick, although the script appears to parse more slowly than before. I will run a few more tests and post an update in a day. Thanks for all your assistance!
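The change described here can be sketched as follows: purge is called at the end of the handler, so the tree built so far is discarded after every EDI_DC40 element instead of once per file. The inline sample XML, its batch/IDOC/DATA element names, and the @rows array are assumptions for illustration only; the real script prints each row with $csv->say.

```perl
#!/usr/bin/perl
use warnings;
use strict;
use XML::Twig;

my @rows;    # hypothetical: stands in for $csv->say(*STDOUT, \@values)

my $twig = 'XML::Twig'->new(
    twig_handlers => {
        EDI_DC40 => sub {
            my ($t, $thingy) = @_;
            my @values = map {
                my $ch = $thingy->first_child($_);
                $ch ? $ch->text : ""
            } qw( DOCNUM MESTYP SNDPRN RCVPOR RCVPRN );
            unshift @values, 'XML';
            push @rows, join '|', @values;
            # Purge inside the handler: everything parsed up to this point
            # is freed before the parser moves on to the next element
            $t->purge;
        },
    },
);

# Hypothetical sample input with two records under one root
my $xml = <<'END_XML';
<batch>
  <IDOC><EDI_DC40><DOCNUM>10</DOCNUM><SNDPRN>A</SNDPRN></EDI_DC40><DATA>x</DATA></IDOC>
  <IDOC><EDI_DC40><DOCNUM>11</DOCNUM><SNDPRN>B</SNDPRN></EDI_DC40><DATA>y</DATA></IDOC>
</batch>
END_XML

$twig->parse($xml);
print "$_\n" for @rows;
```

Purging on every element adds some work per record, which may account for part of the slowdown mentioned above.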