Re^2: How to read compressed (gz) file in xml::twig

Thanks Hauke for the solution! That works, but the script takes a long time to produce its output. Can this be reduced? The gzipped file is 45 MB and the root has 158K children (Offer elements).

Also, my second question wasn't related to the first; it was a separate issue. Here is the code snippet.

use IO::Uncompress::Gunzip ();
use XML::Twig;

my $file = 'Offerfeed_11742413_uk.full.xml.gz';
my $z = IO::Uncompress::Gunzip->new($file)
    or die "gunzip failed: $IO::Uncompress::Gunzip::GunzipError\n";

my $twig = XML::Twig->new;    ## Get twig object
$twig->parse($z);    ## parse the file to build twig


my $root = $twig->root;        ## Get the root element of twig
my @elements = $root->children;    ## Get elements list of twig

my $ct = 0;
foreach my $e (@elements){    ## document order; sort() on element objects would only order them by their stringified references
    my $cpc = ($e->first_child('EstimatedCPC')->text)*100;
    print $cpc,"\n";
    $ct++;
}
print $ct,"\n";


Re^3: How to read compressed (gz) file in xml::twig by marto (Cardinal) on Feb 20, 2017 at 12:11 UTC
45MB of gzipped XML is going to be a lot of data once unzipped; take a look at some of these file sizes, XML vs gzip. Either profile your code to see if improvements can be made (see the documentation for advice on huge documents), or invest in a faster CPU, faster disks, much more RAM...
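Before reaching for a full profiler, a rough first measurement is how much of the run time is decompression alone, with no XML parsing at all. The following is a minimal sketch using only core modules; `time_gunzip` is a hypothetical helper name, not something from the thread, and the 64 KB read size is an arbitrary choice.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);
use Time::HiRes qw(time);

# Hypothetical helper (not from the thread): decompress $file without
# parsing it and return (elapsed seconds, number of uncompressed bytes).
sub time_gunzip {
    my ($file) = @_;
    my $z = IO::Uncompress::Gunzip->new($file)
        or die "gunzip failed: $GunzipError\n";

    my $start = time;
    my $bytes = 0;
    my $buf;
    # read() returns the number of bytes read, 0 at EOF, negative on error
    while ((my $n = $z->read($buf, 1 << 16)) > 0) {
        $bytes += $n;
    }
    $z->close;
    return (time - $start, $bytes);
}

# Run as: perl time_gunzip.pl Offerfeed_11742413_uk.full.xml.gz
if (@ARGV) {
    my ($secs, $bytes) = time_gunzip($ARGV[0]);
    printf "%d bytes decompressed in %.2f s\n", $bytes, $secs;
}
```

If this finishes in a small fraction of the script's total run time, the cost is in building and walking the tree rather than in gunzip. For a line-by-line breakdown, a profiler such as Devel::NYTProf (`perl -d:NYTProf script.pl`, then `nytprofhtml`) is one option, assuming it is installed.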
Re^3: How to read compressed (gz) file in xml::twig by haukex (Archbishop) on Feb 20, 2017 at 16:07 UTC
Hi CSharma,

    "but script takes a lot time in providing the output"

How long does it take to gunzip the file and then process it with your existing script? How much longer does the above code take? To get a somewhat decent comparison, try piping the output of gunzip into your script (it'll need a slight modification to read from `STDIN`).

One thing that might* speed things up is if you make use of XML::Twig's ability to parse an XML file in chunks, instead of reading the whole thing into memory like you're currently doing.

use warnings;
use strict;
use IO::Uncompress::Gunzip ();
use XML::Twig;

my $z = IO::Uncompress::Gunzip->new('in.xml.gz')
    or die "gunzip failed: $IO::Uncompress::Gunzip::GunzipError\n";

my $twig = XML::Twig->new(
    twig_roots => {
        '/CatalogListings/Offer/EstimatedCPC' => sub {
            my ($t, $elt) = @_;
            print $elt->text*100, "\n";
            $t->purge;
        },
    },
);
$twig->parse($z);
$z->close;

This produces the same output as before, but discards each `<EstimatedCPC>` element when it's done processing it, and ignores the other elements.

* The code works, but I haven't had the chance to do a performance test.

Hope this helps,
-- Hauke D
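The suggested comparison (piping gunzip's output into the script) could look roughly like the sketch below. It reuses the `twig_roots` handler from the reply; `print_cpc_from` is a hypothetical helper name, and passing `-` as the argument to select `STDIN` is an assumption made for this example, not something from the thread.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not from the thread): parse XML from an
# already-open filehandle, print each EstimatedCPC value times 100,
# and return how many values were printed.
sub print_cpc_from {
    my ($fh) = @_;
    my $count = 0;
    my $twig = XML::Twig->new(
        twig_roots => {
            '/CatalogListings/Offer/EstimatedCPC' => sub {
                my ($t, $elt) = @_;
                print $elt->text() * 100, "\n";
                $count++;
                $t->purge;    # free what has been parsed so far
            },
        },
    );
    $twig->parse($fh);
    return $count;
}

# Assumed convention for this sketch: "-" means "read from STDIN", e.g.
#   gunzip -c Offerfeed_11742413_uk.full.xml.gz | perl cpc_stdin.pl -
if (@ARGV && $ARGV[0] eq '-') {
    my $count = print_cpc_from(\*STDIN);
    print $count, "\n";
}
```

Timing this pipeline (for example with the shell's `time`) against the IO::Uncompress::Gunzip version gives a rough idea of whether in-process decompression is what costs the time.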
Re^4: How to read compressed (gz) file in xml::twig by CSharma (Sexton) on Feb 22, 2017 at 03:08 UTC
Thanks a lot Hauke!! The code worked and it's certainly better than before. Thanks, Chetan

