Hello, grizzley.
I have little experience with huge XML files, so I used a ready-made 100MB XML sample file as an example.
Does your colleague have any free memory left while his process runs? XML::Twig will eat up memory on large XML files unless you call "purge" or "flush".
Below is my test script, which counts text tags in two ways.
use strict;
use warnings;
use XML::Twig;
use Time::HiRes;

# 1. Count every text element, purging after each one to free memory.
my $cnt1 = 0;
my $b1   = Time::HiRes::time();
XML::Twig->new(
    twig_roots => {
        'text' => sub { $cnt1++; $_[0]->purge; },
    },
)->parsefile("standard");
my $e1 = Time::HiRes::time();

# 2. Count only the text elements below /site/regions/africa.
my $cnt2 = 0;
my $b2   = Time::HiRes::time();
XML::Twig->new(
    twig_roots => {
        '/site/regions/africa//text' => sub { $cnt2++; },
    },
)->parsefile("standard");
my $e2 = Time::HiRes::time();

# Report the counts and elapsed wall-clock seconds for both runs.
print "1. text count=$cnt1, time=" . ($e1 - $b1) . "\n";
print "2. text count=$cnt2, time=" . ($e2 - $b2) . "\n";
__DATA__
1. text count=105114, time=111.188741922379
2. text count=1657, time=60.9104990959167
When I forgot to call purge(), the first example ate up my memory and dumped core. purge() sometimes needs care, because it purges the innermost elements as well (XML Newbie's Twig example is related to this).
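A minimal sketch of why purge() needs care, assuming a made-up toy document and a hypothetical helper names_seen() (neither comes from the real sample file). A handler on an inner element purges, and a handler on the enclosing element then checks how many of those children it can still see. As far as I understand purge(), it discards elements that are already completely parsed, so the outer handler should see fewer children than it would without the purge.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical toy document, not the 100MB sample file.
my $xml = '<list><item><name>a</name><name>b</name></item></list>';

# Returns how many <name> children the <item> handler still sees.
# Pass a true value to purge inside the <name> handler.
sub names_seen {
    my ($purge_in_child) = @_;
    my $seen = 0;
    XML::Twig->new(
        twig_handlers => {
            # Inner handler: runs each time a <name> element closes.
            'name' => sub {
                $_[0]->purge if $purge_in_child;
                1;
            },
            # Outer handler: runs later, when the enclosing <item> closes.
            'item' => sub {
                my @names = $_[1]->children('name');
                $seen += @names;
                1;
            },
        },
    )->parse($xml);
    return $seen;
}

printf "without purge: %d name children seen\n", names_seen(0);
printf "with purge:    %d name children seen\n", names_seen(1);
```

If the outer handler needs the inner data, one option is to purge only in the outer handler, after it has finished with its children.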
And if you can narrow the range with an XPath-like expression, as in the second run above, parsing can become faster.
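Here is a small self-contained sketch of the same narrowing idea, so it can be tried without the big file. The inline document is a made-up miniature that only imitates the /site/regions layout, and count_roots() is a hypothetical helper. It shows the difference in what gets matched, not the speed-up itself.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical miniature document imitating the sample file's layout.
my $xml = <<'XML';
<site><regions>
  <africa><item><text>a</text></item></africa>
  <asia><item><text>b</text><text>c</text></item></asia>
</regions></site>
XML

# Count the elements matched by one twig_roots expression,
# purging after each match so memory use stays flat.
sub count_roots {
    my ($path) = @_;
    my $count = 0;
    my $twig  = XML::Twig->new(
        twig_roots => {
            $path => sub { $count++; $_[0]->purge; 1; },
        },
    );
    $twig->parse($xml);
    return $count;
}

printf "all text:    %d\n", count_roots('text');
printf "africa only: %d\n", count_roots('/site/regions/africa//text');
```

With the narrower expression, XML::Twig builds subtrees only for the text elements under africa, which is presumably where the time saving in my second run comes from.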
I agree with the other monks' opinions ...
Regards.