http://www.perlmonks.org?node_id=1005702


in reply to XML::Twig and threads [solved]

Hello, grizzley.

I have little experience with huge XML files, so I took a ready-made 100MB XML sample file as an example.

Does your colleague's process still have free memory while it runs? XML::Twig will eat up memory on large XML files without purge() or flush().

Below is my test script, counting the text tags in two ways.

    use strict;
    use warnings;
    use XML::Twig;
    use Time::HiRes;

    my $cnt1 = 0;
    my $b1 = Time::HiRes::time();
    XML::Twig->new(
        twig_roots => {
            'text' => sub { $cnt1++; $_[0]->purge; },
        },
    )->parsefile("standard");
    my $e1 = Time::HiRes::time();

    my $cnt2 = 0;
    my $b2 = Time::HiRes::time();
    XML::Twig->new(
        twig_roots => {
            '/site/regions/africa//text' => sub { $cnt2++; },
        },
    )->parsefile("standard");
    my $e2 = Time::HiRes::time();

    print "1. text count=$cnt1, time=" . ($e1 - $b1) . "\n";
    print "2. text count=$cnt2, time=" . ($e2 - $b2) . "\n";

    __DATA__
    1. text count=105114, time=111.188741922379
    2. text count=1657, time=60.9104990959167
When I forgot to purge(), the first example ate up my memory and dumped core. Sometimes purge() needs some care, because it purges the innermost element (XML Newbie's Twig example is related to this).
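For contrast, here is a minimal sketch of the flush() variant (the element name 'record' and the file name 'big.xml' are placeholders, not from this thread):

    use strict;
    use warnings;
    use XML::Twig;

    # flush() prints everything parsed so far and frees it, so memory
    # stays bounded while the whole document streams through.
    XML::Twig->new(
        twig_handlers => {
            'record' => sub {
                my ($t, $elt) = @_;
                # ... work on $elt here ...
                $t->flush;    # write out and release the parsed part
            },
        },
    )->parsefile('big.xml');

flush() is the right choice when you also want the document written back out; purge() simply discards what has been parsed so far.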

And if you can narrow the range with an XPath-like expression, as in the second handler above, it can become faster.

I agree with the other monks' opinions ...
regards.

Re^2: XML::Twig and threads
by grizzley (Chaplain) on Nov 28, 2012 at 10:04 UTC
    He does 'flush' and there are some XPath expressions. I've attached the script and a fake example XML to the original node. It doesn't look like bad design, and yet I hope something can be improved there.

      Now I understand your situation at last.
      So, copying the original and reusing it would look like this:

      my $t = XML::Twig->new();
      $t->parsefile($inputFile);
      my $someData = $t->root->first_child;    # someData

      for my $i (1 .. $loop) {
          for ( $someData->children_copy('managedObject') ) {
              handle_managedObject($t, $_);
          }
          print "Iteracja: $i / $loop \t-> OK\n";
          $bID++;
          $someID = 0;
      }
      It becomes slower than the original on my machine, because deep-copying an Elt object takes too much time for large XML. I wonder whether it is the same in your environment? Or maybe you already know this ...
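      A minimal sketch of how to time that copy cost in isolation (the file name 'input.xml' is a placeholder; copy() on an XML::Twig::Elt deep-copies the whole subtree):

          use strict;
          use warnings;
          use XML::Twig;
          use Time::HiRes qw(time);

          my $t = XML::Twig->new();
          $t->parsefile('input.xml');    # placeholder file name
          my $someData = $t->root->first_child;

          # time one deep copy of the subtree
          my $start = time();
          my $copy  = $someData->copy;
          printf "copy took %.3fs\n", time() - $start;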

        As BrowserUK says, please use strict and warnings.

        I think the real problem is even worse: with the real 10MB & 100MB XML files, he is pushing his process into swapping, hence everything slows to a crawl.

