Re^2: XML::Twig and threads

by grizzley (Chaplain) on Nov 28, 2012 at 10:04 UTC (node #1005997)

in reply to Re: XML::Twig and threads
in thread XML::Twig and threads [solved]

He does call 'flush', and there are some XPath expressions. I've attached the script and a fake example XML file to the original node. It doesn't look like a bad design, yet I hope something can be improved there.

Re^3: XML::Twig and threads
by remiah (Hermit) on Nov 29, 2012 at 00:31 UTC

    I finally understand your situation.
    So copying the original and reusing it would look like this:

    my $t = XML::Twig->new();
    $t->parsefile($inputFile);
    my $someData = $t->root->first_child;    # the someData element
    for my $i (1 .. $loop) {
        # work on deep copies, so the parsed original stays intact for the next pass
        for ( $someData->children_copy('managedObject') ) {
            handle_managedObject($t, $_);
        }
        print "Iteracja: $i / $loop \t-> OK\n";
        $bID++;
        $someID = 0;
    }
    On my machine this is slower than the original, because deep-copying the Elt objects takes too long for a large XML file. I wonder whether it is the same in your environment? Or maybe you already know this...

    As BrowserUK says, please use strict and warnings.

      I think the real problem is even worse: with the real 10MB and 100MB XML files, his process is pushed into swapping, so everything slows to a crawl.


        Hello, BrowserUK.

        I see. The example XML is just 613KB.

        Copying the Twig object is terribly slow. I guess Data::Dumper or Storable's dclone would not help either, because the structure is simply huge.

        for ( $someData->children_copy( 'managedObject') ){ handle_managedObject($t, $_); }
        Without the copy, it is very fast:
        for ( $someData->children( 'managedObject') ){ handle_managedObject($t, $_); }
        So I vaguely imagined rewriting the handle_managedObject sub using a regex, for example:
        sub handle_managedObject {
            my ($t, $element) = @_;
            # rewrite rules: attribute text to look for => new text content
            my %rewrite_rules = (
                q/name="name"/ => "some value",
            );
            # replace with a regex on the serialized text
            my $buffer = $element->sprint;    # plain-text XML of the element
            for my $attr (keys %rewrite_rules) {
                # keep the tag up to and including '>', replace the text up to the next '<'
                $buffer =~ s/(\Q$attr\E.*?>).*?(?=<)/$1$rewrite_rules{$attr}/s;
            }
            # just print it out, without changing $element
            print $fh $buffer;
        }
        That is what I would do, if I were you.
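        To try the regex idea on its own, without XML::Twig, here is a minimal core-Perl sketch. It assumes a managedObject serializes to something like <p name="...">value</p>; the sample fragment and the rewrite_fragment helper are hypothetical and not taken from the original script. It uses [^>]* and [^<]* instead of .*? so a match cannot run across tag boundaries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply "attribute text => new text content" rules to a
# serialized XML fragment (e.g. what $element->sprint returns), leaving the
# parsed tree itself untouched.
sub rewrite_fragment {
    my ($buffer, $rules) = @_;
    for my $attr (sort keys %$rules) {
        # Keep the attribute and the rest of its start tag (up to '>'),
        # then replace the text content up to the next '<'.
        # \Q...\E quotes the attribute text so it is matched literally.
        $buffer =~ s/(\Q$attr\E[^>]*>)[^<]*(?=<)/$1$rules->{$attr}/g;
    }
    return $buffer;
}

# Hypothetical sample fragment standing in for $element->sprint output.
my $fragment = '<managedObject class="X"><p name="name">old</p><p name="id">7</p></managedObject>';
my %rules    = ( q/name="name"/ => 'some value' );

print rewrite_fragment($fragment, \%rules), "\n";
# prints: <managedObject class="X"><p name="name">some value</p><p name="id">7</p></managedObject>
```

        This only works for simple, predictable markup; values containing '<' or '&' would need escaping before being substituted in.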

        Regards and thanks for your response.
