PerlMonks  

Re^2: XML::Twig and threads

by grizzley (Chaplain)
on Nov 28, 2012 at 10:04 UTC (#1005997)


in reply to Re: XML::Twig and threads
in thread XML::Twig and threads [solved]

He does call 'flush', and there are some XPath expressions. I've attached the script and a fake example XML file to the original node. It doesn't look like bad design, yet I hope something can be improved there.


Re^3: XML::Twig and threads
by remiah (Hermit) on Nov 29, 2012 at 00:31 UTC

    I understand your situation at last.
    So, copying the original and reusing it would look like this.

    my $t = XML::Twig->new();
    $t->parsefile($inputFile);
    my $someData = $t->root->first_child;    # someData
    for my $i (1 .. $loop) {
        for ( $someData->children_copy('managedObject') ) {
            handle_managedObject($t, $_);
        }
        print "Iteracja: $i / $loop \t-> OK\n";
        $bID++;
        $someID = 0;
    }
    On my machine this is slower than the original, because deep-copying Elt objects takes too much time for large XML. I wonder whether it is the same in your environment? Or maybe you already know this ...
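To make the trade-off concrete, here is a minimal sketch of what children_copy buys and why it costs more: each call builds a full deep copy of every matching child, so edits to the copies never reach the original tree. The tiny XML document and the element names raml, cmData and p are assumptions made up for this illustration, not taken from the attached script.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical miniature input; the real files in this thread are far larger.
my $xml = <<'XML';
<raml>
  <cmData>
    <managedObject id="1"><p name="name">old</p></managedObject>
    <managedObject id="2"><p name="name">old</p></managedObject>
  </cmData>
</raml>
XML

my $t = XML::Twig->new();
$t->parse($xml);
my $someData = $t->root->first_child('cmData');

# Edit deep copies: the original tree is left untouched.
for my $copy ( $someData->children_copy('managedObject') ) {
    $copy->first_child('p')->set_text('new');
}
my @after_copy = map { $_->first_child('p')->text }
                 $someData->children('managedObject');

# Edit in place: no copying, but the original tree changes.
for my $elt ( $someData->children('managedObject') ) {
    $elt->first_child('p')->set_text('new');
}
my @after_inplace = map { $_->first_child('p')->text }
                    $someData->children('managedObject');

print "after editing copies:  @after_copy\n";
print "after editing in place: @after_inplace\n";
```

The copies keep the template reusable across loop iterations, at the price of rebuilding every Elt object each time; the in-place loop is cheap but consumes the template.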

    As BrowserUK says, please use strict and warnings.
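As a small illustration of why that advice matters, the sketch below shows strict rejecting a misspelled variable at compile time. The variable names $total and $totl are made up, and the string eval is used only so the error can be captured and printed instead of aborting the script.

```perl
use strict;
use warnings;

# Made-up snippet with a typo: $totl was never declared.
my $snippet = 'use strict; my $total = 1; $totl = 2; 1;';

# String eval lets us capture the compile-time error instead of dying.
my $ok  = eval $snippet;
my $err = $@;

print defined $ok ? "compiled\n" : "strict refused it: $err";
```

Without strict, the same typo would silently create a new global and the script would carry on with the wrong value.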

      I think the real problem is even worse: with the real 10MB and 100MB XML files, he is pushing his process into swapping, hence everything slows to a crawl.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      RIP Neil Armstrong

        Hello, BrowserUK.

        I see. The example XML is just 613KB.

        Copying a Twig object is terribly slow. I guess Data::Dumper or Storable's dclone will not do any good either, because the object is just huge.

        for ( $someData->children_copy( 'managedObject') ){ handle_managedObject($t, $_); }
        Without the copy, it is very fast.
        for ( $someData->children( 'managedObject') ){ handle_managedObject($t, $_); }
        So I vaguely imagined rewriting the handle_managedObject sub using a regex, for example ...
        my ($t, $element) = @_;

        # create rewrite rules using Twig
        my %rewrite_rules = (
            q/name="name"/ => "some value",
        );

        # replace with regex
        my $buffer = $element->sprint;    # get plain text of element
        for (keys %rewrite_rules) {
            # $1 keeps the attribute and the rest of the start tag;
            # the closing '>' is written back and only the text
            # before the next '<' is replaced
            $buffer =~ s/ (\Q$_\E .*?) > .*? (?=<)
                        /$1>$rewrite_rules{$_}/sx;
        }

        # just print out without changing $element
        print $fh $buffer;
        
        
        This is what I would do in your place.
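The regex idea can be tried on a plain string without XML::Twig at all. The following is only a sketch under stated assumptions: rewrite_element_text is a hypothetical helper name, the sample element is invented, and the pattern assumes that attribute values contain no '>' and that the element holds no comments or CDATA sections. Replacement values would also need to be XML-escaped by the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the text of the first tag that carries one
# of the given attribute strings, working on serialized XML rather than
# on Elt objects.
sub rewrite_element_text {
    my ($buffer, $rules) = @_;
    for my $attr (keys %$rules) {
        # $1 keeps the attribute and the rest of the start tag; the
        # closing '>' is written back explicitly, and only the text up
        # to the next '<' is swapped for the new value.
        $buffer =~ s/ (\Q$attr\E [^>]*) > [^<]* (?=<)
                    /$1>$rules->{$attr}/x;
    }
    return $buffer;
}

# Invented sample element standing in for $element->sprint.
my $in  = '<managedObject class="X"><p name="name">old</p>'
        . '<p name="other">keep</p></managedObject>';
my $out = rewrite_element_text($in, { q/name="name"/ => 'some value' });

print "$out\n";
```

Because the substitution works on the serialized text, the original element is never modified and no copy of it is needed.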

        Regards and thanks for your response.
