PerlMonks  

Re^2: XML::Twig and threads

by grizzley (Chaplain)
on Nov 28, 2012 at 10:04 UTC


in reply to Re: XML::Twig and threads
in thread XML::Twig and threads [solved]

He does call 'flush', and there are some XPath expressions. I've attached the script and a fake example XML file to the original node. It doesn't look like a bad design, yet I hope something can be improved there.

Replies are listed 'Best First'.
Re^3: XML::Twig and threads
by remiah (Hermit) on Nov 29, 2012 at 00:31 UTC

    I understand your situation at last.
    So, copying the original and reusing it would look like this:

    my $t = XML::Twig->new();
    $t->parsefile($inputFile);
    my $someData = $t->root->first_child;    # someData
    for my $i (1 .. $loop) {
        for ( $someData->children_copy('managedObject') ) {
            handle_managedObject($t, $_);
        }
        print "Iteracja: $i / $loop \t-> OK\n";
        $bID++;
        $someID = 0;
    }
    On my machine this is slower than the original, because deep-copying the Elt objects takes too much time for large XML. I wonder whether it is the same in your environment? Or maybe you already know this ...

    As BrowserUK says, please use strict and warnings.

      I think the real problem is even worse: with the real 10MB and 100MB XML files, he is pushing his process into swapping, hence everything slows to a crawl.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      RIP Neil Armstrong

        Hello, BrowserUK.

        I see. The example XML is just 613KB.

        Copying the Twig object is terribly slow. I guess Data::Dumper or Storable's dclone will not do any good either, because it is just huge.

        for ( $someData->children_copy( 'managedObject') ){ handle_managedObject($t, $_); }
        Without the copy, it is very fast:
        for ( $someData->children( 'managedObject') ){ handle_managedObject($t, $_); }
        So, I vaguely imagined rewriting the handle_managedObject sub using a regex, for example ...
        my ($t, $element)=@_;
        
        # create rewrite rules using Twig 
        my %rewrite_rules =(
            q/name="name"/ => "some value",
        );
        
        
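        #replace with regex
        my $buffer=$element->sprint; #serialize the element (tags included)
        for (keys %rewrite_rules){
            # keep everything up to and including '>' in $1,
            # then replace the text up to the next '<'
            $buffer =~ s/ (\Q$_\E .*? >)  .*?  (?=<)
                        /${1}$rewrite_rules{$_}/sx;
        }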
        
        #just print out without changing $element
        print $fh $buffer;
        
        
        I would do it like this, if I were you.
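
        To try the regex idea without XML::Twig at all, here is a minimal, self-contained sketch that works on an already serialized string. The helper name rewrite_serialized and the sample XML are made up for illustration; in the real script the string would come from $element->sprint. It assumes the target text contains no nested tags and that the replacement values need no XML escaping.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): apply text-level
# rewrite rules to an already serialized element.
sub rewrite_serialized {
    my ($buffer, $rules) = @_;
    for my $attr (sort keys %$rules) {
        # $1 keeps the attribute and the rest of the start tag through '>';
        # the text up to the next '<' is replaced by the rule's value.
        # Without /g only the first match per rule is rewritten.
        $buffer =~ s/(\Q$attr\E [^>]* >) [^<]* (?=<)/$1$rules->{$attr}/x;
    }
    return $buffer;
}

# Made-up sample standing in for the output of $element->sprint.
my $xml = '<managedObject class="X"><p name="name">old</p>'
        . '<p name="other">keep</p></managedObject>';
my %rewrite_rules = ( 'name="name"' => 'some value' );

print rewrite_serialized($xml, \%rewrite_rules), "\n";
```

        Because the element itself is never modified, no copy is needed; the cost is that the regex knows nothing about XML structure, so it is only safe for simple, flat content like the example.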

        Regards and thanks for your response.
