PerlMonks  

Re^2: XML::Twig and threads

by grizzley (Chaplain)
on Nov 28, 2012 at 10:04 UTC (#1005997=note)


in reply to Re: XML::Twig and threads
in thread XML::Twig and threads [solved]

He does call 'flush', and there are some XPath expressions. I've attached the script and a fake example XML file to the original node. It doesn't look like a bad design, yet I hope something can be improved there.


Re^3: XML::Twig and threads
by remiah (Hermit) on Nov 29, 2012 at 00:31 UTC

    I finally understand your situation.
    So, copying the original and reusing it would look like this:

    my $t = XML::Twig->new();
    $t->parsefile($inputFile);
    my $someData = $t->root->first_child;    # someData
    for my $i (1 .. $loop) {
        for ( $someData->children_copy('managedObject') ) {
            handle_managedObject($t, $_);
        }
        print "Iteracja: $i / $loop \t-> OK\n";
        $bID++;
        $someID = 0;
    }
    On my machine it is slower than the original, because deep-copying Elt objects takes too much time for a large XML file. Is it the same in your environment? Or maybe you already know this...

    As BrowserUK says, please use strict and warnings.

      I think the real problem is even worse: with the real 10MB and 100MB XML files, he is pushing his process into swap, hence everything slows to a crawl.



        Hello, BrowserUK.

        I see. Example XML is just 613KB.

        Copying the Twig object is terribly slow. I guess neither Data::Dumper nor Storable's dclone would do any good, because the object is simply huge.

        for ( $someData->children_copy( 'managedObject') ){ handle_managedObject($t, $_); }
        Without the copy, it is very fast:
        for ( $someData->children( 'managedObject') ){ handle_managedObject($t, $_); }
        So, I vaguely imagined rewriting the handle_managedObject sub using a regex, for example:
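        sub handle_managedObject {
            my ($t, $element) = @_;

            # rewrite rules: attribute text to look for => new text content
            my %rewrite_rules = (
                q/name="name"/ => "some value",
            );

            # replace with a regex on the serialised text of the element
            my $buffer = $element->sprint;    # plain-text XML of the element
            for my $attr (keys %rewrite_rules) {
                # \Q..\E matches the attribute literally; $1 keeps it and any
                # following attributes, and the '>' is written back explicitly
                $buffer =~ s/(\Q$attr\E[^>]*)>[^<]*/$1>$rewrite_rules{$attr}/;
            }

            # just print it out; $element itself is not changed
            # ($fh: output filehandle assumed to be opened elsewhere in the script)
            print $fh $buffer;
        }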
        
        
        That is what I would do, if I were you.
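A minimal sketch of the same regex idea that can be tried on its own, assuming core Perl only and no XML::Twig object. The helper rewrite_element is hypothetical: it takes the serialised text of one element (the kind of string $element->sprint returns) plus a hash of "attribute text => new text content" rules, and replaces the text content of the element carrying that attribute, keeping the tag's closing '>'. Like any regex over XML it only holds for simple cases: the attribute must appear literally as written in the rule, and the target element must contain plain text rather than child elements.

```perl
use strict;
use warnings;

# Hypothetical helper: apply "attribute text => new text content" rules
# to a serialised element string, without touching any Twig object.
sub rewrite_element {
    my ($buffer, $rules) = @_;
    for my $attr (sort keys %$rules) {
        # \Q...\E makes the attribute text match literally; $1 keeps the
        # attribute and any attributes after it, then '>' is written back.
        $buffer =~ s/(\Q$attr\E[^>]*)>[^<]*/$1>$rules->{$attr}/;
    }
    return $buffer;
}

my $xml   = '<managedObject><p name="name">old value</p><p name="other">keep</p></managedObject>';
my %rules = ( q/name="name"/ => 'some value' );

print rewrite_element($xml, \%rules), "\n";
# prints: <managedObject><p name="name">some value</p><p name="other">keep</p></managedObject>
```

Because the rewrite works on a string copy, the parsed tree is never modified, so the same twig could be walked with children (no copy) on every loop iteration.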

        Regards and thanks for your response.
