http://www.perlmonks.org?node_id=1005701


in reply to Re: XML::Twig and threads
in thread XML::Twig and threads [solved]

I did a simple test:
    use XML::Twig;
    use threads;

    $start = time;
    $t = XML::Twig->new(twig_roots => {managedObject => \&handle_fasade});
    $t->parsefile('inputFiles/input100MB.xml');
    print "Time: ", time-$start;

    sub handle_fasade{
    }
and the output was:
    # Time: 149s, 3.5GB RAM
    # Script quits after 71s
So you are right - 2 minutes is not much time. What worries me is 3.5GB RAM, because of further clarification in Re^2: XML::Twig and threads.

Re^3: XML::Twig and threads
by BrowserUk (Patriarch) on Nov 26, 2012 at 16:11 UTC
    So you are right - 2 minutes is not much time. What worries me is 3.5GB RAM

    I'll bet £1 to 1p that if you comment out the use threads;, the memory consumption will barely change.

    It is not at all uncommon for a 100MB XML file to require 3.6GB of RAM once it has been parsed and the equivalent data structure constructed.

    The memory requirement has nothing to do with threading. Just Perl's well-known tendency to trade memory for cpu.
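
    A minimal sketch of one way to keep the parsed structure from accumulating, assuming each managedObject can be discarded once its handler has run. The inline XML and the handle_object name are hypothetical stand-ins for the real 100MB input and handler; purge is the XML::Twig method that frees what has been parsed so far.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical stand-in for the large input file: five small managedObject elements.
my $xml = '<root>'
        . join('', map { qq{<managedObject id="$_"><p>v$_</p></managedObject>} } 1 .. 5)
        . '</root>';

my $count = 0;

my $twig = XML::Twig->new(
    twig_roots => { managedObject => \&handle_object },
);
$twig->parse($xml);

print "Handled $count elements\n";

sub handle_object {
    my ($t, $elt) = @_;
    $count++;      # the real per-element work would go here
    $t->purge;     # free everything parsed so far instead of keeping it in the twig
}
```

    With an empty handler and no purge, every matched element stays in memory until the parse finishes, which is where most of the footprint comes from.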


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong


      I won't bet against the obvious :)

      I've updated the question node with example XML and a script doing the job. The input file is read again in each of a few hundred iterations. I was thinking about reading it once and then making an in-memory copy in each iteration, but that would involve swapping to the HDD and definitely wouldn't speed up the code. Another way would be to store all changes, write the altered version to a file, undo the stored changes, and so on. That would save copying, but as there are many changes, the structure storing them might be the same size as the original. The best would be to somehow write the original structure to a file line by line and replace fragments on the fly.
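
      The "replace fragments on the fly" idea can be sketched with XML::Twig's twig_print_outside_roots option, which copies everything outside the twig roots straight to the output while the handler prints its own (possibly modified) element. This is a sketch under assumptions, not the script from the question node: the inline XML, the p elements, and the old/new values are hypothetical, and an in-memory filehandle stands in for the real output file.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; the real script would use parsefile() on the large file.
my $xml = '<root><header>keep</header>'
        . '<managedObject id="1"><p name="x">old</p></managedObject>'
        . '<footer>keep</footer></root>';

# In-memory handle standing in for the output file.
my $out = '';
open my $out_fh, '>', \$out or die "Cannot open in-memory handle: $!";

my $twig = XML::Twig->new(
    twig_roots               => { managedObject => \&rewrite_object },
    twig_print_outside_roots => $out_fh,   # pass everything else through untouched
);
$twig->parse($xml);
close $out_fh;

print $out, "\n";

sub rewrite_object {
    my ($t, $elt) = @_;
    # Hypothetical change: replace the text of any <p> whose text is 'old'.
    for my $p ($elt->children('p')) {
        $p->set_text('new') if $p->text eq 'old';
    }
    $elt->print($out_fh);   # write the modified element to the output
    $t->purge;              # then drop it from memory
}
```

      Each iteration would then hold only one managedObject in memory at a time, at the cost of re-reading the input file per iteration.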

        Firstly, updating the root node with new code is a bad idea. Very few if any people will re-read old nodes, so it will never be seen except by those you notify.

        Second, posting code that doesn't use strict & warnings and contains trivial errors that those would catch:

        Use of uninitialized value $fields[2] in pattern match (m//) at C:\test\1005623.pl line 56.
        Use of uninitialized value $fields[3] in pattern match (m//) at C:\test\1005623.pl line 64.

        will stop most people (including me) from looking further.
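
        For illustration only, a minimal sketch of how warnings of that kind typically arise and how to guard against them. The posted script is not reproduced here, so the semicolon separator, the sample lines, and the pattern are all hypothetical; the point is that a short line leaves $fields[2] undefined, and testing it with defined avoids the pattern match on undef.

```perl
use strict;
use warnings;

# Hypothetical input lines; the second one has too few fields.
my @lines = ('id;name;type;value', 'id;name');

my $matched = 0;
for my $line (@lines) {
    my @fields = split /;/, $line;

    # Skip short lines instead of matching against an undefined field.
    next unless defined $fields[2];

    $matched++ if $fields[2] =~ /type/;
}

print "Matched $matched of ", scalar(@lines), " lines\n";
```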

