Multi-threads newbie questions

by daverave (Scribe)
on Sep 20, 2010 at 08:07 UTC ( #860786=perlquestion )
daverave has asked for the wisdom of the Perl Monks concerning the following question:

I am using Perl 5.10.1 under Ubuntu 10.04.

This is the outline of my task. I would like to write a subroutine ('my_sub') that gets a reference to a hash of hashes (hoh). A second subroutine ('helper') will work on a single 'inner' hash.

The work on each of the internal hashes is independent of the others, so I would like to use multi-threading to speed things up.

I'm using 'Thread::Pool::Simple', which has almost no documentation as far as I can tell. I also looked at 'Thread::Pool', but it seems some of its dependencies are not supported by my Perl version.

The key point I have difficulty with is that I would like 'helper' to update the (inner) hash it gets. For example, I would like 'helper' to add keys to the hash.

First, I tried writing 'helper' as a subroutine that gets an (inner) hashref, so it looks something like this:

sub helper {
    my $href = shift;
    $href->{NEW_KEY} = 1;
}

sub my_sub {
    my $hohref = shift;

    # create thread pool
    my $pool = Thread::Pool::Simple->new( do => [ \&helper ] );

    # submit jobs
    foreach my $hashref ( values %{$hohref} ) {
        $pool->add( $hashref );
    }

    # wait for all threads to end
    $pool->join();
}
'my_sub' gets an unshared reference ($hohref), so I tried creating a shared copy in its body with 'my $shared_hohref = shared_clone $hohref;', then using that copy and finally returning it, but the internal hashes were still not updated. When I use the exact same code but replace the whole thread-pool block with a simple loop (i.e. stop using multi-threading),
foreach my $hashref ( values %{$hohref} ) {
    helper( $hashref );
}
then everything works fine.

Your help would be greatly appreciated.

UPDATE

See this runnable example:

use strict;
use warnings;
use threads;
use threads::shared;
use Thread::Pool::Simple;
use 5.010;
use Data::Dumper;

sub helper {
    say "helper starts";
    my $href = shift;
    say "href is $href";
    $href->{NEW_KEY} = 1;
    say "helper ends with $href";
}

sub my_sub {
    my $hohref = shift;

    my $shared_hohref = shared_clone $hohref;

    my $pool = Thread::Pool::Simple->new( do => [ \&helper ] );

    # submit jobs
    foreach my $hashref ( values %{$shared_hohref} ) {
        say "adding to pool: $hashref";
        $pool->add( $hashref );
    }

    # wait for all threads to end
    $pool->join();

    return $shared_hohref;
}

my $hoh = { A => { NAME => "a" }, B => { NAME => "bb" } };
say "1\t", Dumper $hoh;
my $updated_hoh = my_sub($hoh);
say "2\t", Dumper $updated_hoh;
'helper starts' gets printed, but that's it... whatever happens to it?

Re: Multi-threads newbie questions
by BrowserUk (Pope) on Sep 20, 2010 at 12:04 UTC

    The solution, I'm afraid, is "Don't use Thread::Pool or Thread::Pool::Simple". They're broken.

    Anything you enqueue (using the ->add() method) gets serialized using Storable, and so by the time your threads get something, it is a frozen/thawed copy of the original. Nothing they do to it will ever be reflected back to the original. They are just horribly, horribly broken.
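
    To illustrate the effect described here, a minimal sketch using Storable's dclone (equivalent to a freeze/thaw round trip): the consumer ends up holding an independent deep copy, so updates to it never reach the original.

    use strict;
    use warnings;
    use Storable qw[ dclone ];

    my %original = ( NAME => 'aa' );

    # A freeze/thaw round trip (here via dclone) yields a deep copy...
    my $copy = dclone( \%original );

    # ...so modifying the copy leaves the original untouched.
    $copy->{NEW_KEY} = 1;

    if ( exists $original{NEW_KEY} ) {
        print "original was updated\n";
    }
    else {
        print "original untouched\n";    # this is what gets printed
    }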

    If you would care to describe your real application, I'd have a go at suggesting an approach to solving it.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Thank you, BrowserUk. Perhaps I should use 'Thread::Queue' then?

      In any case, I will describe my application briefly, as you requested:

      I'm processing many genomes. Each genome is stored in a hash, which includes some basic data about the genome (organism, size, etc.) and also many file locations (genome sequence, etc.). Each genome hash is what I previously referred to as an 'internal hash'. All those hashes are stored together in one big hash.

      The 'helper' sub, which we can now call 'process_genome', takes care of a single genome. It does some work, including calling external scripts (which e.g. convert file formats), and adds key-value pairs to the genome hash, e.g. new file locations.

      I would like to process all genomes. Since I have 8 cores on my server, I would like to use multi-threading. I would like to give as input a hash of (genome) hashes, and get back a similar structure, but updated.

      That's all, I think.

        Okay. Here's a very simple example (that works :), based on yours above:

        #! perl -slw
        use strict;
        use threads;
        use threads::shared;
        use Data::Dump qw[ pp ];

        sub helper {
            my $ref = shift;
            ## Not needed if no more than one thread will access each subhash
            ## lock $ref;
            $ref->{NEW_KEY} = 1;
        }

        sub my_sub {
            my $ref = shift;
            my @threads = map async( \&helper, $_ ), values %{ $ref };
            $_->join for @threads;
        }

        my $hoh = {
            A => shared_clone( { NAME => 'aa' } ),
            B => shared_clone( { NAME => 'bb' } ),
        };

        pp $hoh;
        my_sub( $hoh );
        pp $hoh;

        Output:



        BTW: always, the easiest and safest way to approach threading complex applications is to write a single-threaded version that operates upon the data in a serial fashion.

        Once you have that working, if the data is truly independent, parallelising it is usually quite simple.

        For completeness, here is another example based on yours that uses a pool of threads. It is hardly more complicated than the first version:

        #! perl -slw
        use strict;
        use threads;
        use threads::shared;
        use Thread::Queue;
        use Data::Dump qw[ pp ];

        sub helper {
            my $Q = shift;
            while( my $ref = $Q->dequeue ) {
                lock $ref;
                $ref->{NEW_KEY} = 1;
            }
        }

        sub my_sub {
            my( $ref, $n ) = @_;
            my $Q = new Thread::Queue;

            my @threads = map async( \&helper, $Q ), 1 .. $n;

            $Q->enqueue( values %{ $ref } );
            $Q->enqueue( (undef) x $n );

            $_->join for @threads;
        }

        my $hoh = {
            A => shared_clone( { NAME => 'aa' } ),
            B => shared_clone( { NAME => 'bb' } ),
        };

        pp $hoh;
        my_sub( $hoh, 2 );
        pp $hoh;

        The output is identical to the earlier version.
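
        Adapted to the genome processing described earlier, the same worker-pool pattern might look roughly like the following sketch. The external script name (convert_format.pl), the hash keys (ORGANISM, SEQ_FILE, CONVERTED_FILE) and the example genomes are all hypothetical placeholders:

        #! perl -slw
        use strict;
        use threads;
        use threads::shared;
        use Thread::Queue;

        my $Q = Thread::Queue->new;

        # Each worker takes one shared genome hash off the queue, runs a
        # (hypothetical) external conversion script, then records the new
        # file location back into that hash.
        sub process_genome {
            while ( my $genome = $Q->dequeue ) {
                my $converted = "$genome->{SEQ_FILE}.converted";
                system( 'convert_format.pl', $genome->{SEQ_FILE}, $converted ) == 0
                    or warn "conversion failed for $genome->{ORGANISM}\n";
                lock $genome;                 # cheap insurance; each hash has one writer
                $genome->{CONVERTED_FILE} = $converted;
            }
        }

        my $genomes = {
            ecoli => shared_clone( { ORGANISM => 'E. coli',     SEQ_FILE => 'ecoli.fna' } ),
            bsub  => shared_clone( { ORGANISM => 'B. subtilis', SEQ_FILE => 'bsub.fna'  } ),
        };

        my @workers = map { threads->create( \&process_genome ) } 1 .. 8;   # one per core
        $Q->enqueue( values %$genomes );
        $Q->enqueue( (undef) x @workers );   # one stop marker per worker
        $_->join for @workers;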



      "it is a frozen/thawed copy of the original." ... and therefore the thread is free to use it any way it likes without having to synchronize access and without danger of stepping on another thread's toes.

      You've sent the thread some work to do; once it's done, it will send you the results.
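
      A minimal sketch of that send-work/collect-results model, using two Thread::Queue objects (the queue names and the trivial ':done' payload are only for illustration):

      use strict;
      use warnings;
      use threads;
      use Thread::Queue;

      my $work_q   = Thread::Queue->new;   # parent -> workers
      my $result_q = Thread::Queue->new;   # workers -> parent

      sub worker {
          # Each worker gets its own copy of the item and reports a result,
          # rather than mutating any shared structure.
          while ( defined( my $item = $work_q->dequeue ) ) {
              $result_q->enqueue( "$item:done" );
          }
      }

      my @threads = map { threads->create( \&worker ) } 1 .. 4;

      $work_q->enqueue( $_ ) for qw( A B C D E F );
      $work_q->enqueue( (undef) x @threads );   # one stop marker per worker
      $_->join for @threads;

      $result_q->enqueue( undef );              # end-of-results marker
      while ( defined( my $r = $result_q->dequeue ) ) {
          print "$r\n";
      }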

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

        and therefore the thread is free to use it any way it likes without having to synchronize access and without danger of stepping on another thread's toes.

        Did you look at the OP's code?

        His requirements mean that he has that ability: safe access without synchronisation, and without the need to duplicate through freeze/thaw (twice). That's part of the benefit of using threads.

        You've sent the thread some work to do,

        No need. It is a thread; it already has access. All you need to do is tell it what to access.

        Using threads like processes is like buying a moped and pedalling it everywhere.
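
        A minimal sketch of that point, assuming a shared hash of hashes: the parent enqueues only the keys, and each worker reaches into the shared structure directly (the data and key names are illustrative):

        use strict;
        use warnings;
        use threads;
        use threads::shared;
        use Thread::Queue;

        my %hoh :shared;
        $hoh{A} = shared_clone( { NAME => 'aa' } );
        $hoh{B} = shared_clone( { NAME => 'bb' } );

        my $Q = Thread::Queue->new;

        # Workers receive only a key; the data itself is never copied around.
        my @workers = map {
            threads->create( sub {
                while ( defined( my $key = $Q->dequeue ) ) {
                    my $inner = $hoh{$key};    # shared inner hash
                    lock $inner;
                    $inner->{NEW_KEY} = 1;     # update the shared data in place
                }
            } );
        } 1 .. 2;

        $Q->enqueue( keys %hoh );
        $Q->enqueue( (undef) x @workers );   # one stop marker per worker
        $_->join for @workers;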
