PerlMonks
Is it possible to create a single Hash-of-Hash.. with multiple threads..

by gsat (Initiate)
on Mar 13, 2014 at 23:22 UTC ( #1078249=perlquestion )
gsat has asked for the wisdom of the Perl Monks concerning the following question:

Dear Enlightened Monks..

I have to create and populate a new, large hohoh (hash-of-hash-of-hash) structure, derived from another input HoH structure, which I am reading from a file with Storable's retrieve functionality.

I am using Thread::Queue

The problems I am facing:

Issue 1: I thought that with shared_clone I could create a deep copy of the entire HoH (using threads 1.92 and threads::shared 1.46):

    my $indb   = retrieve('input_hash.perl.db');
    my $copydb = shared_clone( $indb );
    ..
    ..
    my $thr1 = threads->new(\&thrd_run, $copydb);
    ..

It is not working.

For the time being, as the input db is not too large, I am repeatedly calling retrieve inside every thread run (sub thrd_run). I don't like this: I want to retrieve the db only once, then reuse a cloned or shared copy of it, passing it to every thread. Is that possible?
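One way to avoid the repeated retrieve is to read the db once in the parent before spawning: each thread then inherits its own copy of the structure at creation time. A minimal self-contained sketch (the file name, demo data, and worker sub here are made up for illustration, not taken from the OP's code):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;
use Storable qw(store retrieve);

# Demo data so the sketch is self-contained; real code would already
# have its Storable file on disk.
store({ a => { x => 1 }, b => { y => 2 } }, 'demo_input.db');

my $indb = retrieve('demo_input.db');   # read once, in the parent

sub thrd_run {
    my ($db) = @_;
    # Each thread receives its own copy of %$db at creation; reads need
    # no sharing or locking.
    return scalar keys %$db;
}

my @thr = map { threads->create(\&thrd_run, $indb) } 1 .. 4;
print $_->join, "\n" for @thr;          # prints 2, four times

unlink 'demo_input.db';
```

This avoids sharing entirely: the copies are per-thread, which is fine as long as the threads only read the input db.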

Issue 2: I want to create another hohoh (a tree with roughly more than 60M leaf nodes, after a few levels of branched hierarchies) and populate it from inside my thread runs, i.e. shared writes into a hohoh ref. Is that possible?

Even if not, what would be a good alternative?
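For what it's worth, writes into a nested shared structure are possible, but every nested level must itself be shared: assigning a plain hashref into a shared hash dies with "Invalid value for shared scalar". A minimal sketch of the pattern, with illustrative names:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %tree :shared;

# Nested levels must be shared too; shared_clone() converts a whole
# plain structure in one call.
$tree{branch1} = shared_clone({ branch2 => { leaf => 42 } });

# Threads may extend the tree, serialising their writes with lock():
my @t = map {
    my $id = $_;
    threads->create(sub {
        lock %tree;
        $tree{"thread_$id"} = shared_clone({ depth => 1 });
    });
} 1 .. 3;
$_->join for @t;

print scalar keys %tree, "\n";   # 4: branch1 plus three thread branches
```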

Thanks a bunch in advance..

Any light will help in the dark..

Re: Is it possible to create a single Hash-of-Hash.. with multiple threads..
by kcott (Abbot) on Mar 14, 2014 at 00:17 UTC

    G'day gsat,

    Welcome to the monastery.

    Is this the sort of thing you were after:

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    use threads;
    use threads::shared;
    use Storable;

    my $storable_file = './pm_1078249_stored_hash';

    {
        my $hoh_to_store = { A => { B => 2, C => 3 }, D => { E => 5, F => 6 } };
        store $hoh_to_store => $storable_file;
    }

    my @threads;
    my %hohoh :shared;

    for (1 .. 3) {
        push @threads, threads->create(sub {
            $hohoh{threads->tid} = shared_clone(retrieve $storable_file);
        });
    }

    $_->join for @threads;

    # Test result
    use Data::Dumper;
    print Dumper \%hohoh;

    # My housekeeping
    unlink $storable_file;

    Output:

    $VAR1 = {
              '1' => {
                       'D' => { 'E' => 5, 'F' => 6 },
                       'A' => { 'B' => 2, 'C' => 3 }
                     },
              '2' => {
                       'D' => { 'E' => 5, 'F' => 6 },
                       'A' => { 'B' => 2, 'C' => 3 }
                     },
              '3' => {
                       'A' => { 'B' => 2, 'C' => 3 },
                       'D' => { 'E' => 5, 'F' => 6 }
                     }
            };

    [You appear to have put square brackets around several pieces of text in your OP. This generates links here. See "Writeup Formatting Tips" and "What shortcuts can I use for linking to other information?" for details of how to fix your post.]

    -- Ken

      Thanks much for your time. Your code is simple and works fine; I guess that is because it is only two levels deep and the subroutine does not increase the depth while working.

      The data structure I am handling will increase in depth. Basically it branches out, like a growing tree; every thread working on a unique branch is OK.

      Other suggestions were along the lines of storing the output in a new structure and merging it at the end, which is fine; but how do I return branch structures of varying depth (between 9 and 14 levels, with a few tens of thousands of branches) and join them to form the final tree?

      The starting structure that I need to share across threads will be 4-6 levels deep; when the threads finish working, every branch will be 9-14 levels deep.
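One way to sketch that merge step (all names here are hypothetical): let each thread build its branch as an ordinary private structure and simply return it. join() copies the returned structure back to the main thread, which then keys the branches into the final tree, so no sharing or locking is needed during the build.

```perl
use strict;
use warnings;
use threads;

# Build one branch of arbitrary depth as a plain (unshared) structure.
sub build_branch {
    my ($name, $depth) = @_;
    my $node = { leaf => $name };
    $node = { "level_$_" => $node } for reverse 1 .. $depth;
    return $node;
}

my @t = map { threads->create(\&build_branch, "branch$_", 3) } 1 .. 4;

my %tree;
for my $thr (@t) {
    my $branch = $thr->join;       # structure is copied back across the thread boundary
    $tree{ $thr->tid } = $branch;  # key each branch by thread id (or branch name)
}
print scalar keys %tree, "\n";     # 4
```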

      BTW, in my experience the %hash style of usage is not very convenient; I prefer a reference, a hash-ref-of-hash-refs, right from the root. I like to write:

      $root->{$branch1}->{$branch2}..

      Overall, my understanding so far is that Perl threads are not great: they really cannot do big jobs on complex data structures in a shared way, since absolutely no references are shared between threads. Internally, every thread creates another copy, so with GBs of data structure I cannot really launch many threads.

      My single-threaded, serialized code works great, though; it can do serious work on massive data structures. It just lacks true parallelism.

      The Perl threads implementation so far seems very inadequate; it is good for showing a primary-school student how parallel programming works.

      I wish the next generation of threads will be able to do what I am asking: seamless work on shared structures, perhaps even over GPUs (why not? being interpreted won't stop it running on a GPU, I guess :) ). Anyway, thanks a lot again!!

        The Perl threads implementation so far seems very inadequate; it is good for showing a primary-school student how parallel programming works.

        That is a wrong conclusion drawn through a lack of understanding.

        If you would care to describe your application in detail -- not just generics, but rather scales and reasons; with examples of data and processing requirements -- then you would (probably) get help with finding definitive, practical and efficient solutions to those requirements.

        Your OP asks a simplistic question with an obvious answer: Yes, of course a multi-level hash can be constructed using multiple threads. It is simple and trivial:

        #! perl -slw
        use strict;
        use threads;
        use threads::shared;

        sub displayHash {
            my( $ref, $pad ) = @_;
            return "\n" unless keys %{ $ref };
            my $buf = '';
            for my $key ( sort keys %{ $ref } ) {
                $buf .= "$pad" if length $buf;
                $buf .= "{$key}" . displayHash( $ref->{ $key }, $pad . '   ' );
            }
            return $buf;
        }

        sub worker {
            my( $ref, $reps ) = @_;
            for( 1 .. $reps ) {
                my $copyRef = $ref;
                for my $step ( map chr( 97 + rand 26 ), 1 .. int( rand 10 ) ) {
                    if( exists $copyRef->{ $step } ) {
                        $copyRef = $copyRef->{ $step };
                    }
                    else {
                        lock %{ $copyRef };
                        $copyRef = $copyRef->{ $step } = &share( {} );
                    }
                }
            }
        }

        our $REPS //= 20;
        our $THREADS //= 4;

        my %HoHos : shared;
        my @threads = map threads->create( \&worker, \%HoHos, $REPS ), 1 .. $THREADS;
        $_->join for @threads;

        print displayHash( \%HoHos, '' );

        __END__
        C:\test>junk91 -REPS=7
        {b}{l}{w}{g}
        {d}{t}{k}{f}{g}{g}{g}{i}{j}
        {f}{j}{v}
        {g}{s}{g}{h}{g}
        {i}{f}{j}{r}{m}{v}{u}{v}{b}
        {k}{d}{f}{q}{k}
        {z}{t}{m}{h}{e}{i}
        {m}{b}{l}
        {l}{u}{p}{t}{d}{h}
        {q}{k}{y}
        {n}{f}{u}{v}{z}{z}{l}
        {o}{f}
        {p}{u}{s}{w}{n}{t}
        {q}{h}
        {y}{k}{d}{l}{a}{n}{a}{i}
        {r}{q}{t}{g}
        {v}{o}{n}{x}{b}{g}{c}
        {w}{a}
        {y}{a}
        {d}{y}{j}
        {z}{a}{p}{t}{i}{r}{t}
        {y}{i}{g}{c}{v}{z}{z}{k}

        But, it is also probably not very helpful to your real application.


Re: Is it possible to create a single Hash-of-Hash.. with multiple threads..
by zentara (Archbishop) on Mar 14, 2014 at 10:54 UTC
    I haven't been keeping up with the threads docs lately, but I seem to remember that hashes shared between threads require some special handling and locking to be assured that the hash won't get corrupted. For example: what if one thread is removing a primary key from the hash while another is trying to write a secondary key under that primary key's descendants?
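That is the gist of it: individual operations on a shared hash are protected internally, but a read-modify-write sequence spanning several operations needs an explicit lock(). A small sketch (the counter and names are illustrative):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %hoh :shared;
$hoh{counts} = shared_clone({ n => 0 });

# Without the lock, the increment (read, add, write) could interleave
# across threads and lose updates; lock() makes the whole step atomic.
my @t = map {
    threads->create(sub {
        for (1 .. 1000) {
            lock %{ $hoh{counts} };
            $hoh{counts}{n}++;
        }
    });
} 1 .. 4;
$_->join for @t;

print $hoh{counts}{n}, "\n";   # 4000, reliably
```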

Re: Is it possible to create a single Hash-of-Hash.. with multiple threads..
by oiskuu (Friar) on Mar 14, 2014 at 14:36 UTC

    You need not share the input db (unless it is later modified). Just read it before starting the threads and they will all inherit a copy.

    In general, having multiple threads work on the same data is a recipe for dreadful performance problems. Cache lines will constantly migrate between the cores' caches, and that is quite expensive.

    Instead, have each subthread construct a part of the tree and return a shared_clone of it. The main thread then builds the top-level hash.
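A minimal sketch of that pattern (names illustrative): each thread builds its subtree privately, returns a shared_clone, and the main thread only assembles the top level, so no locking is needed at any point.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Build a subtree as an ordinary private structure, then convert it to
# a shared structure only once, at the end.
sub build_part {
    my ($seed) = @_;
    my %part = map { ("key$_" => { value => $_ * $seed }) } 1 .. 3;
    return shared_clone(\%part);
}

my @thr = map { threads->create(\&build_part, $_) } 1 .. 2;

my %top :shared;
$top{"part$_"} = $thr[$_ - 1]->join for 1 .. 2;

print scalar keys %top, "\n";            # 2
print $top{part2}{key3}{value}, "\n";    # 6
```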
