Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

updated_again: how to loop through hash tables more efficiently

by lrl1997 (Novice)
on Sep 17, 2012 at 20:30 UTC ( #994096=perlquestion: print w/ replies, xml ) Need Help??
lrl1997 has asked for the wisdom of the Perl Monks concerning the following question:

Dear all, I just updated my question below, so is the format of both hashes.

I feel the code I provided below is not fast enough, I would really need some help to improve it's efficiency.

I have two hash tables,

hash1: (keys and values separated by tab)

k_a1 k_b1,k_b2,k_b3

k_a2 k_b4,K_b5

hash2:(keys and values separated by tab)

k_b1 val_a1

k_b2 val_a2

k_b3 val_a3

k_b4 val_a4

k_b5 val_a5

I want to loop through both hash tables and foreach keys in hash2, if the value can be found in the value of hash1, then print out the corresponding keys from both hash1 and hash2 (keys would be keys from hash2, values would be values for hash2 and keys of hash1, all separated by tab): e.g.

k_b1 val_a1 k_a1

k_b2 val_a2 k_a1

k_b3 val_a3 k_a1

k_b4 val_a4 k_a2

k_b5 val_a5 k_a2

This is the code

use strict; sub read_hash { my $fname = shift; open (my $fh, "<",$fname) or die "$!"; my %hash=(); while (<$fh>) { my $line=$_; chomp $line; my ($key,$value)=split /\t/, $line, 2; $hash{$key}=$value; } return \%hash; } my $hash1=shift; my $hash2=shift; my $h1=read_hash("$hash1"); my $h2=read_hash("$hash2"); my @fields; while (my ($k1, $v1) = each %$h1) { @fields=split /,/, $h1->{$k1}; while (my ($k2, $v2) = each %$h2) { if ( grep( /^$k2$/, @fields ) ) { print $k2, "\t", $v2, "\t", "$k1", "\n" } } }

Comment on updated_again: how to loop through hash tables more efficiently
Download Code
Re: how to loop through hash tables more efficiently
by Anonymous Monk on Sep 17, 2012 at 20:40 UTC

    This is a fairly simple code, but I found to run my code is extremely slow

    And it doesn't work :)

    if (exists $lookup{$k1}) { print $k2, ",", $k1, "\n";
    is never true, never prints anything
Re: how to loop through hash tables more efficiently
by choroba (Abbot) on Sep 17, 2012 at 20:53 UTC
    Can you please specify what output you expect from the given sample files and why? You code does not output anything. To make it ouput something, I had to add the key among its values in hash1:
    k_aX,k_aX
    It then prints all the keys from hash2. That is probably not what you wanted.

    UPDATE:
    BTW, my ($hash1, $hash2)= @ARGV or die "$!"; will return strange errors (or none at all) if no arguments are supplied.

    @ARGV == 2 or die "Specify two arguments.\n";
    should work better. Also, using double quotes in readhash("$hash1") where $hash1 is a file name is useless. Just readhash($hash1);.
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: how to loop through hash tables more efficiently
by roboticus (Canon) on Sep 17, 2012 at 21:02 UTC

    lrl1997:

    A hash table lets you get a value given its name. So you don't need the second loop: You can use the exists function (see perldoc -f exists ) to check if the key is in the other hash.)

    $ cat t my %h = (a=>1, b=>2, c=>3); print "a ", exists($h{a}) ? "exists" : "does not exist", "\n"; print "c ", exists($h{c}) ? "exists" : "does not exist", "\n"; print "f ", exists($h{f}) ? "exists" : "does not exist", "\n"; $ perl t a exists c exists f does not exist

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      So you don't need the second loop... You can use the exists function

      he is already using exists, your code doesn't map well to his example , his faux hash of arrays

        Oh, yes, I seem to have glossed over that. When I read it, I thought he was comparing keys with keys, rather than keys and values.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: how to loop through hash tables more efficiently
by tobyink (Abbot) on Sep 17, 2012 at 21:08 UTC

    This:

    foreach my $k1 (keys %$h1) { my @fields = split /,/, $h1->{$k1}; ... }

    ... will be faster as ...

    while (my ($k1, $v1) = each %$h1) { my @fields = split /,/, $v1; ... }

    each fetches a key-value pair in one go, and except in the case of some tied hashes, will generally be faster than fetching a key and then looking up the value. Both of your hash loops could be optimised this way.

    I have a vague feeling that your two loops could be separated out. Hmmm... I'll have a think about it...

    Update... this works...

    use 5.010; use strict; sub read_hash { my $data = shift; open my $fh, '<', \$data; my %hash; while (<$fh>) { chomp; my ($key,$value)=split /,/, $_, 2; $hash{$key}=$value; } return \%hash; } my $h1 = read_hash(<<'DATA'); k_a1,val_a1,val_a2,val_a3 k_a2,val_a4,val_a5 DATA my $h2 = read_hash(<<'DATA'); k_b1,val_a1 k_b2,val_a2 k_b3,val_a3 k_b4,val_a4 k_b5,val_a5 DATA my %tmp; while (my ($k, $v) = each $h1) { push @{ $tmp{$_} }, $k for split /,/, $v } while (my ($k, $v) = each $h2) { push @{ $tmp{$_} }, $k for split /,/, $v } for (values %tmp) { next if @$_ < 2; say join ',', @$_ }

    If you want the lines output in the same order as your original example, then replace the last two lines with:

    say $_ for sort map { join ',', @$_ } grep { @$_ > 1 } values %tmp;
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

      will generally be faster than fetching a key and then looking up the value.

      microscopically faster

        Indeed; the real speed gain comes from the iterator-like workings of each -- no need to build a list of keys to feed to foreach before even starting the loop (and at 20k keys, that costs quite a bit).

      Thanks. It does run very fasta, but not the format I want.

      I'd like to have output with KEY and VALUE from hash2 then followed by KEY from hash1,all three are separated by tab?

        Then why didn't you show that in your original question?

        for (sort keys %tmp) { next if @{$tmp{$_}} < 2; say join "\t$_\t", reverse @{ $tmp{$_} }; }
        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: how to loop through hash tables more efficiently
by Kenosis (Priest) on Sep 17, 2012 at 22:28 UTC

    Perhaps the following will assist you:

    use Modern::Perl; use File::Slurp qw/read_file/; my ( $hash1, $hash2 ) = @ARGV or die $!; my %hash1 = map { /(.+?),(.+)/; $1 => $2 } grep /.,./, read_file $hash +1; my %hash2 = map { /(.+?),(.+)/; $1 => $2 } grep /.,./, read_file $hash +2; for my $key2 ( sort keys %hash2 ) { for my $key1 ( sort keys %hash1 ) { say "$key2\t$hash2{$key2}\t$key1" if $hash1{$key1} =~ /\b$hash +2{$key2}\b/; } }

    Output:

    k_b1 val_a1 k_a1 k_b2 val_a2 k_a1 k_b3 val_a3 k_a1 k_b4 val_a4 k_a2 k_b5 val_a5 k_a2

    A regex is used within map to initialize the hashes with key/value pairs. The grep /.,./ helps insure that the regex can match--in case there are any blank lines (or other non-matching lines) in the files. Instead of splitting %hash1's values on commas, a word-boundary match is used to see if a value in %hash2 exists as a value in %hash1, and both keys are printed if so.

    Hope this helps!

    Update: Output modified based upon OP's reply to tobyink.

      really want to try out this code, but my work only have perl 5.8.8, doesn't have v5.10.0 yet, :( Thanks though, any other more efficiently traditional way to do it?

        Ah! Then the following should work:

        use strict; use warnings; use File::Slurp qw/read_file/; my ( $hash1, $hash2 ) = @ARGV or die $!; my %hash1 = map { /(.+?),(.+)/; $1 => $2 } grep /.,./, read_file $hash +1; my %hash2 = map { /(.+?),(.+)/; $1 => $2 } grep /.,./, read_file $hash +2; for my $key2 ( sort keys %hash2 ) { for my $key1 ( sort keys %hash1 ) { print "$key2\t$hash2{$key2}\t$key1\n" if $hash1{$key1} =~ /\b$ +hash2{$key2}\b/; } }
Re: updated: how to loop through hash tables more efficiently
by Anonymous Monk on Sep 18, 2012 at 01:59 UTC

    If you simplify your data structure you can ditch internal loops, not that it matters

      This is a brilliant solution...

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://994096]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (9)
As of 2014-12-22 22:47 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (132 votes), past polls