lrl1997 has asked for the
wisdom of the Perl Monks concerning the following question:
Dear all, I just updated my question below, so is the format of both hashes.
I feel the code I provided below is not fast enough, I would really need some help to improve it's efficiency.
I have two hash tables,
hash1: (keys and values separated by tab)
k_a1 k_b1,k_b2,k_b3
k_a2 k_b4,K_b5
hash2:(keys and values separated by tab)
k_b1 val_a1
k_b2 val_a2
k_b3 val_a3
k_b4 val_a4
k_b5 val_a5
I want to loop through both hash tables and foreach keys in hash2, if the value can be found in the value of hash1, then print out the corresponding keys from both hash1 and hash2 (keys would be keys from hash2, values would be values for hash2 and keys of hash1, all separated by tab): e.g.
k_b1 val_a1 k_a1
k_b2 val_a2 k_a1
k_b3 val_a3 k_a1
k_b4 val_a4 k_a2
k_b5 val_a5 k_a2
This is the code
use strict;
sub read_hash {
my $fname = shift;
open (my $fh, "<",$fname) or die "$!";
my %hash=();
while (<$fh>) {
my $line=$_;
chomp $line;
my ($key,$value)=split /\t/, $line, 2;
$hash{$key}=$value;
}
return \%hash;
}
my $hash1=shift;
my $hash2=shift;
my $h1=read_hash("$hash1");
my $h2=read_hash("$hash2");
my @fields;
while (my ($k1, $v1) = each %$h1)
{
@fields=split /,/, $h1->{$k1};
while (my ($k2, $v2) = each %$h2)
{
if ( grep( /^$k2$/, @fields ) ) {
print $k2, "\t", $v2, "\t", "$k1", "\n"
}
}
}
Re: how to loop through hash tables more efficiently by Anonymous Monk on Sep 17, 2012 at 20:40 UTC |
This is a fairly simple code, but I found to run my code is extremely slow And it doesn't work :)
if (exists $lookup{$k1}) {
print $k2, ",", $k1, "\n";
is never true, never prints anything | [reply] [d/l] |
Re: how to loop through hash tables more efficiently by choroba (Prior) on Sep 17, 2012 at 20:53 UTC |
Can you please specify what output you expect from the given sample files and why? You code does not output anything. To make it ouput something, I had to add the key among its values in hash1: k_aX,k_aX
It then prints all the keys from hash2. That is probably not what you wanted.
UPDATE:
BTW, my ($hash1, $hash2)= @ARGV or die "$!"; will return strange errors (or none at all) if no arguments are supplied. @ARGV == 2 or die "Specify two arguments.\n";
should work better. Also, using double quotes in readhash("$hash1") where $hash1 is a file name is useless. Just
readhash($hash1);.
| [reply] [d/l] [select] |
Re: how to loop through hash tables more efficiently by roboticus (Canon) on Sep 17, 2012 at 21:02 UTC |
lrl1997:
A hash table lets you get a value given its name. So you don't need the second loop: You can use the exists function (see perldoc -f exists ) to check if the key is in the other hash.)
$ cat t
my %h = (a=>1, b=>2, c=>3);
print "a ", exists($h{a}) ? "exists" : "does not exist", "\n";
print "c ", exists($h{c}) ? "exists" : "does not exist", "\n";
print "f ", exists($h{f}) ? "exists" : "does not exist", "\n";
$ perl t
a exists
c exists
f does not exist
...roboticus
When your only tool is a hammer, all problems look like your thumb. | [reply] [d/l] [select] |
|
So you don't need the second loop... You can use the exists function he is already using exists, your code doesn't map well to his example , his faux hash of arrays
| [reply] |
|
Oh, yes, I seem to have glossed over that. When I read it, I thought he was comparing keys with keys, rather than keys and values.
...roboticus
When your only tool is a hammer, all problems look like your thumb.
| [reply] |
Re: how to loop through hash tables more efficiently by tobyink (Prior) on Sep 17, 2012 at 21:08 UTC |
foreach my $k1 (keys %$h1) {
my @fields = split /,/, $h1->{$k1};
...
}
... will be faster as ...
while (my ($k1, $v1) = each %$h1) {
my @fields = split /,/, $v1;
...
}
each fetches a key-value pair in one go, and except in the case of some tied hashes, will generally be faster than fetching a key and then looking up the value. Both of your hash loops could be optimised this way.
I have a vague feeling that your two loops could be separated out. Hmmm... I'll have a think about it...
Update... this works...
use 5.010;
use strict;
sub read_hash
{
my $data = shift;
open my $fh, '<', \$data;
my %hash;
while (<$fh>)
{
chomp;
my ($key,$value)=split /,/, $_, 2;
$hash{$key}=$value;
}
return \%hash;
}
my $h1 = read_hash(<<'DATA');
k_a1,val_a1,val_a2,val_a3
k_a2,val_a4,val_a5
DATA
my $h2 = read_hash(<<'DATA');
k_b1,val_a1
k_b2,val_a2
k_b3,val_a3
k_b4,val_a4
k_b5,val_a5
DATA
my %tmp;
while (my ($k, $v) = each $h1)
{ push @{ $tmp{$_} }, $k for split /,/, $v }
while (my ($k, $v) = each $h2)
{ push @{ $tmp{$_} }, $k for split /,/, $v }
for (values %tmp)
{ next if @$_ < 2; say join ',', @$_ }
If you want the lines output in the same order as your original example, then replace the last two lines with:
say $_ for sort map { join ',', @$_ } grep { @$_ > 1 } values %tmp;
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
| [reply] [d/l] [select] |
|
| [reply] |
|
| [reply] [d/l] [select] |
|
Thanks. It does run very fasta, but not the format I want.
I'd like to have output with KEY and VALUE from hash2 then followed by KEY from hash1,all three are separated by tab?
| [reply] |
|
for (sort keys %tmp) {
next if @{$tmp{$_}} < 2;
say join "\t$_\t", reverse @{ $tmp{$_} };
}
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
| [reply] [d/l] |
Re: how to loop through hash tables more efficiently by Kenosis (Deacon) on Sep 17, 2012 at 22:28 UTC |
use Modern::Perl;
use File::Slurp qw/read_file/;
my ( $hash1, $hash2 ) = @ARGV or die $!;
my %hash1 = map { /(.+?),(.+)/; $1 => $2 } grep /.,./, read_file $hash
+1;
my %hash2 = map { /(.+?),(.+)/; $1 => $2 } grep /.,./, read_file $hash
+2;
for my $key2 ( sort keys %hash2 ) {
for my $key1 ( sort keys %hash1 ) {
say "$key2\t$hash2{$key2}\t$key1" if $hash1{$key1} =~ /\b$hash
+2{$key2}\b/;
}
}
Output:
k_b1 val_a1 k_a1
k_b2 val_a2 k_a1
k_b3 val_a3 k_a1
k_b4 val_a4 k_a2
k_b5 val_a5 k_a2
A regex is used within map to initialize the hashes with key/value pairs. The grep /.,./ helps insure that the regex can match--in case there are any blank lines (or other non-matching lines) in the files. Instead of splitting %hash1's values on commas, a word-boundary match is used to see if a value in %hash2 exists as a value in %hash1, and both keys are printed if so.
Hope this helps!
Update: Output modified based upon OP's reply to tobyink. | [reply] [d/l] [select] |
|
really want to try out this code, but my work only have perl 5.8.8, doesn't have v5.10.0 yet, :(
Thanks though, any other more efficiently traditional way to do it?
| [reply] |
|
use strict;
use warnings;
use File::Slurp qw/read_file/;
my ( $hash1, $hash2 ) = @ARGV or die $!;
my %hash1 = map { /(.+?),(.+)/; $1 => $2 } grep /.,./, read_file $hash
+1;
my %hash2 = map { /(.+?),(.+)/; $1 => $2 } grep /.,./, read_file $hash
+2;
for my $key2 ( sort keys %hash2 ) {
for my $key1 ( sort keys %hash1 ) {
print "$key2\t$hash2{$key2}\t$key1\n" if $hash1{$key1} =~ /\b$
+hash2{$key2}\b/;
}
}
| [reply] [d/l] |
Re: updated: how to loop through hash tables more efficiently by Anonymous Monk on Sep 18, 2012 at 01:59 UTC |
If you simplify your data structure you can ditch internal loops, not that it matters
| [reply] [d/l] |
|
| [reply] |
|
|