Hash table manipulation

sarvan has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Hash table manipulation by davido (Cardinal) on Jul 11, 2011 at 06:54 UTC
`my @filtered_keys = grep { $_ > 0.4 } keys %hash;
foreach my $key ( @filtered_keys ) {
    # Do something with $hash{$key}, which will be the
    # URLs pertaining to each of the keys whose numeric
    # value is greater than 0.4.
}`

Dave
Re^2: Hash table manipulation by spazm (Monk) on Jul 11, 2011 at 21:52 UTC
David's post works, given my understanding of the original post. Original poster, your question would be helped by example data. Here are the assumptions we are working with:

1) Your data set contains keys which are floats and values containing URL strings, e.g.

`my %hash = (
    .1 => 'url1',
    .2 => 'url2',
    .3 => 'url3',
    .7 => 'url7',
    .9 => 'url9',
);`

2) You wish to extract the values for which the keys meet certain properties, in your example "`key > 0.4`".

3) You wish to do something with the values and/or keys that match the properties.

"Now, what modification i want is, i just want to filter the hash keys for example, i want only the values greater than 0.4 and its correct url as a output rather than all random numbers in key.."

Put that all together:

`#define your data:
my %hash = (
    .1 => 'url1',
    .2 => 'url2',
    .3 => 'url3',
    .7 => 'url7',
    .9 => 'url9',
);

#define the fitness function / match determination
my $good_value = sub { $_ > 0.4 };

#find the keys that match the fitness function:
my @good_keys = map { $good_value->($_) } keys %hash;

#now do something with the keys:
##e.g. print the values, in hash order:
for my $key (@good_keys) {
    print $hash{$key}, "\n";
}

## grab just the urls from the good keys, without including the keys.
my @good_values = @hash{ @good_keys };

##e.g. save a hash with just the subset of new keys
my %good_hash;
@good_hash{ @good_keys } = @hash{ @good_keys };`

Why the anonymous subroutine for the fitness function? It factors the matching logic out of the routines needed to implement it. We could put this into a subroutine that takes a sub-ref for the fitness function and returns the wanted keys/values/output generically.

`my %hash = ( .1 => 'url1', .2 => 'url2', .5 => 'url5', .9 => 'url9' );
my $fitness = sub { $_ > .4 };
my @good_keys = find_good_keys( \%hash, $fitness );
#...

sub find_good_keys {
    my ($hashr, $fitness) = @_;
    return map { $fitness->($_) } keys %$hashr;
}`

This can be made more generic by passing the hash to the fitness function within find_good_keys.

`sub find_good_keys {
    my ($hashr, $fitness) = @_;
    return map { $fitness->($_, $hashr) } keys %$hashr;
}`

This may seem overkill for the "keys > .4" case, but is lovely for a more complex question like "keys between .3 and .7 inclusive, where the value contains foo (case insensitive)."

`my $fitness = sub {
    my ($key, $hashr) = @_;
    return .3 <= $key && .7 >= $key && $hashr->{$key} =~ m/foo/i ? 1 : +0;
};
my @good_keys = find_good_keys( \%hash, $fitness );

sub find_good_keys {
    my ( $hashr, $fitness ) = @_;
    return map { $fitness->( $_, $hashr ) } keys %$hashr;
}`
Re^3: Hash table manipulation by GertMT (Hermit) on Jul 12, 2011 at 06:21 UTC
I tried to get the first part of this code working but wasn't successful (see below). I had a problem using the map function. Can someone explain to me why it doesn't work as presented here? Thanks,

`#!/usr/bin/perl -w
use strict;

#define your data:
my %hash = (
    .1 => 'url1',
    .2 => 'url2',
    .3 => 'url3',
    .7 => 'url7',
    .9 => 'url9',
);

## It works if I use this loop
# my @good_keys;
# foreach my $key ( keys %hash ) {
#     if ( $key > .4 ) {
#         push @good_keys, $key;
#     }
#
# }

# but I can not get it to work with this map-function
#define the fitness function / match determination
my $good_value = sub { $_ > 0.4 };

#find the keys that match the fitness function:
my @good_keys = map { $good_value->($_) } keys %hash;

#now do something with the keys:
##eg. print the values, in hash order:
for my $key (@good_keys) {
    print $hash{$key}, "\n";
}`
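A likely explanation, offered as a sketch rather than as an answer given in the thread: `map` returns whatever its block evaluates to for each element, so `@good_keys` ends up holding the results of the comparison (true or false) rather than the keys, while `grep` returns the elements for which the block is true. The names `@map_result` and `@from_sub` are illustrative, and `find_good_keys` is rewritten here with `grep` as an assumption about what was intended.

```perl
use strict;
use warnings;

my %hash = ( .1 => 'url1', .2 => 'url2', .3 => 'url3', .7 => 'url7', .9 => 'url9' );
my $good_value = sub { $_ > 0.4 };

# map collects the block's return value for every key:
# one comparison result per key, not the key itself.
my @map_result = map { $good_value->($_) } keys %hash;

# grep keeps the key itself whenever the block returns true.
my @good_keys = sort { $a <=> $b } grep { $good_value->($_) } keys %hash;

# The generic helper, written with grep (assumed intent).
sub find_good_keys {
    my ( $hashr, $fitness ) = @_;
    return grep { $fitness->( $_, $hashr ) } keys %$hashr;
}
my @from_sub = sort { $a <=> $b } find_good_keys( \%hash, $good_value );

print scalar(@map_result), " comparison results from map\n";
print "$_ => $hash{$_}\n" for @good_keys;
```

With `map`, looking up `$hash{$key}` for each result fails because the results are not keys of the hash; with `grep`, the loop in the post above prints the URLs as expected.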
Re: Hash table manipulation by bart (Canon) on Jul 11, 2011 at 10:42 UTC
"I have a hash with keys and values stored in it. The keys are floating point numbers(i.e 0.423.0.523 etc)."

This tells me enough to conclude that you're using the wrong data structure. Floating point numbers are somewhat elusive: you rarely get exact values for them. So the string representation of the numbers, which is slightly more stable, is still rather unreliable. What if you have a conflict? What if 2 "URL"s have the same key value? What if they're slightly different? You may have one hash entry, or you may have 2.

What I think you want is a data structure containing pairs (AKA "tuples") of (numeric value, url).

"i want only the values greater than 0.4"

Even more evidence you don't really want Perl hashes. IMO this is more suitable for what you're after:

`my @data = (
    [ 0.423, 'http://google.com/' ],
    [ 0.523, 'http://bing.com/' ],
);`

You can use grep to extract the tuples that match your requirements.
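A minimal sketch of that grep over tuples, assuming each tuple is `[ score, url ]`. The extra rows, the `example.*` URLs and the `$threshold` name are made up for illustration; note that two tuples can share a score without overwriting each other.

```perl
use strict;
use warnings;

# Hypothetical data: each tuple is [ similarity score, url ].
my @data = (
    [ 0.423, 'http://google.com/' ],
    [ 0.523, 'http://bing.com/' ],
    [ 0.523, 'http://example.com/' ],    # same score, still a separate entry
    [ 0.120, 'http://example.org/' ],
);

my $threshold = 0.4;

# Keep the tuples whose first element exceeds the threshold.
my @wanted = grep { $_->[0] > $threshold } @data;

printf "%.3f %s\n", @$_ for @wanted;
```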
Re: Hash table manipulation by GrandFather (Saint) on Jul 11, 2011 at 09:44 UTC
First off, 0.423.0.523 is not a floating point number so you may need to specify your problem rather better than you have so far. Treating such strings as floating point numbers almost certainly will not do what you want.

Second, if you really are dealing with floating point numbers then whatever approach you use must take into account the fact that the number you use to generate a key (which is a string) most likely isn't the number you will get back if you treat the string as a number.

I strongly recommend that you tell us more about what you are trying to do because given the information we have from you so far I suspect that you still have plenty of trouble ahead of you.

True laziness is hard work
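To illustrate the round-trip problem, here is a small sketch. It assumes a perl built with ordinary double-precision numbers, where a number used as a hash key is stringified to about 15 significant digits. The variable names are illustrative.

```perl
use strict;
use warnings;

my $sum = 0.1 + 0.2;    # not exactly 0.3 in binary floating point
my $lit = 0.3;

my %h;
$h{$sum} = 'first url';
$h{$lit} = 'second url';    # same key string, so this overwrites the first entry

my ($key) = keys %h;
print "numbers equal?  ", ( $sum == $lit ? "yes" : "no" ), "\n";
print "keys in hash:   ", scalar( keys %h ), " ($key)\n";
print "key == \$sum?    ", ( $key == $sum ? "yes" : "no" ), "\n";
```

Two numbers that compare as different can therefore share one hash entry, and the key read back as a number need not equal the number that produced it.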
Re^2: Hash table manipulation by sarvan (Sexton) on Jul 12, 2011 at 07:32 UTC
Hello everyone, let me describe the problem clearly. In my program I take Google's top 10 results for a query, take the snippet or title from each result, and compute the similarity between that returned snippet and the original query. The computed similarity value is what I store in the hash as the key, and the URL is stored directly as the hash value. For example:

`my $url;              # will contain the 10 URLs grepped from the XML result file
my $value = sim();    # will contain the similarity computed for all ten results
my %hash;

sub sim {
    # here I compute the similarity between the query title and the resulting snippet ...
    return $val;      # I return the similarity value for each result
}

$hash{$value} = $url; # storing key and value in the hash

# I sort the keys in the hash in descending order to get the highest value
# on top and print the URL associated with it as output.`

Now I want to filter it by some threshold, like keys higher than 0.4 or something.
Re^3: Hash table manipulation by GrandFather (Saint) on Jul 12, 2011 at 11:31 UTC
Ok, given what you are doing your choice of using a hash is fair enough. The precision of the floating point values is fairly unimportant so any rounding or truncation that happens when using the numbers as keys is very unlikely to matter. To select the top N URLs I'd do something like:

`use strict;
use warnings;

my %urls = (
    0.999 => 'www.perlmonks.org',
    0.65  => 'www.snakewranglers.org',
    0.451 => 'www.jewelmerchants.org',
    0.222 => 'www.coffeemerchants.org',
    0.12  => 'www.scriptkiddies.org',
);

my @inOrder  = sort {$b <=> $a} keys %urls;
my @topThree = splice @inOrder, 0, 3;

print "$_: $urls{$_}\n" for @topThree;`

Prints:

`0.999: www.perlmonks.org
0.65: www.snakewranglers.org
0.451: www.jewelmerchants.org`

True laziness is hard work
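The same data can also be filtered by a threshold rather than by a fixed count, which is what the question asked for. This is a sketch only: the `$threshold` value of 0.4 comes from the question, and the name `@selected` is illustrative.

```perl
use strict;
use warnings;

my %urls = (
    0.999 => 'www.perlmonks.org',
    0.65  => 'www.snakewranglers.org',
    0.451 => 'www.jewelmerchants.org',
    0.222 => 'www.coffeemerchants.org',
    0.12  => 'www.scriptkiddies.org',
);

my $threshold = 0.4;    # assumed cut-off, taken from the question

# Keep keys above the threshold, then order them from highest to lowest.
my @selected = sort { $b <=> $a } grep { $_ > $threshold } keys %urls;

print "$_: $urls{$_}\n" for @selected;
```

Filtering before sorting keeps the sort small; for ten results the order of the two steps makes no practical difference.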
Re^4: Hash table manipulation by sarvan (Sexton) on Jul 13, 2011 at 04:23 UTC
Re^5: Hash table manipulation by GrandFather (Saint) on Jul 13, 2011 at 06:49 UTC
Re: Hash table manipulation by kcott (Archbishop) on Jul 11, 2011 at 07:01 UTC
You can do something like the following one-liner (which I've split into multiple lines for readability).

`$ perl -wE '
use strict;
my %float_urls = (0.1 => q{A}, 0.2 => q{B}, 0.5 => q{E});
my @wanted_urls = map { $float_urls{$_} } grep { $_ > 0.4 } keys %float_urls;
say join qq{\n}, @wanted_urls;
'
E`

-- Ken
Re: Hash table manipulation by Don Coyote (Hermit) on Jul 11, 2011 at 08:41 UTC
Note: this was a first attempt at writing a string regexp in real time. The regexp didn't act exactly as imagined, but the tested regexp is now at the bottom of the post.

Thinking in terms of the keys being strings and not floating point numbers, I went for a regexp match. This assumes only one digit precedes the decimal point at the start of the pattern (key) and that it is followed by three digits.

`foreach my $nums (keys %hash){
    if($nums =~ m/^([^0]|((0\.[5-9]\d{2})|(0\.4[([^0]{2})([^0]\d)(\d[^0])])))/){
        print "$nums is linked with url $hash{$nums}\n";
        # or do something with $hash{$nums}
    }
}`

Don (revision #401,0*!)

Sleepless Addendum: On reflection, the first group of `[^0]` after the test for the '4' is not required. The two (not three) groups there should be `\d[^0]` and `[^0]\d`, in that order. These test for all options aside from the double zero of the 400, which is what I was over-securing for. So not

`(0\.4[([^0]{2})([^0]\d)(\d[^0])])`

as this tests for '00' first, which is least likely, then goes on to test all double-digit numbers, failing at 00 01 02 .. 09, then tests the second 0 again to see if it is 00 or allowable, but

`(0\.4[(\d[^0])([^0]\d)])`

This just carries out the test for any double digit, failing at 00 10 20 .. 90 and allowing, say, 07, then fails on the second test only if the first 0 is there, meaning it is 00 and not, say, 20.

I also quickly considered testing for any number that was not higher than 0.4, but this resulted in the over-matching required when reaching the 400 again, or I had actually fallen asleep by then. lol.

Re note: OK, so I actually decided to test it out on a hash literal specifically designed with fails. Plusplus if you already guessed :p - the first `^0` matched and that concluded the test, so basically it wasn't testing for 01..09. However, this is the way to go about that one. And of course I should always test before responding to a seeker of wisdom, my bad.

`use strict;
use warnings;

my %numsh = (
    '0.000' => "noughtpointnought",
    '0.004' => "noughtpointohohfour",
    '0.040' => "noughtpointohfouroh",
    '0.123' => "noughtpointonetwothree",
    '0.399' => "noughtpointthreeninenine",
    '0.400' => "noughtpointfourohoh",
    '0.405' => "noughtpointfourohfive",
    '0.410' => "noughtpointfouroneoh",
    '0.411' => "noughtpointfouroneone",
    '0.490' => "noughtpointfournineoh",
    '0.500' => "noughtpointfiveohoh",
    '0.904' => "noughtpointnineohfour",
    '1.123' => "onepointonetwothree",
    '4.000' => "fourpointhohoho",
    '4.050' => "fourpointohfiveoh",
    '4.400' => "fourpointfourohoh",
    '4.405' => "fourpointfourohfive",
    '4.410' => "fourpointfouroneoh",
    '4.411' => "fourpointfouroneone",
    '4.490' => "fourpointfournineoh",
    '5.050' => "fivepointohfiveoh",
    '7.050' => "sevenpointohfiveoh",
    '0goodword' => 'nogoodword',
);

#print scalar(%numsh).$/;

foreach my $num (sort keys %numsh){
    if ($num =~ m/^([^0]|(0\.([5-9]|4([^0]|(0[^0])))))/){
        print "$num is $numsh{$num}".$/;
    }
}
exit(0);`

So you have to explicitly 'or' match: a non-zero character first, 'or' a zero followed by a non-zero. Which, funnily enough, is a recursion of the start of the regex itself.

And for added measure, the lower-than regexp seems to work like this:

`if ($num =~ m/^(0\.([0-3]|400))/){
    print "$num is $numsh{$num}".$/;
}`

Note that a range inside a character class is written with a hyphen, as in `[5-9]` and `[0-3]`; `[5..9]` would match only the characters '5', '.' and '9'.

Don
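One way to gain confidence in a pattern like this is to compare it with a plain numeric test over every key of the assumed form (one digit, a point, three digits). The sketch below uses a version of the tested pattern written with `|` alternation and a `[5-9]` range; `$gt_re` and `$mismatches` are illustrative names.

```perl
use strict;
use warnings;

# "Greater than 0.400" for keys shaped like d.ddd:
# a non-zero leading digit, or 0.5xx-0.9xx, or 0.4 followed by
# anything other than "00".
my $gt_re = qr/^(?:[^0]|0\.(?:[5-9]|4(?:[^0]|0[^0])))/;

my $mismatches = 0;
for my $i ( 0 .. 9999 ) {
    # Build "0.000" .. "9.999" from integers to avoid rounding surprises.
    my $key = sprintf '%d.%03d', int( $i / 1000 ), $i % 1000;

    my $by_regex   = $key =~ $gt_re ? 1 : 0;
    my $by_numeric = $key > 0.4     ? 1 : 0;

    $mismatches++ if $by_regex != $by_numeric;
}

print "mismatches: $mismatches\n";
```

The check covers only well-formed keys; a key such as '0goodword' is rejected by the pattern but would need separate handling in a numeric comparison.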

