Re: iterating hash keys?

G'day R56,

Welcome to the monastery.

Firstly, a word about your data. The term list has a special meaning in Perl: see "perldata: List value constructors". I've taken what you've described as lists to be records in files. Given you wrote "... the 'names to be replaced' file ...", that seems correct for the second list; although, until I had read that far, I initially thought you might have been talking about a list of lists (which is something different — see perllol).

Anyway, this means you (probably) have a CSV (comma-separated values) file, which is best read using a module like Text::CSV. The reason for this is that there are all sorts of gotchas with CSV files which have already been coded for in these modules. As an example, consider two records: "apples, red,cherries" and "apples, red cherries". If you had an ID for "apples, red", how would you handle the replacement in those two records?
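To make that gotcha concrete, here is a minimal sketch. It assumes a record in which a name containing a comma has been quoted; both the quoting and the sample record are my assumptions, not something taken from your files. It uses the core Text::ParseWords module only so that it runs without installing anything; it is a stand-in for illustration, and Text::CSV remains the better tool for real data.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical record: the first field is quoted because it contains a comma.
my $record = '"apples, red",cherries';

# A naive split treats every comma as a separator and breaks the quoted field.
my @naive = split /,/, $record;

# parse_line(delimiter, keep_quotes, line) honours the quotes instead.
my @parsed = parse_line(',', 0, $record);

print scalar(@naive),  " fields from split\n";
print scalar(@parsed), " fields from parse_line: ", join(' | ', @parsed), "\n";
```

The naive split yields three pieces, while the quote-aware parse yields the two fields that were intended.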

So, I'd suggest you check whether your data really is as simple as the examples you've posted; and consider the chances of it staying that way in the future. You may need to revisit whatever solution you choose based on those findings. The solution I provide below assumes nothing more complex than what you currently show.

Here's my take on a solution. I create a hash mapping names to IDs (same as you). Next, I use the keys of that hash to create a regex with an alternation (e.g. bananas|oranges|...) such that only the names with IDs will be matched. Finally, the replacements are made and the new data is output.

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my $in_file_name_id = 'pm_1055846_name_id_data.txt';
my $in_file_name_replace = 'pm_1055846_name_replace_data.txt';
my $out_file_name_replaced = 'pm_1055846_name_replaced_out.txt';

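# Build the name => ID lookup: each record is "name ID", so splitting on
# whitespace yields one key/value pair per line.
open my $in_id_fh, '<', $in_file_name_id;
my %id_for = map { split } <$in_id_fh>;
close $in_id_fh;

# Alternation of every known name, anchored at word boundaries.
my $re = '\b(' . join('|', keys %id_for) . ')\b';

open my $in_replace_fh, '<', $in_file_name_replace;
open my $out_replaced_fh, '>', $out_file_name_replaced;

# Replace each known name with its ID; unknown names pass through unchanged.
while (<$in_replace_fh>) {
    s/$re/$id_for{$1}/g;
    print $out_replaced_fh $_;
}

close $in_replace_fh;
close $out_replaced_fh;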

Here are the files. Notice I added "pineapples", which didn't have an ID, and so wasn't replaced.

$ cat pm_1055846_name_id_data.txt
bananas 456
oranges 23
peaches 897236
kiwis 3726

$ cat pm_1055846_name_replace_data.txt
bananas,oranges
peaches,peaches,peaches
kiwis
oranges
kiwis,oranges,bananas,bananas
bananas,oranges,pineapples,peaches,kiwis

$ cat pm_1055846_name_replaced_out.txt
456,23
897236,897236,897236
3726
23
3726,23,456,456
456,23,pineapples,897236,3726

-- Ken


Re^2: iterating hash keys? by R56 (Sexton) on Sep 27, 2013 at 14:09 UTC
Well, comparing to what I had, your code is faster than the speed of light! Is there a simple way for the s// to also include names with hyphens in the middle?
Re^3: iterating hash keys? by kcott (Archbishop) on Sep 28, 2013 at 06:29 UTC

"Well, comparing to what I had, your code is faster than the speed of light!"

That's a good start. :-)

"Is there a simple way for the s// to also include names with hyphens in the middle?"

The short answer is: yes. The longer answer depends on details. I found a reference you made to input data with hyphens in "Re^8: using hashes"; however, you provided no indication of the output you wanted (except that `20-10,25` was the wrong output when `bana-na,banana` was the input).

The following is based on the code I provided earlier. Given these input files:

$ cat pm_1055846_name_id_data.txt
bananas 456
oranges 23
peaches 897236
kiwis 3726
banana 25
bana 20
bana-na 15
na 10

$ cat pm_1055846_name_replace_data.txt
bananas,oranges
peaches,peaches,peaches
kiwis
oranges
kiwis,oranges,bananas,bananas
bananas,oranges,pineapples,peaches,kiwis
bana-na,banana
ba-na-na,bana-bana,bana-nana

If you want output like this:

$ cat pm_1055846_name_replaced_out.txt
456,23
897236,897236,897236
3726
23
3726,23,456,456
456,23,pineapples,897236,3726
15,25
ba-10-10,20-20,20-nana

Change

my $re = '\b(' . join('|', keys %id_for) . ')\b';

to

my $re = '\b(' . join('|', sort { $b cmp $a } keys %id_for) . ')\b';

If you want output like this:

$ cat pm_1055846_name_replaced_out.txt
456,23
897236,897236,897236
3726
23
3726,23,456,456
456,23,pineapples,897236,3726
15,25
ba-na-na,bana-bana,bana-nana

Change

my $re = '\b(' . join('|', keys %id_for) . ')\b';

to

my $re = '(^|,)(' . join('|', sort { $b cmp $a } keys %id_for) . ')(?=,|$)';

and

s/$re/$id_for{$1}/g;

to

s/$re/$1$id_for{$2}/g;

If you want something different to these, and are unable to work it out for yourself, provide details as outlined in the "How do I post a question effectively?" guidelines.

It would also be useful to advise what version of Perl you're using: I wrote those changes for v5.8; a more efficient version could have been written for a later version. As a hint for doing this yourself, see `(?<=pattern) \K` under Look-Around Assertions in "perlre: Extended Patterns"; `\K` was introduced in v5.10.0 (see "perl5100delta: Regular expressions" for this, and other, regex enhancements).

-- Ken
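For anyone following the `\K` hint, here is one way the second variant might look on v5.10 or later. Treat it as a sketch rather than the definitive rewrite: the helper name `replace_names` is made up for this example, the hash is simply the sample name/ID data from this thread, and the `quotemeta` call is an extra precaution that was not in the code above.

```perl
#!/usr/bin/env perl
use 5.010;    # \K needs Perl v5.10.0 or later
use strict;
use warnings;

# Sample mapping copied from the name/ID data shown in this thread.
my %id_for = (
    bananas => 456, oranges => 23, peaches   => 897236, kiwis => 3726,
    banana  => 25,  bana    => 20, 'bana-na' => 15,     na    => 10,
);

# Reverse sort puts longer names ahead of their own prefixes
# (bananas, banana, bana-na, bana). quotemeta is an added precaution
# (an assumption) in case a name contains regex metacharacters.
my $names = join '|', map { quotemeta } sort { $b cmp $a } keys %id_for;

# \K drops the leading "^" or "," from the matched text, so it no longer
# has to be captured and written back in the replacement.
my $re = '(?:^|,)\K(' . $names . ')(?=,|$)';

# replace_names() is a hypothetical helper used only in this sketch.
sub replace_names {
    my ($line) = @_;
    $line =~ s/$re/$id_for{$1}/g;
    return $line;
}

print replace_names($_), "\n"
    for 'bana-na,banana', 'ba-na-na,bana-bana,bana-nana';
```

The first record becomes `15,25`; the second is left untouched, because none of its comma-separated fields is an exact name.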
Re^4: iterating hash keys? by R56 (Sexton) on Sep 30, 2013 at 14:12 UTC
It's the second case: total recognition of the exact pattern, or just let it go. I'm using 5.16, but will take a look into those changes though. Once again, many thanks for your help Ken :)
Re^2: iterating hash keys? by R56 (Sexton) on Sep 27, 2013 at 12:11 UTC
Hey Ken, good to be here :) Thank you for the patience to write all that. I don't know yet if the data will be this simple at all times, but it's always better to cover all the options if it doesn't sacrifice speed. Will definitely try out your code to see if I can improve this!
