G'day R56,

Welcome to the monastery.

Firstly, a word about your data. The term "list" has a special meaning in Perl: see "perldata: List value constructors". I've taken what you've described as lists to be records in files. Given you wrote "... the 'names to be replaced' file ...", that seems correct for the second list; although, until I had read that far, I thought you might have been talking about a list of lists (which is something different; see perllol).

Anyway, this means you (probably) have a CSV (comma-separated values) file, which is best read using a module like Text::CSV. The reason is that CSV files have all sorts of gotchas which these modules already handle. As an example, consider two records: "apples, red,cherries" and "apples, red cherries". If you had an ID for "apples, red", how would you handle the replacement in those two records?
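
To make that gotcha concrete, here is a minimal sketch. The record is hypothetical (it is not from your data), and it assumes the name containing a comma is quoted, as CSV normally requires. I've used the core module Text::ParseWords as a stand-in so the example runs without installing anything; it is not a full CSV parser, and Text::CSV remains the better choice for real files.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use Text::ParseWords qw{parse_line};

# Hypothetical record: the first field is quoted because it contains a comma.
my $record = '"apples, red",cherries';

# Naive approach: splits on every comma, including the one inside the quotes.
my @naive = split /,/, $record;

# Quote-aware approach: parse_line(delimiter, keep_quotes, line).
my @quote_aware = parse_line(',', 0, $record);

print scalar(@naive), " fields from naive split\n";
print scalar(@quote_aware), " fields from quote-aware parse\n";
print "first quote-aware field: $quote_aware[0]\n";
```

The naive split breaks "apples, red" into two pieces, so a lookup keyed on the full name would never find it; the quote-aware parse keeps it as one field.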

So, I'd suggest you check whether your data really is as simple as the examples you've posted; and consider the chances of it staying that way in the future. You may need to revisit whatever solution you choose based on those findings. The solution I provide below assumes nothing more complex than what you currently show.

Here's my take on a solution. I create a hash mapping names to IDs (same as you). Next, I use the keys of that hash to create a regex with an alternation (e.g. bananas|oranges|...) such that only the names with IDs will be matched. Finally, the replacements are made and the new data is output.

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my $in_file_name_id = 'pm_1055846_name_id_data.txt';
my $in_file_name_replace = 'pm_1055846_name_replace_data.txt';
my $out_file_name_replaced = 'pm_1055846_name_replaced_out.txt';

open my $in_id_fh, '<', $in_file_name_id;
my %id_for = map { split } <$in_id_fh>;
close $in_id_fh;

my $re = '\b(' . join('|', keys %id_for) . ')\b';

open my $in_replace_fh, '<', $in_file_name_replace;
open my $out_replaced_fh, '>', $out_file_name_replaced;

while (<$in_replace_fh>) {
    s/$re/$id_for{$1}/g;
    print $out_replaced_fh $_;
}
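
If the names ever become less simple than the posted examples, the alternation may need hardening. The following is only a sketch under assumptions: the names 'red', 'red apples' and 'kiwi.gold', their IDs, and the helper build_name_regex() are all hypothetical. It escapes each name with quotemeta, so a "." matches only a literal dot, and puts longer names first, so "red apples" is tried before "red".

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper: build one regex matching any key of the given hash.
sub build_name_regex {
    my ($id_for) = @_;

    my $alternation = join '|',
        map  { quotemeta }                                    # escape metacharacters
        sort { length($b) <=> length($a) or $a cmp $b }       # longest names first
        keys %$id_for;

    # \b anchors assume every name starts and ends with a word character.
    return qr/\b($alternation)\b/;
}

# Hypothetical data, not taken from the original files.
my %id_for = (
    'red'        => 1,
    'red apples' => 2,
    'kiwi.gold'  => 3,
);

my $re = build_name_regex(\%id_for);

my $line = 'red apples,red,kiwixgold,kiwi.gold';
(my $replaced = $line) =~ s/$re/$id_for{$1}/g;

print "$replaced\n";
```

With this data, "kiwixgold" is left alone because the dot in "kiwi.gold" is escaped. Note that an empty hash would produce a pattern that matches the empty string, so check for that case before using it on real data.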

Here are the files. Notice I added "pineapples", which doesn't have an ID and so wasn't replaced.

$ cat pm_1055846_name_id_data.txt
bananas 456
oranges 23
peaches 897236
kiwis 3726

$ cat pm_1055846_name_replace_data.txt
bananas,oranges
peaches,peaches,peaches
kiwis
oranges
kiwis,oranges,bananas,bananas
bananas,oranges,pineapples,peaches,kiwis

$ cat pm_1055846_name_replaced_out.txt
456,23
897236,897236,897236
3726
23
3726,23,456,456
456,23,pineapples,897236,3726

-- Ken

In reply to Re: iterating hash keys? by kcott
in thread iterating hash keys? by R56
