Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

G'day R56,

Welcome to the monastery.

Firstly, a word about your data. The term list has a special meaning in Perl: see "perldata: List value constructors". I've taken what you've described as lists to be records in files. Given you wrote "... the 'names to be replaced' file ...", that seems correct for the second list; although, until I had read that far, I initially thought you might have been talking about a list of lists (which is something different — see perllol).

Anyway, this means you (probably) have a CSV (comma-separated values) file which is best read using a module like Text::CSV. The reason for this is that there are all sorts of gotchas with CSV files which have already been coded for in these modules. As an example, consider two records: "apples, red,cherries" and "apples, red cherries". If you had an ID for "apples, red", how would you handle the replacement in those two records.

So, I'd suggest you check whether your data really is as simple as the examples you've posted; and consider the chances of it staying that way in the future. You may need to revisit whatever solution you choose based on those findings. The solution I provide below assumes nothing more complex than what you currently show.

Here's my take on a solution. I create a hash mapping names to IDs (same as you). Next, I use the keys of that hash to create a regex with an alternation (e.g. bananas|oranges|...) such that only the names with IDs will be matched. Finally, the replacements are made and the new data is output.

#!/usr/bin/env perl use strict; use warnings; use autodie; my $in_file_name_id = 'pm_1055846_name_id_data.txt'; my $in_file_name_replace = 'pm_1055846_name_replace_data.txt'; my $out_file_name_replaced = 'pm_1055846_name_replaced_out.txt'; open my $in_id_fh, '<', $in_file_name_id; my %id_for = map { split } <$in_id_fh>; close $in_id_fh; my $re = '\b(' . join('|', keys %id_for) . ')\b'; open my $in_replace_fh, '<', $in_file_name_replace; open my $out_replaced_fh, '>', $out_file_name_replaced; while (<$in_replace_fh>) { s/$re/$id_for{$1}/g; print $out_replaced_fh $_; }

Here's the files. Notice I added "pineapples", which didn't have an ID, and so wasn't replaced.

$ cat pm_1055846_name_id_data.txt bananas 456 oranges 23 peaches 897236 kiwis 3726
$ cat pm_1055846_name_replace_data.txt bananas,oranges peaches,peaches,peaches kiwis oranges kiwis,oranges,bananas,bananas bananas,oranges,pineapples,peaches,kiwis
$ cat pm_1055846_name_replaced_out.txt 456,23 897236,897236,897236 3726 23 3726,23,456,456 456,23,pineapples,897236,3726

-- Ken


In reply to Re: iterating hash keys? by kcott
in thread iterating hash keys? by R56

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (4)
As of 2024-04-24 01:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found