PerlMonks  

Pattern Matching

by Tan (Initiate)
on Dec 29, 2003 at 14:39 UTC (#317441=perlquestion)
Tan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

As someone who has just started using Perl, I am faced with a problem for which you may be able to recommend a solution. It essentially requires pattern matching between two data files.

For example:

In file A I have the following fields of data:

(Name, A, B, C, D) (ABB, A1, B1, C1, D1) (Accor ...) (Aflac Inc...)

etc. In file B I may have data that contains some of the above names, although not necessarily in the same order or form, e.g.:

(Name, A, B, C, D) (Abb, A11, B11, C11, D11) (Accor-plc ...) (Aflac-Inc...)

Ideally I need to search for these names and copy the field entries for A, B, C and D from one data file to the other.

Can Perl do "sounds like" pattern matching and then copy the contents of the array?

Cheers.

Edited by Chady -- fixed formatting.

Re: Pattern Matching
by liz (Monsignor) on Dec 29, 2003 at 15:21 UTC
Re: Pattern Matching
by CountZero (Bishop) on Dec 29, 2003 at 16:50 UTC
    I think this is dangerous territory you are venturing into. Updating a data file on the basis of another list of data that "sounds" the same is a very loose criterion. Can you tighten the rules for when a data record should be considered the same as an existing entry? Otherwise I'm afraid not even Perl will do a good job here.

    The venerable Soundex algorithm is known to map wildly different words to the same code, and that may or may not be what you want.
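To make the point concrete, here is a minimal pure-Perl sketch of the classic American Soundex rules. The CPAN module Text::Soundex provides a ready-made `soundex` function; this hand-rolled version is only meant to be self-contained. "Robert" and "Rupert" get the same code (R163), "Accor" collides with the unrelated name "Acker" (a made-up example, A260), and "Accor-plc" gets a different code (A261) from "Accor" because the suffix is encoded as well.

```perl
use strict;
use warnings;

# Classic American Soundex digit groups.
my %code;
@code{qw(B F P V)}         = (1) x 4;
@code{qw(C G J K Q S X Z)} = (2) x 8;
@code{qw(D T)}             = (3) x 2;
$code{L}                   = 4;
@code{qw(M N)}             = (5) x 2;
$code{R}                   = 6;

sub soundex {
    my ($word) = @_;
    my @letters = grep { /[A-Z]/ } split //, uc $word;   # keep letters only
    return '' unless @letters;
    my $first  = shift @letters;
    my $result = $first;
    my $prev   = exists $code{$first} ? $code{$first} : '';
    for my $ch (@letters) {
        if ( exists $code{$ch} ) {
            # Append a digit unless it repeats the previous code.
            $result .= $code{$ch} if $code{$ch} ne $prev;
            $prev = $code{$ch};
        }
        elsif ( $ch ne 'H' && $ch ne 'W' ) {
            # Vowels (and Y) separate letters that share a code;
            # H and W do not, so they leave $prev untouched.
            $prev = '';
        }
    }
    return substr $result . '000', 0, 4;    # pad or truncate to 4 chars
}

printf "%-10s %s\n", $_, soundex($_) for qw(Robert Rupert Accor Acker Accor-plc);
```

Any matching scheme built on such codes therefore needs a second, stricter check before records are treated as identical.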

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Pattern Matching
by dree (Monsignor) on Dec 29, 2003 at 16:53 UTC
Re: Pattern Matching
by TomDLux (Vicar) on Dec 29, 2003 at 17:25 UTC

    Turn one of your files into a hash. You can use soundex or another encoding to generate the key, and store a reference to an array containing the whole set of data for that name as the value.

    my %filedata;
    while ( my $line = <> ) {
        chomp $line;                       # chomp $line, not $_
        my @field = split /,\s*/, $line;
        my $key = encode( $field[0] );     # encode(): soundex or similar
        $filedata{$key} = \@field;
    }

    Then you can look up the entries of the second file in the hash. Of course, if you use a coarse encoding, you'll have to do additional verification whenever you get a match.
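A minimal sketch of the lookup-and-verify step, assuming comma-separated records with the name in the first field. The coarse key (first four letters of the normalized name) stands in for soundex, and the `normalize` helper, its list of ignored suffixes (`plc`, `inc`), the "Aflac Group" record and the field values for Aflac are all hypothetical, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical normalization: lower-case, treat '-' as a space, drop
# punctuation and a few corporate suffixes, then tidy whitespace.
sub normalize {
    my ($name) = @_;
    $name = lc $name;
    $name =~ s/-/ /g;
    $name =~ s/[^a-z0-9 ]//g;
    $name =~ s/\b(?:plc|inc)\b//g;
    $name =~ s/\s+/ /g;
    $name =~ s/^ | $//g;
    return $name;
}

# Hypothetical coarse key standing in for soundex: first four letters.
sub encode { my $n = normalize( $_[0] ); $n =~ s/ //g; return substr $n, 0, 4 }

# File A records, keyed by the coarse encoding of the name.
my %filedata;
for my $line ( 'ABB, A1, B1, C1, D1', 'Aflac Inc, A2, B2, C2, D2' ) {
    my @field = split /,\s*/, $line;
    $filedata{ encode( $field[0] ) } = \@field;
}

# File B records: copy A's fields only when the full normalized names agree.
my @merged;
for my $line ( 'Abb, A11, B11, C11, D11', 'Aflac-Inc, x, x, x, x', 'Aflac Group, y, y, y, y' ) {
    my @field = split /,\s*/, $line;
    my $match = $filedata{ encode( $field[0] ) } or next;        # no candidate
    next unless normalize( $match->[0] ) eq normalize( $field[0] );   # verification
    push @merged, [ $field[0], @{$match}[ 1 .. $#$match ] ];
}
printf "%s: %s\n", $_->[0], join( ', ', @{$_}[ 1 .. $#$_ ] ) for @merged;
```

"Aflac Group" shares the coarse key `afla` with "Aflac Inc" but is rejected by the verification step, so only the "Abb" and "Aflac-Inc" records receive copied fields.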

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

      As I said above, Soundex maps different inputs to the same result, so it is very dangerous to use it as a hash key: records that map to the same key cannot coexist in a plain hash, and you risk silently dropping records unless you arrange for a mechanism to resolve such key clashes.
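One common way to resolve such clashes is a hash of arrays: push every record onto an array stored under its key instead of assigning over it. A minimal sketch, where `coarse_key` (first two letters, lower-cased) is a hypothetical key chosen only so that "Accor" and the made-up name "Acker" collide, as they also do under Soundex:

```perl
use strict;
use warnings;

# Hypothetical coarse key, deliberately prone to collisions.
sub coarse_key { return lc substr $_[0], 0, 2 }

my @file_a = ( 'Accor, A1, B1, C1, D1', 'Acker, A2, B2, C2, D2' );

my ( %single, %multi );
for my $line (@file_a) {
    my @field = split /,\s*/, $line;
    my $key   = coarse_key( $field[0] );
    $single{$key} = \@field;            # plain hash: second record overwrites the first
    push @{ $multi{$key} }, \@field;    # hash of arrays: both records kept
}

print scalar( @{ $multi{ac} } ), " records under 'ac' in %multi\n";
print 'only ', $single{ac}[0], " survives in %single\n";

# Resolve the clash by checking the full name among the candidates.
my ($hit) = grep { lc $_->[0] eq 'accor' } @{ $multi{ coarse_key('Accor') } };
print "matched $hit->[0]: ", join( ', ', @{$hit}[ 1 .. $#$hit ] ), "\n";
```

The plain hash keeps only "Acker", while the hash of arrays keeps both records and leaves the final decision to an explicit comparison.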

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Node Type: perlquestion [id://317441]
Approved by Chady