
Pattern Matching

by Tan (Initiate)
on Dec 29, 2003 at 14:39 UTC (#317441)
Tan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

As someone who is just starting out with Perl, I am faced with a problem for which you may be able to recommend a solution. It essentially requires matching names between two data files.

For example:

In file A I have the following fields of data:

(Name, A, B, C, D) (ABB, A1, B1, C1, D1) (Accor ...) (Aflac Inc...)

etc... and in file B I may have data which will contain some of the above names, although not strictly in the same order or with the same description, e.g.

(Name A, B, C, D) (Abb A11, B11, C11, D11) (Accor-plc ...) (Aflac-Inc...)

Ideally I need to search for these names and copy over the field entries for A B C D from one data file to another.

Can Perl do "sounds like" pattern searching and then copy the contents of the array?


Edited by Chady -- fixed formatting.

Replies are listed 'Best First'.
Re: Pattern Matching
by liz (Monsignor) on Dec 29, 2003 at 15:21 UTC
Re: Pattern Matching
by dree (Monsignor) on Dec 29, 2003 at 16:53 UTC
Re: Pattern Matching
by CountZero (Bishop) on Dec 29, 2003 at 16:50 UTC
    I think you are venturing into dangerous territory. Updating a data file on the basis of records in another list whose names merely "sound" the same is a very loose rule to go by. Can you tighten the rules for when a data record is considered the same as an existing entry? Otherwise I'm afraid not even Perl will do a good job here.

    The venerable Soundex algorithm is known to map wildly different words to the same "stem" and that may or may not be what you want.
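To make the collision risk concrete, here is a minimal sketch of a simplified American Soundex written out by hand. The routine name `soundex_code` and the sample names "Acer" and "Accor-plc" spelled this way are illustrative assumptions, not something from the original files. It suggests that two unrelated names can share a code while two spellings of the same company may not.

```perl
use strict;
use warnings;

# Simplified American Soundex: first letter plus three digits.
# soundex_code() is a hypothetical helper written for this illustration.
sub soundex_code {
    my ($word) = @_;
    my $letters = uc $word;
    $letters =~ s/[^A-Z]//g;    # ignore anything that is not a letter
    return '' if $letters eq '';

    my %digit;
    @digit{ split //, 'BFPV' }     = (1) x 4;
    @digit{ split //, 'CGJKQSXZ' } = (2) x 8;
    @digit{ split //, 'DT' }       = (3) x 2;
    $digit{L} = 4;
    @digit{ split //, 'MN' }       = (5) x 2;
    $digit{R} = 6;

    my @chars = split //, $letters;
    my $code  = shift @chars;           # the first letter is kept as-is
    my $prev  = $digit{$code} || 0;
    for my $ch (@chars) {
        next if $ch eq 'H' || $ch eq 'W';    # H and W do not break a run
        my $d = $digit{$ch} || 0;            # vowels get 0 and break a run
        $code .= $d if $d && $d != $prev;
        $prev = $d;
    }
    return substr( $code . '000', 0, 4 );    # pad or truncate to 4 characters
}

for my $name ( 'Accor', 'Acer', 'Accor-plc' ) {
    printf "%-10s %s\n", $name, soundex_code($name);
}
# Accor      A260
# Acer       A260
# Accor-plc  A261
```

Here "Accor" and "Acer" collide, while "Accor" and "Accor-plc" end up with different codes, so a phonetic code alone is probably not a tight enough rule for company names.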


    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Pattern Matching
by TomDLux (Vicar) on Dec 29, 2003 at 17:25 UTC

    Turn one of your files into a hash. You can use Soundex or another encoding to generate a key, and then store an array containing the whole set of data for that name as the value.

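    my %filedata;
    while ( my $line = <> ) {
        chomp $line;    # a bare chomp would act on $_, not $line
        my @field = split ', ', $line;
        my $key = encode( $field[0] );    # encode() is a placeholder, e.g. a Soundex routine
        $filedata{$key} = \@field;        # key => reference to the whole record
    }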

    Then you can look up entries in the second file. Of course, if you use a general encoding, you'll have to do additional verification if you get a match.
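The lookup and verification step could be sketched roughly as follows. Everything here is an assumption made for illustration: `name_key` is a hypothetical stand-in for `encode()` that keeps only the lowercased first word of the name, the sample rows are invented, and the verification rule (accept only when the full names agree ignoring case, otherwise flag for review) is just one possible choice.

```perl
use strict;
use warnings;

# Hypothetical stand-in for encode(): lowercase the name and keep only
# its first word, so "Accor-plc" and "Accor" share the key "accor".
sub name_key {
    my ($name) = @_;
    my ($first_word) = lc($name) =~ /([a-z0-9]+)/;
    return defined $first_word ? $first_word : '';
}

# Made-up sample rows standing in for the two data files.
my @file_a = ( 'ABB, A1, B1, C1, D1', 'Accor, A2, B2, C2, D2' );
my @file_b = (
    'Abb, A11, B11, C11, D11',
    'Accor-plc, A22, B22, C22, D22',
    'Acme, A33, B33, C33, D33',
);

# Index file A by key.
my %filedata;
for my $line (@file_a) {
    my @field = split /,\s*/, $line;
    $filedata{ name_key( $field[0] ) } = \@field;
}

# Look up each file B record; a key hit is only a candidate match.
my @report;
for my $line (@file_b) {
    my @field = split /,\s*/, $line;
    my $hit   = $filedata{ name_key( $field[0] ) };
    if ( !$hit ) {
        push @report, "$field[0]: no match";
    }
    elsif ( lc( $hit->[0] ) eq lc( $field[0] ) ) {
        push @report, "$field[0]: same name as $hit->[0]";
    }
    else {
        push @report, "$field[0]: candidate $hit->[0], needs review";
    }
}
print "$_\n" for @report;
# Abb: same name as ABB
# Accor-plc: candidate Accor, needs review
# Acme: no match
```

Flagging near matches for review instead of copying fields automatically keeps a loose key from silently overwriting the wrong record.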


      As I said above, Soundex maps different inputs to the same result, which makes it dangerous to use as a hash key: records that map to the same key cannot coexist in the same hash, so you risk dropping records unless you arrange a mechanism to resolve such key clashes.
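One possible mechanism for resolving such clashes is to store a list of records under each key instead of a single record. The sketch below assumes a deliberately coarse, hypothetical key function `coarse_key` (first three letters, lowercased) and invented sample rows, chosen only so that a clash actually occurs.

```perl
use strict;
use warnings;

# Deliberately coarse, hypothetical key so that two different names collide.
sub coarse_key {
    my ($name) = @_;
    return lc substr( $name, 0, 3 );
}

# Made-up sample rows for illustration.
my @lines = (
    'Accor, A1, B1, C1, D1',
    'Accenture, A2, B2, C2, D2',
    'Aflac Inc, A3, B3, C3, D3',
);

my %filedata;
for my $line (@lines) {
    my @field = split /,\s*/, $line;

    # Push onto a list instead of assigning, so a clash adds a record
    # rather than overwriting the earlier one.
    push @{ $filedata{ coarse_key( $field[0] ) } }, \@field;
}

for my $key ( sort keys %filedata ) {
    my @names = map { $_->[0] } @{ $filedata{$key} };
    print "$key: ", join( ' | ', @names ), "\n";
}
# acc: Accor | Accenture
# afl: Aflac Inc
```

With this layout no record is dropped, and any key holding more than one record marks exactly the cases that need further checking.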


      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
