Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Substitute text string in a file with matching text from another file

by tariqahsan (Beadle)
on Aug 12, 2005 at 15:18 UTC ( #483300=perlquestion: print w/ replies, xml ) Need Help??
tariqahsan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

What would be an efficient way to substitute lines of text
when portion of it matches with part of the text from another file?
Say I have the 2 files as follows -

file 1:

123|abc|777
234|cde|456
567|xyz|999

file 2:

789|efg|2222222
123|abc|9999999
786|uvw|1234567
123|xxx|0000000
234|cde|0000000
567|xyz|1111111

After processing file 2 using file 1 the output of it should be -

789|efg|2222222|NO MATCH
123|abc|9999999|777
786|uvw|1234567|NO MATCH
123|xxx|0000000|NO MATCH
234|cde|0000000|456
567|xyz|1111111|999

File 2 is quite a large file and file 1 has about 2000
lines

Comment on Substitute text string in a file with matching text from another file
Replies are listed 'Best First'.
Re: Substitute text string in a file with matching text from another file
by ikegami (Pope) on Aug 12, 2005 at 15:25 UTC

    Hashes are very useful for matching.

    use strict; use warnings; my %lookup; { open(my $file1, '<', 'file1.txt') or die("..."); while (<$file1>) { chomp; my @fields = split(/\|/, $_); $lookup{$fields[0]}{$fields[1]} = $fields[2]; } } { open(my $file2, '<', 'file2.txt') or die("..."); while (<$file2>) { chomp; my @fields = split(/\|/, $_); if (exists $lookup{$fields[0]}{$fields[1]}) { $fields[3] = $lookup{$fields[0]}{$fields[1]}; } else { $fields[3] = 'NO MATCH'; } print(join('|', @fields), $/); } }

    Tested.

Re: Substitute text string in a file with matching text from another file
by sh1tn (Priest) on Aug 12, 2005 at 15:56 UTC
    Update: no need to delete:
    # $f1 - small file # $f2 - big file open my $fh, $f1 or die "cannot open $f1: $!\n"; my @file1 = <$fh>; my %file1 = map{ chomp and (join '|', (split /\|/)[0,1]), (join '|', (split /\|/)[2]) }@file1; open my $fh, $f2 or die "cannot open $f2: $!\n"; while(<$fh>){ chomp; my $part = (join '|', (split /\|/)[0,1]); print $_, '|', (exists $file1{$part} ? $file1{$part} : 'NO MATCH' ), +$/; }


Re: Substitute text string in a file with matching text from another file
by davidrw (Prior) on Aug 12, 2005 at 15:40 UTC
    Could do this on one line of shell with the join command, except it doesn't handle compound keys, so it requires this little workaround:
    sort f1 | perl -pe 's/\|/~~/' > f1~ sort f2 | perl -pe 's/\|/~~/' > f2~ join -t'|' -a 1 -a 2 f2~ f1~ \ | perl -pe 's/~~/|/' \ | perl -F'/\|/' -ape 's/$/|NO MATCH/ unless $#F==3; $_'
    As for perl, a cool (yes, could be overkill) way would be to load these two files as tables with DBD::AnyData and just run a single SQL statement (assuming files have cols A,B,C):
    SELECT f2.A, f2.B, f2.C, COALESCE(f1.C, 'NO MATCH') as D FROM f2 LEFT JOIN f1 ON f1.A = f2.A AND f1.B = f2.B
Re: Substitute text string in a file with matching text from another file
by jch341277 (Sexton) on Aug 12, 2005 at 15:46 UTC

    Hashes are good for this kind of thing:

    use strict; use warnings; use Carp; my $hr_f1 = getlines($ARGV[0]); open(IN,"<$ARGV[1]") or croak "unable to open $ARGV[1]: $!"; while(my $l = <IN>) { my ($k,$v) = getkv($l); if (defined $hr_f1->{$k}) { print "$k|$v|$hr_f1->{$k}\n"; } else { print "$k|$v|NO MATCH\n"; } } sub getlines { my $fn = shift; my %h = (); open(IN,"<$fn") or croak "unable to open $fn: $!"; while(my $l = <IN>) { my ($k,$v) = getkv($l); $h{$k} = $v; } close(IN); return \%h; } sub getkv { my $l = shift; chomp $l; my @l = split /\|/, $l; return "$l[0]|$l[1]","$l[2]"; }

    and tested:

    $ ./join.pl f1 f2 789|efg|2222222|NO MATCH 123|abc|9999999|777 786|uvw|1234567|NO MATCH 123|xxx|0000000|NO MATCH 234|cde|0000000|456 567|xyz|1111111|999

    Update: This is the same as ikegami's solution...guess i wasn't quick enough...

Re: Substitute text string in a file with matching text from another file
by Transient (Hermit) on Aug 12, 2005 at 15:47 UTC
    #!/usr/bin/perl use strict; use warnings; # substituted for the FILE my @file2 = qw(789|efg|2222222 123|abc|9999999 786|uvw|1234567 123|xxx +|0000000 234|cde|0000000 567|xyz|1111111); # process file 1 my %file_1_hash = (); while ( <DATA> ) { chomp; my @array = split( /\|/ ); $file_1_hash{join( '|', @array[0..$#array-1] )} = $array[$#array]; } # when file2 is a FILE, just do a while loop here foreach (@file2) { my @file_2_array = split( /\|/ ); my $file_2_key = join( '|', @file_2_array[0..$#file_2_array-1] ); if ( !exists $file_1_hash{$file_2_key} ) { $file_1_hash{$file_2_key} = "NO MATCH"; } print join( '|', $file_2_key, $file_2_array[$#file_2_array], $file_1 +_hash{$file_2_key} ), "\n"; } __DATA__ 123|abc|777 234|cde|456 567|xyz|999

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://483300]
Approved by ikegami
Front-paged by planetscape
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (9)
As of 2015-07-28 10:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (254 votes), past polls