Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Substitute text string in a file with matching text from another file

by tariqahsan (Beadle)
on Aug 12, 2005 at 15:18 UTC ( #483300=perlquestion: print w/ replies, xml ) Need Help??
tariqahsan has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

What would be an efficient way to substitute lines of text
when portion of it matches with part of the text from another file?
Say I have the 2 files as follows -

file 1:

123|abc|777
234|cde|456
567|xyz|999

file 2:

789|efg|2222222
123|abc|9999999
786|uvw|1234567
123|xxx|0000000
234|cde|0000000
567|xyz|1111111

After processing file 2 using file 1 the output of it should be -

789|efg|2222222|NO MATCH
123|abc|9999999|777
786|uvw|1234567|NO MATCH
123|xxx|0000000|NO MATCH
234|cde|0000000|456
567|xyz|1111111|999

File 2 is quite a large file and file 1 has about 2000
lines

Comment on Substitute text string in a file with matching text from another file
Re: Substitute text string in a file with matching text from another file
by ikegami (Pope) on Aug 12, 2005 at 15:25 UTC

    Hashes are very useful for matching.

    use strict; use warnings; my %lookup; { open(my $file1, '<', 'file1.txt') or die("..."); while (<$file1>) { chomp; my @fields = split(/\|/, $_); $lookup{$fields[0]}{$fields[1]} = $fields[2]; } } { open(my $file2, '<', 'file2.txt') or die("..."); while (<$file2>) { chomp; my @fields = split(/\|/, $_); if (exists $lookup{$fields[0]}{$fields[1]}) { $fields[3] = $lookup{$fields[0]}{$fields[1]}; } else { $fields[3] = 'NO MATCH'; } print(join('|', @fields), $/); } }

    Tested.

Re: Substitute text string in a file with matching text from another file
by davidrw (Prior) on Aug 12, 2005 at 15:40 UTC
    Could do this on one line of shell with the join command, except it doesn't handle compound keys, so it requires this little workaround:
    sort f1 | perl -pe 's/\|/~~/' > f1~ sort f2 | perl -pe 's/\|/~~/' > f2~ join -t'|' -a 1 -a 2 f2~ f1~ \ | perl -pe 's/~~/|/' \ | perl -F'/\|/' -ape 's/$/|NO MATCH/ unless $#F==3; $_'
    As for perl, a cool (yes, could be overkill) way would be to load these two files as tables with DBD::AnyData and just run a single SQL statement (assuming files have cols A,B,C):
    SELECT f2.A, f2.B, f2.C, COALESCE(f1.C, 'NO MATCH') as D FROM f2 LEFT JOIN f1 ON f1.A = f2.A AND f1.B = f2.B
Re: Substitute text string in a file with matching text from another file
by jch341277 (Sexton) on Aug 12, 2005 at 15:46 UTC

    Hashes are good for this kind of thing:

    use strict; use warnings; use Carp; my $hr_f1 = getlines($ARGV[0]); open(IN,"<$ARGV[1]") or croak "unable to open $ARGV[1]: $!"; while(my $l = <IN>) { my ($k,$v) = getkv($l); if (defined $hr_f1->{$k}) { print "$k|$v|$hr_f1->{$k}\n"; } else { print "$k|$v|NO MATCH\n"; } } sub getlines { my $fn = shift; my %h = (); open(IN,"<$fn") or croak "unable to open $fn: $!"; while(my $l = <IN>) { my ($k,$v) = getkv($l); $h{$k} = $v; } close(IN); return \%h; } sub getkv { my $l = shift; chomp $l; my @l = split /\|/, $l; return "$l[0]|$l[1]","$l[2]"; }

    and tested:

    $ ./join.pl f1 f2 789|efg|2222222|NO MATCH 123|abc|9999999|777 786|uvw|1234567|NO MATCH 123|xxx|0000000|NO MATCH 234|cde|0000000|456 567|xyz|1111111|999

    Update: This is the same as ikegami's solution...guess i wasn't quick enough...

Re: Substitute text string in a file with matching text from another file
by Transient (Hermit) on Aug 12, 2005 at 15:47 UTC
    #!/usr/bin/perl use strict; use warnings; # substituted for the FILE my @file2 = qw(789|efg|2222222 123|abc|9999999 786|uvw|1234567 123|xxx +|0000000 234|cde|0000000 567|xyz|1111111); # process file 1 my %file_1_hash = (); while ( <DATA> ) { chomp; my @array = split( /\|/ ); $file_1_hash{join( '|', @array[0..$#array-1] )} = $array[$#array]; } # when file2 is a FILE, just do a while loop here foreach (@file2) { my @file_2_array = split( /\|/ ); my $file_2_key = join( '|', @file_2_array[0..$#file_2_array-1] ); if ( !exists $file_1_hash{$file_2_key} ) { $file_1_hash{$file_2_key} = "NO MATCH"; } print join( '|', $file_2_key, $file_2_array[$#file_2_array], $file_1 +_hash{$file_2_key} ), "\n"; } __DATA__ 123|abc|777 234|cde|456 567|xyz|999
Re: Substitute text string in a file with matching text from another file
by sh1tn (Priest) on Aug 12, 2005 at 15:56 UTC
    Update: no need to delete:
    # $f1 - small file # $f2 - big file open my $fh, $f1 or die "cannot open $f1: $!\n"; my @file1 = <$fh>; my %file1 = map{ chomp and (join '|', (split /\|/)[0,1]), (join '|', (split /\|/)[2]) }@file1; open my $fh, $f2 or die "cannot open $f2: $!\n"; while(<$fh>){ chomp; my $part = (join '|', (split /\|/)[0,1]); print $_, '|', (exists $file1{$part} ? $file1{$part} : 'NO MATCH' ), +$/; }


Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://483300]
Approved by ikegami
Front-paged by planetscape
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (9)
As of 2014-07-22 11:50 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (110 votes), past polls