PerlMonks  

print all data matching identical three alphabets from two different files

by mao9856 (Sexton)
on Nov 15, 2017 at 14:01 UTC (#1203478, perlquestion)

mao9856 has asked for the wisdom of the Perl Monks concerning the following question:

I have two huge files. File1 has one column and File2 has two columns as follows:

File1

ABCD12

XYZ13

EFGT45

UVWZ34

TSR78

........

File2

ID121 ABC14

ID122 EFG87

ID145 XYZ43

ID157 TSR11

ID181 ABC31

ID962 YTS27

ID529 EFG56

ID684 TSR07

ID921 BAMD80

.............

I want to compare the first three letters of each entry in File1 with the first three letters of the second column of File2, and print the File2 lines (IDs) that match.

Desired output:

ID121 ABC14

ID122 EFG87

ID145 XYZ43

ID157 TSR11

ID181 ABC31

ID529 EFG56

ID684 TSR07

I tried the following code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my ($f1,$f2,@patterns,%patts,$f2_rec,$f2_field);
    $f1 = $ARGV[0];
    $f2 = $ARGV[1];

    open(PATT,"<", $f1) or die;
    @patterns = <PATT>;
    chomp(@patterns);
    close(PATT) or die;

    @patts{@patterns} = (1) x @patterns;

    open(FILE,"<", $f2) or die;
    while (defined ($f2_rec = <FILE>)) {
        chomp $f2_rec;
        $f2_field = (split(/ /,$f2_rec))[0];
        if(exists($patts{$f2_field})) {
            print "$f2_rec\n";
        }
    }
    close(FILE) or die;

I know it won't match on just the first three letters; it only matches exact values, and it reports 'Use of uninitialized value' unless use warnings is disabled. Please help.


Replies are listed 'Best First'.
Re: print all data matching identical three alphabets from two different files
by toolic (Bishop) on Nov 15, 2017 at 14:29 UTC
    One way is to create the hash differently. Only keep the 1st 3 letters of File1 as the keys. Then, grab the 2nd column of File2 and, again, only use the 1st 3 letters.
        use strict;
        use warnings;

        my ($f1,$f2,%patts,$f2_rec,$f2_field);
        $f1 = $ARGV[0];
        $f2 = $ARGV[1];

        open(PATT,"<", $f1) or die;
        while (<PATT>) {
            chomp;
            $patts{substr $_, 0, 3} = 1;
        }
        close(PATT) or die;

        open(FILE,"<", $f2) or die;
        while (defined ($f2_rec = <FILE>)) {
            chomp $f2_rec;
            $f2_field = (split(/ /,$f2_rec))[1];
            $f2_field = substr $f2_field, 0, 3;
            if(exists($patts{$f2_field})) {
                print "$f2_rec\n";
            }
        }
        close(FILE) or die;
Re: print all data matching identical three alphabets from two different files
by thanos1983 (Parson) on Nov 15, 2017 at 16:06 UTC

    Hello mao9856,

    Just a few minor additions to the answer from fellow monk toolic. Your data file(s) appear to contain blank lines, so I would add a regex check and skip those lines with next.
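
    To illustrate why the blank lines matter, here is a minimal sketch (not code from this thread; second_field_prefix is a hypothetical helper name). Splitting an empty line yields no fields, so the second field is undef, and using that value is what produces the 'Use of uninitialized value' warning. One way to guard against it might be:

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only: return the first three
# characters of the second whitespace-separated field, or undef when
# the line is blank or has no second field.
sub second_field_prefix {
    my ($line) = @_;
    return if $line =~ /^\s*$/;          # blank line: nothing to extract
    my $field = (split ' ', $line)[1];   # second column, if any
    return unless defined $field;        # line with a single column
    return substr $field, 0, 3;
}
```

    Called in scalar context, the helper returns 'ABC' for "ID121 ABC14" and undef for an empty line, so the caller can test the result with defined before using it as a hash key.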

    I would also change the die on each close to warn. In my view there is no need to stop the whole script when a file cannot be closed, but I would still like to know about it, which is why I would use warn rather than die.

    I would also suggest that mao9856 read the article "Don't Open Files in the old way".

    Sample of code including output based on all the minor modifications:

        #!/usr/bin/perl
        use strict;
        use warnings;

        my (%patts, $f2_rec, $f2_field);
        my $f1 = $ARGV[0];
        my $f2 = $ARGV[1];

        open(my $fh1,"<", $f1) or die "Failed to open '$f1' $!";
        while (<$fh1>) {
            chomp;
            next if /^\s*$/;
            $patts{substr $_, 0, 3} = 1;
        }
        close($fh1) or warn "Failed to close '$f1' $!";

        open(my $fh2,"<", $f2) or die "Failed to open '$f2' $!";
        while (defined ($f2_rec = <$fh2>)) {
            chomp $f2_rec;
            next if $f2_rec =~ /^\s*$/;
            $f2_field = (split(/ /,$f2_rec))[1];
            $f2_field = substr $f2_field, 0, 3;
            if(exists($patts{$f2_field})) {
                print "$f2_rec\n";
            }
        }
        # Update: changed 'open' to 'close'; thanks to Laurent_R for pointing this out
        close($fh2) or warn "Failed to close '$f2' $!";

        __END__

        $ perl test.pl file1.txt file2.txt
        ID121 ABC14
        ID122 EFG87
        ID145 XYZ43
        ID157 TSR11
        ID181 ABC31
        ID529 EFG56
        ID684 TSR07

    Hope this helps, BR.

    Update: Thanks to fellow monk Laurent_R for noticing a typo; I have updated the sample code.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: print all data matching identical three alphabets from two different files
by kcott (Bishop) on Nov 15, 2017 at 23:22 UTC

    G'day mao9856,

    Here's another (less busy) way to do it.

        #!/usr/bin/env perl

        use strict;
        use warnings;

        use Inline::Files;

        my %match;
        ++$match{substr $_, 0, 3} while <MATCH_DATA>;

        while (<PARSE_DATA>) {
            print if $match{substr +(split)[1], 0, 3};
        }

        __MATCH_DATA__
        ABCD12
        XYZ13
        EFGT45
        UVWZ34
        TSR78
        __PARSE_DATA__
        ID121 ABC14
        ID122 EFG87
        ID145 XYZ43
        ID157 TSR11
        ID181 ABC31
        ID962 YTS27
        ID529 EFG56
        ID684 TSR07
        ID921 BAMD80

    Output:

        ID121 ABC14
        ID122 EFG87
        ID145 XYZ43
        ID157 TSR11
        ID181 ABC31
        ID529 EFG56
        ID684 TSR07

    I've used Inline::Files just to show the technique. It's good you've used the 3-argument form of open; but less good that you've used package variables for the filehandles — prefer lexical filehandles instead. Also, your error reporting (i.e. or die) is rubbish: either spend a lot more time on this tedious and error-prone task yourself, or just let Perl do it for you with the autodie pragma.
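
    As a rough sketch of those two suggestions applied to real files (this is not Ken's code; the matching_lines helper and its parameter names are assumptions made for illustration), lexical filehandles combined with the autodie pragma might look like this:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;    # open and close now die with a descriptive message on failure

# Hypothetical helper, named for illustration only: returns the lines of
# $parse_file whose second column starts with the same three characters
# as some non-blank line of $match_file.
sub matching_lines {
    my ($match_file, $parse_file) = @_;

    my %match;
    open my $match_fh, '<', $match_file;    # lexical filehandle, no "or die" needed
    while (my $line = <$match_fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;           # skip blank lines
        $match{ substr $line, 0, 3 } = 1;
    }
    close $match_fh;

    my @found;
    open my $parse_fh, '<', $parse_file;
    while (my $line = <$parse_fh>) {
        my $field = (split ' ', $line)[1];  # second whitespace-separated column
        next unless defined $field;         # blank or single-column line
        push @found, $line if $match{ substr $field, 0, 3 };
    }
    close $parse_fh;

    return @found;
}

# Usage: perl script.pl file1.txt file2.txt
print matching_lines(@ARGV) if @ARGV == 2;
```

    With autodie in effect, a failed open reports the file name and the reason without any hand-written message, and the lexical filehandles are scoped to the subroutine rather than being package-wide names such as PATT and FILE.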

    Please post data within <code>...</code> tags as you did with your code. This makes it a lot less work for you; your data isn't subject to HTML interpretation (e.g. special characters and whitespace compression); and it makes it a lot easier for us to paste it directly into any example code we might provide.

    — Ken

Re: print all data matching identical three alphabets from two different files
by mao9856 (Sexton) on Dec 22, 2017 at 14:07 UTC

    Thank you all for the help :)

Node Type: perlquestion [id://1203478]
Approved by Corion
Front-paged by toolic