PerlMonks  

Re: find common data in multiple files

by thanos1983 (Vicar)
on Dec 28, 2017 at 10:18 UTC (#1206313)


in reply to find common data in multiple files

Hello mao9856,

Since you have not told us what the problem is (e.g. the script does not run, or it does not produce the desired output), we cannot help you from a quick look.

A similar question, parse multiple text files keep unique lines only, was asked in the past. Many Monks tackled it elegantly, so you may find a solution to your problem there.

Update: I just tried to run your code sample, and it does not run. It looks as though you found the code somewhere, pasted it here, and asked someone to solve it for you. Can you show us the minimum effort you made to resolve it yourself, and make the script executable?

Update 2: I had some time to kill, so I put together this script that does more or less what you want. It reads all files named in @ARGV and processes every line, then keeps only the lines that occur more than once, i.e. the lines the files have in common. This assumes the lines are always written identically and that you do not need combinations; in other words, you only want to detect whole duplicated lines.

Sample code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils 'duplicates';

my (@lines);
while (<>) {
    next if /^\s*$/; # skip empty lines
    chomp;
    push @lines, $_;
}
continue {
    close ARGV if eof; # Not eof()!
}

my @duplicatedLines = duplicates @lines;
print Dumper \@lines, \@duplicatedLines;

__END__

$ perl test.pl File1.txt File3.txt
$VAR1 = [
          'ID121 ABC14',
          'ID122 EFG87',
          'ID145 XYZ43',
          'ID157 TSR11',
          'ID181 ABC31',
          'ID962 YTS27',
          'ID567 POH70',
          'ID921 BAMD80',
          'ID121 ABC14',
          'ID612 FLOW12',
          'ID122 EFG87',
          'ID745 KIDP36',
          'ID145 XYZ43',
          'ID157 TSR11'
        ];
$VAR2 = [
          'ID121 ABC14',
          'ID122 EFG87',
          'ID145 XYZ43',
          'ID157 TSR11'
        ];
```

Update 2 (continued): If you instead want to detect duplicates based only on the $key (the ID) or only on the $value (the name) of each line, you can easily do it like this.

Sample code:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils 'duplicates';

my (@keys, @values);
while (<>) {
    next if /^\s*$/; # skip empty lines
    chomp;
    my ($key, $value) = split /\s+/;
    push @keys, $key;
    push @values, $value;
}
continue {
    close ARGV if eof; # Not eof()!
}

my @duplicatedKeys   = duplicates @keys;
my @duplicatedValues = duplicates @values;
print Dumper \@keys, \@values, \@duplicatedKeys, \@duplicatedValues;

__END__

$ perl test.pl File1.txt File3.txt
$VAR1 = [
          'ID121',
          'ID122',
          'ID145',
          'ID157',
          'ID181',
          'ID962',
          'ID567',
          'ID921',
          'ID121',
          'ID612',
          'ID122',
          'ID745',
          'ID145',
          'ID157'
        ];
$VAR2 = [
          'ABC14',
          'EFG87',
          'XYZ43',
          'TSR11',
          'ABC31',
          'YTS27',
          'POH70',
          'BAMD80',
          'ABC14',
          'FLOW12',
          'EFG87',
          'KIDP36',
          'XYZ43',
          'TSR11'
        ];
$VAR3 = [
          'ID121',
          'ID122',
          'ID145',
          'ID157'
        ];
$VAR4 = [
          'ABC14',
          'EFG87',
          'XYZ43',
          'TSR11'
        ];
```

Update 2 (continued): I used the module List::MoreUtils, specifically its function duplicates, which "Returns a new list by stripping values in LIST occuring less than twice." The data I used come from the sample data files you provided.
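If List::MoreUtils is not installed, a similar result can be obtained in core Perl with a hash of counts. The following is a minimal sketch, not the module's implementation, and the sub name dup_values is hypothetical. It mirrors the behaviour quoted above: each value that occurs at least twice is returned once, in the order of its first appearance.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical core-Perl stand-in for List::MoreUtils::duplicates:
# return each value that occurs at least twice, once, in order of
# its first appearance.
sub dup_values {
    my %count;
    $count{$_}++ for @_;          # how often each value occurs
    my %emitted;                  # values already returned
    return grep { $count{$_} > 1 && !$emitted{$_}++ } @_;
}

my @lines = ('ID121 ABC14', 'ID122 EFG87', 'ID181 ABC31',
             'ID121 ABC14', 'ID122 EFG87');
print "$_\n" for dup_values(@lines);   # prints ID121 ABC14, then ID122 EFG87
```

In the first script, dup_values(@lines) could replace duplicates @lines without loading the module.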

Hope this helps, BR.

Seeking for Perl wisdom...on the process of learning...not there...yet!

Re^2: find common data in multiple files
by mao9856 (Sexton) on Dec 29, 2017 at 10:38 UTC

    Thank you for your help. I tried to write this code based on my understanding; please excuse me, I am a complete beginner in Perl. My data contain unique IDs (e.g. ID157) and names (e.g. TSR11) separated by a tab. I want to check whether each ID and name pair (ID157 TSR11) is present in all 25 files. If ID157 TSR11 is present in all 25 files, it should be printed in the output; that is, I want to print only the IDs and names that are present in all 25 files, with the ID and name printed together, separated by a tab: ID157 TSR11. I am not very familiar with using Perl modules, but I am trying my best.
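The requirement above (print an ID/name pair only if it occurs in all 25 files) can be sketched in core Perl by counting, for each line, the number of distinct files that contain it, with each line counted at most once per file. This is one possible approach, not the one used in the replies. The sub name lines_in_all_files is hypothetical, and the sketch assumes that each line holds exactly one tab-separated ID/name pair and that trailing whitespace (including a Windows "\r") is insignificant.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines that occur in every one of the
# given files. Each line is counted at most once per file, so a line
# repeated inside one file cannot make up for a file where it is missing.
sub lines_in_all_files {
    my @files = @_;
    my %file_count;    # line => number of distinct files containing it

    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open '$file': $!";
        my %seen_here;    # lines already counted for this file
        while (my $line = <$fh>) {
            $line =~ s/\s+\z//;    # drop "\n", a Windows "\r" and trailing blanks
            next if $line eq '';   # skip empty lines
            $file_count{$line}++ unless $seen_here{$line}++;
        }
        close $fh;
    }

    # Keep only the lines whose file count equals the number of files.
    my @common = grep { $file_count{$_} == @files } keys %file_count;
    return sort @common;
}

# Print each common "ID<TAB>name" line when file names are given.
if (@ARGV) {
    print "$_\n" for lines_in_all_files(@ARGV);
}
```

Run it as, for example, perl common.pl *.txt (the script name is arbitrary). Each common line is printed unchanged apart from trailing whitespace, so the tab between the ID and the name is preserved.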

      Hello again mao9856,

      Not knowing is not a problem; nobody started coding already knowing everything. This forum is open and free for people to learn and contribute. You gave us a piece of non-working code: either you got it from someone, or you modified it to the point where it no longer worked. It is good for you, and for all of us, to practise by providing a working code sample that shows what you have tried and where you got stuck. Giving you a solution is easy for us, but it would not really solve your problem, since you would not learn anything from it.

      Having said that, here is a code sample that does what you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use List::MoreUtils 'frequency';

my (@lines);
my $numberOfFiles = scalar @ARGV;
while (<>) {
    next if /^\s*$/; # skip empty lines (remove if not needed)
    chomp;
    push @lines, $_;
}
continue {
    close ARGV if eof; # Not eof()!
}

my @frequencyLines = frequency @lines;
my %frequencyHash  = @frequencyLines;

my @unwanted;
foreach my $key (keys %frequencyHash) {
    if ($frequencyHash{$key} != $numberOfFiles) {
        push @unwanted, $key; # push any related keys onto @unwanted
    }
}
delete @frequencyHash{@unwanted};

my @matches = keys %frequencyHash;
print Dumper \@matches;

__END__

$ perl test.pl File1.txt File2.txt File3.txt
$VAR1 = [
          'ID122 EFG87',
          'ID121 ABC14',
          'ID157 TSR11'
        ];
```

      As input I used the first three sample data files you provided (File 1-3).

      Hope this helps, BR.


        Hi BR, I ran this code as follows:

        $ perl test.pl *.txt

        and it gives this output:

        $VAR1 = [];

        Please help me understand how it should work. Thank you in advance.
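One way to investigate an empty result like this is to check how often, and in how many files, each line actually occurs. The sketch below is only a diagnostic, and the causes it can reveal are assumptions, not confirmed in this thread: the frequency-based script keeps a line only if its total count equals the number of files, so a line repeated inside one file, a stray "\r" from Windows line endings in some files only, or an extra file matched by *.txt would each drop it from the result. The sub name line_stats is hypothetical; Data::Dumper's Useqq setting prints tabs and carriage returns as \t and \r so they become visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical diagnostic: for each distinct line, record how many times
# it occurs in total and in how many different files it occurs.
sub line_stats {
    my @files = @_;
    my %stats;    # line => { total => N, files => M }
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open '$file': $!";
        my %seen_here;    # lines already counted for this file
        while (my $line = <$fh>) {
            chomp $line;                  # removes "\n" only, a "\r" stays
            next if $line =~ /^\s*$/;     # skip empty lines, like the scripts above
            $stats{$line}{total}++;
            $stats{$line}{files}++ unless $seen_here{$line}++;
        }
        close $fh;
    }
    return \%stats;
}

if (@ARGV) {
    local $Data::Dumper::Useqq    = 1;   # show tabs and CRs as \t and \r
    local $Data::Dumper::Sortkeys = 1;   # print hash keys in sorted order
    print "Number of files: ", scalar(@ARGV), "\n";
    print Dumper(line_stats(@ARGV));
}
```

A line whose total differs from its files value is repeated within a file, and a key ending in \r points to mixed line endings.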
