PerlMonks  

Re^8: find common data in multiple files

by mao9856 (Sexton)
on Jan 05, 2018 at 05:36 UTC ([id://1206735])


in reply to Re^7: find common data in multiple files
in thread find common data in multiple files

Hi BR

Thank you for your time and patience. As you suggested, I tried to match two files (4_os_10.txt and 4_os_11.txt) and got the desired output, i.e. the common data, both with your code and with the code provided by BillKSmith. I then tried it with more than 3 .txt files and found that maybe there isn't anything common among all 25 files.

When I used the following code, I found that matched data exist in only 18 of the 25 files. Please correct me if I am wrong. The code prints the first column with each common ID (say ID121) across all 25 files, and then the second column (the name) only if it exists in that file. Thus the output I get has all IDs in the first column, followed by the second column of each of the 25 .txt files.

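#!/usr/bin/env perl
use strict;
use warnings;

my %data;
# collect every second-column value under its first-column key
while (<>) {
    my ( $key, $value ) = split;
    push( @{ $data{$key} }, $value );
}
# note: <> shifts each file name off @ARGV while reading, so @ARGV is
# empty here and the test below compares against 0
foreach my $key ( sort keys %data ) {
    if ( @{ $data{$key} } >= @ARGV ) {
        print join( "\t", $key, @{ $data{$key} } ), "\n";
    }
}

$ code.pl *.txt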
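A side note on the `>= @ARGV` test in the code above: the `<>` operator shifts each file name off `@ARGV` as it opens that file, so by the time the `foreach` loop runs, `@ARGV` is empty and the condition reduces to `>= 0`, which every key passes. A minimal sketch of the usual workaround is to save the file count before reading. It assumes, as stated later in this thread, that an ID appears at most once per file, so counting lines per ID is the same as counting files; the two temporary files are hypothetical stand-ins for the real `*.txt` files.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Two hypothetical input files standing in for the real *.txt files
# (temporary files, deleted automatically when the script exits).
my @names;
for my $content ( "ID21 ABC12\n", "ID21 ABC12\nID22 XYZ11\n" ) {
    my ( $fh, $name ) = tempfile( UNLINK => 1 );
    print {$fh} $content;
    close $fh;
    push @names, $name;
}

@ARGV = @names;
my $num_files = @ARGV;    # save the file count BEFORE reading with <>

my %count;                # ID => number of lines (here: files) containing it
while (<>) {              # <> shifts each name off @ARGV as it opens the file
    my ($key) = split;
    $count{$key}++;
}
my $left = @ARGV;         # 0: the loop has emptied @ARGV

# ">= @ARGV" would now mean ">= 0"; the saved count keeps only IDs in every file.
my @in_all = grep { $count{$_} >= $num_files } sort keys %count;
print "num_files=$num_files, left=$left, in all files: @in_all\n";
```

Run as is, it prints `num_files=2, left=0, in all files: ID21`.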

Please let me know whether this code can give me the desired output. Regards, mao9856

Re^9: find common data in multiple files
by thanos1983 (Parson) on Jan 08, 2018 at 11:06 UTC

    Hello mao9856

    I am getting a bit confused by your question :). It has received multiple great answers from fellow Monks, and still we are not able to resolve it because we do not know exactly what your desired output is.

    In one of my previous answers to your question (Re: find common data in multiple files) I used a different approach, which checks each file and at the end prints all the lines that match each other. For example, if file 1, file 2 and file 3 share one common line, the script captures it. The script also captures lines that are not common to all files but appear in at least 2 files.

    I assume something like that is what you are looking for. If not, try to draw a hash, an array of arrays or something similar with dummy data to show us the output you want.

    For example:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = (
        'file1' => 'line common',
        'file2' => 'line common',
    );
    print Dumper \%hash;

    __END__

    $ perl test.pl
    $VAR1 = {
              'file1' => 'line common',
              'file2' => 'line common'
            };

    My assumption is that you want to see which files contain each line that exists in at least 2 files. If that is true, would something like this resolve your question?

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %hash = (
        'common line'           => [ 'file1', 'file2', ],
        'common line different' => [ 'file1', 'file2', 'file3', ],
    );
    print Dumper \%hash;

    __END__

    $ perl test.pl
    $VAR1 = {
              'common line' => [
                                 'file1',
                                 'file2'
                               ],
              'common line different' => [
                                           'file1',
                                           'file2',
                                           'file3'
                                         ]
            };
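If that structure is what you want, it does not have to be typed by hand. A minimal sketch, assuming the lines of each file are already available: the hypothetical `%input` hash below stands in for reading the real files. It builds `line => [files]` and keeps only the lines found in at least 2 files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for the real files: file name => its lines.
my %input = (
    file1 => [ 'ID21 ABC12', 'ID22 XYZ11' ],
    file2 => [ 'ID21 ABC12' ],
    file3 => [ 'ID21 ABC12', 'ID22 XYZ11', 'ID30 QQQ01' ],
);

# line => [ names of the files that contain it ]
my %files_for;
for my $file ( sort keys %input ) {
    for my $line ( @{ $input{$file} } ) {
        push @{ $files_for{$line} }, $file;
    }
}

# Keep only the lines that occur in at least 2 files.
my %common;
for my $line ( keys %files_for ) {
    $common{$line} = $files_for{$line} if @{ $files_for{$line} } >= 2;
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper \%common;
```

Iterating the files in sorted order keeps each file list in file order; `ID30 QQQ01` is dropped because it occurs only in `file3`.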

    Looking forward to your update, BR

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Hi BR

      I am sorry for the misunderstanding. Let me put it simply. I have 25 different .txt files. Each of the 25 files has only two columns: the first is an ID (e.g. ID21) and the second is a name (e.g. ABC12). Each file has a different number of rows, and each row looks like "ID21 ABC12". I made sure that an ID/name pair (say ID21 ABC12) appears only once in a file, i.e. no file contains duplicate "ID" and "name" values. Also, a particular "name" always has the same fixed "ID". So if ID21 ABC12 is present in all 25 .txt files, it should be printed in the output.
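Taken literally, this requirement is an intersection: a pair is printed only if the number of distinct files containing it equals the number of files. A minimal sketch, assuming whitespace-separated `ID name` lines; the three in-memory "files" in `%files` are hypothetical stand-ins for the 25 real ones (Perl can open a reference to a string as a filehandle). The `%seen` hash only matters if a pair could repeat inside one file, which the description above rules out.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the 25 files: file name => contents.
my %files = (
    file1 => "ID21 ABC12\nID22 XYZ11\n",
    file2 => "ID21 ABC12\n",
    file3 => "ID21 ABC12\nID22 XYZ11\n",
);

my %files_with;    # "ID name" => number of distinct files containing the pair
for my $file ( sort keys %files ) {
    # open() on a scalar reference reads the string like a file
    open my $fh, '<', \$files{$file} or die "Cannot open $file: $!";
    my %seen;      # guards against a pair repeated inside one file
    while ( my $line = <$fh> ) {
        my ( $id, $name ) = split ' ', $line;
        next unless defined $name;    # skip blank or one-column lines
        $files_with{"$id $name"}++ unless $seen{"$id $name"}++;
    }
    close $fh;
}

# A pair is common to all files when its file count equals the number of files.
my $num_files = keys %files;
my @in_all = grep { $files_with{$_} == $num_files } sort keys %files_with;
print "$_\n" for @in_all;
```

With the sample data only `ID21 ABC12` is printed, because `ID22 XYZ11` is missing from `file2`. For real files, the `open` would take the file name instead of `\$files{$file}`.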

      I am thankful to everyone who has provided solutions. But I guess the problem is that ID21 ABC12 does not appear in all 25 files, so none of the code provided works for me. Also, I don't know in which files ID21 ABC12 appears. So my last posted code was meant to generate output like this:

      First column: all common IDs (even if they are present in only 2 files). Second column: file 1, showing the name that matches each ID; nothing is printed (the cell stays blank) if file 1 has no name for that ID. Third column: file 2, showing the name that matches each ID, and so on.

      So what I want is to match all keys first, then join the second column of each of the 25 files, with each column labelled by its file name (file1, file2, etc.). I want the output to look like this:

      ID      file1   file2   file3   file4   file5   ...   file25
      ID21    ABC12   ABC12                   ABC12   ...   ABC12
      ID22            XYZ11           XYZ11           ...

      This output shows that ID21 ABC12 is present in more than 2 files: it is present in file1, file2, file5 and file25 and absent from the other files. The next row shows ID22, whose matching name is found in file2 and file4. Here is the code I wrote to get the output above:

      #!/usr/bin/env perl
      use strict;
      use warnings;

      my %data;
      while (<>) {
          my ( $key, $value ) = split;
          push( @{ $data{$key} }, $value );
      }
      foreach my $key ( sort keys %data ) {
          if ( @{ $data{$key} } >= @ARGV ) {
              print join( "\t", $key, @{ $data{$key} } ), "\n";
          }
      }

      Please let me know if this answers what you asked me :) Regards, mao9856

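        Using push does not maintain the correct column alignment: push simply appends each name to the end of the list, so a file that lacks an ID leaves no empty slot, and later names shift into the wrong file's column. You need to assign each value to the element associated with that file number.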

        #!/usr/bin/env perl
        use strict;
        use warnings;

        my %data = ();
        #@ARGV = map { "File$_" }(1..4);
        my $num = @ARGV;

        # input: $data{$key}[$i] holds the value from file number $i,
        # so every file keeps its own column
        for my $i (0..$num-1){
            open my $fh,'<',$ARGV[$i] or die "$!";
            while (<$fh>) {
                my ( $key, $value ) = split;
                $data{$key}[$i] = $value;
            }
            close $fh;
        }

        # output: missing values are shown as '-'
        print join ("\t", 'ID', @ARGV),"\n";
        foreach my $key ( sort keys %data ) {
            my @line = map { $_ || '-' } @{ $data{$key} }[0..$num-1];
            # print only IDs that are missing from at least one file
            if (grep $_ eq '-',@line){
                print join ("\t", $key, @line),"\n";
            }
        }
        poj
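The point about push can be seen with a toy example. In this sketch the three hashes in `@files` are hypothetical stand-ins for parsed files (index 0 is file1). `push` stores ID22's names at positions 0 and 1, as if they came from file1 and file2, while indexed assignment keeps them under file2 and file3. The table part prints blank cells for missing names and, following the "present in at least 2 files" wording earlier in the thread, skips IDs found in fewer than 2 files; `//` (defined-or) needs Perl 5.10 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical parsed input: one hash (ID => name) per file, in file order.
my @files = (
    { ID21 => 'ABC12' },                     # file1
    { ID21 => 'ABC12', ID22 => 'XYZ11' },    # file2
    { ID22 => 'XYZ11' },                     # file3
);

my ( %pushed, %indexed );
for my $i ( 0 .. $#files ) {
    for my $id ( keys %{ $files[$i] } ) {
        my $name = $files[$i]{$id};
        push @{ $pushed{$id} }, $name;    # slot = how many files matched so far
        $indexed{$id}[$i] = $name;        # slot = this file's index
    }
}

# Print the aligned table: blank cell where a file has no name for the ID,
# and only IDs that occur in at least 2 files.
my $num = @files;
print join( "\t", 'ID', map { "file$_" } 1 .. $num ), "\n";
for my $id ( sort keys %indexed ) {
    my @row = map { $_ // '' } @{ $indexed{$id} }[ 0 .. $num - 1 ];
    my $present = grep { $_ ne '' } @row;
    next if $present < 2;
    print join( "\t", $id, @row ), "\n";
}
```

Using `//` instead of `||` also keeps a legitimate value such as `0` from being treated as missing.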
