Re^7: find common data in multiple files

by thanos1983 (Vicar)
on Jan 03, 2018 at 12:34 UTC ( #1206595=note )

in reply to Re^6: find common data in multiple files
in thread find common data in multiple files

Hello again mao9856,

Can you run cat -A 4_os_10.txt? I would like to see every character in your file, including the invisible ones. It looks like the file does not match the sample DATA that you provided. Also, try passing only a few files in @ARGV, for example 4_os_10.txt and 4_os_11.txt, if you know that they have at least one line in common.
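If cat -A is not available (for example on Windows without the GNU tools), a minimal Perl sketch along these lines shows roughly the same thing: tabs appear as ^I, carriage returns as ^M, other ASCII control characters in caret notation, and $ marks each line end. The sub name show_invisible is hypothetical, and unlike cat -A this sketch leaves bytes above 127 untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rough stand-in for `cat -A` (hypothetical helper): control characters
# are shown in caret notation (tab -> ^I, carriage return -> ^M) and
# "$" marks the end of every line that had a newline.
# Bytes >= 128 are NOT converted, unlike the real cat -A.
sub show_invisible {
    my ($line) = @_;
    my $had_newline = $line =~ s/\n\z//;

    # every ASCII control character except the newline removed above
    $line =~ s/([\x00-\x09\x0b-\x1f])/'^' . chr( ord($1) + 64 )/ge;
    $line =~ s/\x7f/^?/g;    # DEL

    return $had_newline ? "$line\$" : $line;
}

# Only read input when file names are given on the command line.
if (@ARGV) {
    while ( my $line = <> ) {
        print show_invisible($line), "\n";
    }
}
```

Running it on 4_os_10.txt and seeing ^M$ at the end of every line would suggest Windows line endings, which is one common reason why lines that look identical do not match.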

Also, there is a great answer, Re: find common data in multiple files, from fellow Monk BillKSmith that uses the setop module and simplifies things a lot. A sample of how to run the script is in Re^2: find common data in multiple files.

Give it a try and let us know how both cases go. BR.

Seeking for Perl wisdom...on the process of learning...not there...yet!

Re^8: find common data in multiple files
by mao9856 (Sexton) on Jan 05, 2018 at 05:36 UTC

    Hi BR

    Thank you for your time and patience. As you suggested, I tried matching two files (4_os_10.txt and 4_os_11.txt) and got the desired output, i.e. the common data, with your code and also with the code provided by BillKSmith. I then tried it with more than 3 .txt files and found that maybe there isn't anything common to all 25 files.

    When I used the following code, I found that matching data exist in only 18 of the 25 files. Please correct me if I got it wrong. The code prints the matched first column, i.e. the common ID (say ID121), across all 25 files, and then prints the second column (the name) only if it exists in a file. So the output I get has all the IDs in the first column, followed by the second column of each of the 25 .txt files.

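        #!/usr/bin/env perl
        use strict;
        use warnings;

        # <> shifts each file off @ARGV as it opens it, so count the
        # files before reading; after the loop @ARGV is empty
        my $nfiles = @ARGV;

        my %data;
        while (<>) {
            my ( $key, $value ) = split;    # ID and name
            push( @{ $data{$key} }, $value );
        }

        foreach my $key ( sort keys %data ) {
            # with no duplicates inside a file, one name per file
            # means the ID was seen in every file
            if ( @{ $data{$key} } >= $nfiles ) {
                print join( "\t", $key, @{ $data{$key} } ), "\n";
            }
        }

        # run with all the .txt files as arguments: *.txt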

    Please tell me whether this code can give me the desired output. Regards, mao9856

      Hello mao9856

      I am getting a bit confused by your question :). It has multiple great answers from fellow Monks, and still we cannot resolve it because we do not know exactly what output you want.

      In one of my previous answers to your question (Re: find common data in multiple files) I use a different approach: it checks each file and, at the end, prints all the lines that match each other. For example, if file 1, file 2 and file 3 share one common line, the script captures it. The script also captures lines that are not common to all files but appear in at least 2 files.

      I assume something like that is what you are looking for. If not, try to sketch a hash, an array of arrays or something similar with dummy data to show us the output you want.

      For example:

          #!/usr/bin/perl
          use strict;
          use warnings;
          use Data::Dumper;

          my %hash = (
              'file1' => 'line common',
              'file2' => 'line common',
          );

          print Dumper \%hash;

          __END__

          $ perl
          $VAR1 = {
                    'file1' => 'line common',
                    'file2' => 'line common'
                  };

      My assumption is that you want to see which files contain each line that exists in at least 2 files. If so, would something like this resolve your question?

          #!/usr/bin/perl
          use strict;
          use warnings;
          use Data::Dumper;

          my %hash = (
              'common line'           => [ 'file1', 'file2', ],
              'common line different' => [ 'file1', 'file2', 'file3', ],
          );

          print Dumper \%hash;

          __END__

          $ perl
          $VAR1 = {
                    'common line' => [
                                       'file1',
                                       'file2'
                                     ],
                    'common line different' => [
                                                 'file1',
                                                 'file2',
                                                 'file3'
                                               ]
                  };
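      To build such a hash from the real files, a minimal sketch could look like this. It assumes each line is compared as a whole after stripping trailing whitespace (which also removes a Windows \r), and that a line must appear in at least 2 files to be kept; the sub name lines_to_files and the threshold of 2 are illustrative assumptions, not part of the earlier answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: map each distinct line to the list of files that
# contain it, keeping only lines found in at least $min_files files.
# Input: hashref of file name => [ lines ].
sub lines_to_files {
    my ( $lines_of, $min_files ) = @_;
    my %files_for;    # line => [ files ]
    for my $file ( sort keys %$lines_of ) {
        my %seen;     # list each file at most once per line
        for my $line ( @{ $lines_of->{$file} } ) {
            ( my $clean = $line ) =~ s/\s+\z//;    # strip \n, \r\n, trailing blanks
            next if $clean eq '' or $seen{$clean}++;
            push @{ $files_for{$clean} }, $file;
        }
    }

    # drop lines that occur in too few files
    delete @files_for{ grep { @{ $files_for{$_} } < $min_files } keys %files_for };
    return \%files_for;
}

# Read every file named on the command line and dump the result.
if (@ARGV) {
    my %lines_of;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        $lines_of{$file} = [<$fh>];
        close $fh;
    }
    local $Data::Dumper::Sortkeys = 1;    # stable key order in the dump
    print Dumper( lines_to_files( \%lines_of, 2 ) );
}
```

      Setting $Data::Dumper::Sortkeys only makes the dump order stable between runs; the structure is the same line => [ files ] hash as in the dummy example.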

      Looking forward to your update, BR


        Hi BR

        I am sorry for all the misunderstanding. Let me put it simply. I have 25 different .txt files. Each of them has only two columns: the first is an ID (e.g. ID21) and the second is a name (e.g. ABC12). The files have different numbers of rows, and each row looks like ID21 ABC12. I made sure that a given ID and name pair (say ID21 ABC12) appears only once per file, so no file contains duplicate ID/name pairs. Also, each name always has the same ID. So if ID21 ABC12 is present in all 25 .txt files, it should be printed in the output.
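        For this exact requirement (a pair is printed only if it occurs in every file), a minimal sketch could count, for each ID/name pair, the number of files it occurs in and keep the pairs whose count equals the number of files. It assumes two whitespace-separated columns per line; the sub name common_to_all is hypothetical. Comparing the split fields rather than raw lines means a trailing \r, or a tab instead of a space, does not prevent a match.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the "ID<TAB>name" pairs that occur in
# every one of the given files. Input: hashref of file name => [ lines ].
# Splitting on whitespace ignores extra spaces, tabs and a trailing \r.
sub common_to_all {
    my ($lines_of) = @_;
    my $nfiles = keys %$lines_of;
    my %count;    # pair => number of files containing it
    for my $file ( keys %$lines_of ) {
        my %seen;    # count each pair at most once per file
        for my $line ( @{ $lines_of->{$file} } ) {
            my ( $id, $name ) = split ' ', $line;
            next unless defined $name;    # skip blank or one-column lines
            my $pair = "$id\t$name";
            $count{$pair}++ unless $seen{$pair}++;
        }
    }
    my @common = sort { $a cmp $b } grep { $count{$_} == $nfiles } keys %count;
    return @common;
}

# Read every file named on the command line and print the shared pairs.
if (@ARGV) {
    my %lines_of;
    for my $file (@ARGV) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        $lines_of{$file} = [<$fh>];
        close $fh;
    }
    print "$_\n" for common_to_all( \%lines_of );
}
```

        If this prints nothing for all 25 files, that is consistent with no pair being shared by every file, rather than with a bug in the matching.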

        I am thankful to everyone who has provided solutions. But I guess the problem is that ID21 ABC12 does not appear in all 25 files, so none of the code provided works for me. I also do not know in which files ID21 ABC12 appears. So my last posted code was meant to generate output like this:

        First column: all common IDs (even those present in only 2 files). Second column: file 1, showing the name that matches the ID, or nothing (blank) if the ID is not in that file. Third column: file 2, showing the name that matches the ID, and so on.

        So what I want is to match all the keys first and then join the second column of each of the 25 files, labelled by file name (file1, file2, etc.). I want the output to look like this:

        ID    file1  file2  file3  file4  file5  file6  ... file25
        ID21  ABC12  ABC12                ABC12         ... ABC12
        ID22         XYZ11         XYZ11

        This output shows that ID21 ABC12 is present in more than 2 files: it is in file1, file2, file5 and file25 and absent from the others. The next row has ID22, whose matching name is found in file2 and file4. This is the code I wrote for getting the output above:

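            #!/usr/bin/env perl
            use strict;
            use warnings;

            # <> shifts each file off @ARGV as it opens it, so count the
            # files before reading; after the loop @ARGV is empty
            my $nfiles = @ARGV;

            my %data;
            while (<>) {
                my ( $key, $value ) = split;    # ID and name
                push( @{ $data{$key} }, $value );
            }

            foreach my $key ( sort keys %data ) {
                # keep IDs that have one name per file, i.e. seen in every file
                if ( @{ $data{$key} } >= $nfiles ) {
                    print join( "\t", $key, @{ $data{$key} } ), "\n";
                }
            }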
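        The code above cannot produce the table described, because pushing names onto one array per ID loses track of which file each name came from, so there is nowhere to put the blanks. A minimal sketch that keeps that information stores ID => { file => name } and prints one tab-separated column per input file, blank where the ID is absent. The sub name id_table and the threshold of 2 files are illustrative assumptions, and the column headers are the actual file names given on the command line rather than file1, file2 and so on.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return table rows. Row 0 is the header (ID plus
# one column per file); each further row is an ID found in at least
# $min_files files, with the name from each file or '' where absent.
sub id_table {
    my ( $files, $lines_of, $min_files ) = @_;    # [names], {name => [lines]}, int
    my %name_in;    # ID => { file => name }
    for my $file (@$files) {
        for my $line ( @{ $lines_of->{$file} } ) {
            my ( $id, $name ) = split ' ', $line;
            next unless defined $name;    # skip blank or one-column lines
            $name_in{$id}{$file} = $name;
        }
    }

    my @rows = ( [ 'ID', @$files ] );
    for my $id ( sort keys %name_in ) {
        next if scalar( keys %{ $name_in{$id} } ) < $min_files;
        push @rows, [ $id, map { $name_in{$id}{$_} // '' } @$files ];
    }
    return @rows;
}

# Assumed threshold: keep IDs present in at least 2 files.
if (@ARGV) {
    my @files = @ARGV;    # column order follows the command line
    my %lines_of;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        $lines_of{$file} = [<$fh>];
        close $fh;
    }
    print join( "\t", @$_ ), "\n" for id_table( \@files, \%lines_of, 2 );
}
```

        For example, with three input files and ID21 ABC12 present only in the first two, the ID21 row would be ID21, ABC12, ABC12 and an empty third column, separated by tabs.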

        Please let me know if this answers what you asked me :) Regards, mao9856
