PerlMonks  

Re^6: find common data in multiple files

by mao9856 (Sexton)
on Jan 03, 2018 at 06:49 UTC (#1206579)


in reply to Re^5: find common data in multiple files
in thread find common data in multiple files

Hello, and thank you for your time. I am using Ubuntu Linux 16.04. Here is a listing of the directory:

$ ls -la
total 272
drwxrwxr-x 2 meetal meetal  4096 Jan  3 12:06 .
drwxrwxr-x 3 meetal meetal  4096 Jan  3 11:54 ..
-rw-rw-r-- 1 meetal meetal 20188 Jan  3 11:54 4_os_10.txt
-rw-rw-r-- 1 meetal meetal 14225 Jan  3 11:54 4_os_11.txt
-rw-rw-r-- 1 meetal meetal 20788 Jan  3 11:54 4_os_12.txt
-rw-rw-r-- 1 meetal meetal  3217 Jan  3 11:54 4_os_13.txt
-rw-rw-r-- 1 meetal meetal 13845 Jan  3 11:54 4_os_14.txt
-rw-rw-r-- 1 meetal meetal  1117 Jan  3 11:54 4_os_15.txt
-rw-rw-r-- 1 meetal meetal  1929 Jan  3 11:54 4_os_16.txt
-rw-rw-r-- 1 meetal meetal 15223 Jan  3 11:54 4_os_17.txt
-rw-rw-r-- 1 meetal meetal  2020 Jan  3 11:54 4_os_18.txt
-rw-rw-r-- 1 meetal meetal 14866 Jan  3 11:54 4_os_19.txt
-rw-rw-r-- 1 meetal meetal  3068 Jan  3 11:54 4_os_1.txt
-rw-rw-r-- 1 meetal meetal 16040 Jan  3 11:54 4_os_20.txt
-rw-rw-r-- 1 meetal meetal 10557 Jan  3 11:54 4_os_2.txt
-rw-rw-r-- 1 meetal meetal 10653 Jan  3 11:54 4_os_3.txt
-rw-rw-r-- 1 meetal meetal  6558 Jan  3 11:54 4_os_4.txt
-rw-rw-r-- 1 meetal meetal 18823 Jan  3 11:54 4_os_5.txt
-rw-rw-r-- 1 meetal meetal  5934 Jan  3 11:54 4_os_6.txt
-rw-rw-r-- 1 meetal meetal 17570 Jan  3 11:54 4_os_7.txt
-rw-rw-r-- 1 meetal meetal  7785 Jan  3 11:54 4_os_8.txt
-rw-rw-r-- 1 meetal meetal 13825 Jan  3 11:54 4_os_9.txt
-rw------- 1 meetal meetal 12288 Jan  3 11:54 .inna.pl.swp
-rw-rw-r-- 1 meetal meetal   651 Jan  3 11:54 test.pl

And this is the output I get:

$ perl test.pl *.txt
$VAR1 = [];

Regards, mao9856

Replies are listed 'Best First'.
Re^7: find common data in multiple files
by thanos1983 (Vicar) on Jan 03, 2018 at 12:34 UTC

    Hello again mao9856,

    Can you run cat -A 4_os_10.txt? I would like to see every character in your file, including the non-printing ones. It looks as if the file does not match the sample DATA that you provided. Also try passing just a few files in @ARGV, for example 4_os_10.txt and 4_os_11.txt, if you know that they have at least one line in common.
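If cat -A is not at hand, a similar check can be done in Perl itself. This is a minimal sketch, not part of any earlier reply; `show_hidden` is a hypothetical helper name. It prints the first few lines of each file with tabs shown as ^I, carriage returns as ^M and the end of each line as $, which makes Windows line endings or stray tabs easy to spot.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Make non-printing characters visible, roughly like `cat -A`:
# tab becomes ^I, carriage return becomes ^M, end of line becomes $.
# (Hypothetical helper; bytes above 0x7F are left untouched.)
sub show_hidden {
    my ($line) = @_;
    my $had_newline = ( $line =~ s/\n\z// );
    # Control characters 0x00-0x1F become ^@ .. ^_
    $line =~ s/([\x00-\x1F])/'^' . chr( ord($1) + 64 )/ge;
    # DEL becomes ^?
    $line =~ s/\x7F/^?/g;
    return $had_newline ? $line . '$' : $line;
}

# Print the first $max_lines lines of each file in visible form.
sub main {
    my ( $max_lines, @files ) = @_;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while ( my $line = <$fh> ) {
            last if $. > $max_lines;
            print "$file: ", show_hidden($line), "\n";
        }
        close $fh;
    }
}

main( 5, @ARGV );
```

Run it as perl show.pl 4_os_10.txt 4_os_11.txt (the script name is only an example). A ^M before the final $ would mean the file has CRLF line endings, which is a common reason for lines that look identical but do not compare equal.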

    There is also a great answer, Re: find common data in multiple files, from fellow Monk BillKSmith, which uses the module setop and is much simpler. For sample code showing how to run the script, see Re^2: find common data in multiple files.

    Give both a try and let us know how each one goes. BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!

      Hi BR

      Thank you for your time and patience. As you suggested, I tried to match two files (4_os_10.txt and 4_os_11.txt) and I got the desired output, that is, the common data, both with your code and with the code provided by BillKSmith. I also tried it on more than 3 pairs of .txt files and found that there may be nothing common to all 25 files.

      When I used the following code, I found that matching data exists in only 18 of the 25 files. Please correct me if I have this wrong. It prints the first column, the ID shared among the files (say ID121), followed by the second-column value from each file in which that ID appears. So the output I get has the IDs in the first column, followed by the second-column values from the .txt files.

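      #!/usr/bin/env perl
      use strict;
      use warnings;

      # <> shifts each file name off @ARGV as it reads,
      # so count the files before the loop.
      my $file_count = @ARGV;

      my %data;
      while (<>) {
          my ( $key, $value ) = split;
          push( @{ $data{$key} }, $value );
      }

      # Print only the keys that collected at least one value per file.
      foreach my $key ( sort keys %data ) {
          if ( @{ $data{$key} } >= $file_count ) {
              print join( "\t", $key, @{ $data{$key} } ), "\n";
          }
      }

      $ code.pl *.txt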

      Please let me know whether this code can give me the desired output. Regards, mao9856
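One detail worth checking: the test in the code above counts values, not files, so an ID that appears twice in one file is counted twice. The following is a minimal sketch of counting distinct files instead, assuming whitespace-separated two-column lines as in the code above. The names `read_pairs` and `ids_in_at_least` are hypothetical, and if an ID repeats within a file only its last value is kept.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Read "ID value" lines from each file and record, per ID, the value
# seen in each file: $seen{$id}{$file} = $value.
sub read_pairs {
    my (@files) = @_;
    my %seen;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while ( my $line = <$fh> ) {
            my ( $id, $value ) = split ' ', $line;
            next unless defined $value;    # skip blank or one-column lines
            $seen{$id}{$file} = $value;    # a repeated ID in one file counts once
        }
        close $fh;
    }
    return \%seen;
}

# Return the sorted IDs that occur in at least $min distinct files.
sub ids_in_at_least {
    my ( $seen, $min ) = @_;
    my @ids = grep { keys %{ $seen->{$_} } >= $min } keys %$seen;
    return sort @ids;
}

my @files = @ARGV;    # copy the list so the count stays available
my $seen  = read_pairs(@files);

# One row per ID found in every file: the ID, then its value from each file.
for my $id ( ids_in_at_least( $seen, scalar @files ) ) {
    print join( "\t", $id, map { $seen->{$id}{$_} } @files ), "\n";
}
```

Passing a smaller number as the second argument of `ids_in_at_least`, for example 18, would list the IDs present in at least that many files, which may help to see how far the data is from being common to all of them.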

        Hello mao9856

        I am getting a bit confused by your question :). It has received several great answers from fellow Monks, and still we have not been able to resolve it, because we do not know exactly what output you want.

        In one of my previous answers to your question, Re: find common data in multiple files, I use a different approach: it checks each file and at the end prints all the lines that match across files. For example, if file 1, file 2 and file 3 share one common line, the script captures it. The script also captures lines that are not common to all files but appear in at least 2 of them.

        I assume something like that is what you are looking for. If not, try to draw a hash, an array of arrays, or something similar with dummy data to show us the output that you want.

        For example:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my %hash = (
            'file1' => 'line common',
            'file2' => 'line common',
        );
        print Dumper \%hash;

        __END__

        $ perl test.pl
        $VAR1 = {
                  'file1' => 'line common',
                  'file2' => 'line common'
                };

        My assumption is that you want to see which files contain each line that appears in at least 2 files. If that is true, would something like this resolve your question?

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;

        my %hash = (
            'common line'           => [ 'file1', 'file2', ],
            'common line different' => [ 'file1', 'file2', 'file3', ]
        );
        print Dumper \%hash;

        __END__

        $ perl test.pl
        $VAR1 = {
                  'common line' => [
                                     'file1',
                                     'file2'
                                   ],
                  'common line different' => [
                                               'file1',
                                               'file2',
                                               'file3'
                                             ]
                };
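A structure like the one above could be built from real files along the following lines. This is only a minimal sketch, assuming that whole lines are compared after the line ending is removed; `lines_to_files` and `shared_lines` are hypothetical names and are not taken from any earlier reply in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Map each distinct non-empty line to the sorted list of files containing it.
sub lines_to_files {
    my (@files) = @_;
    my %where;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while ( my $line = <$fh> ) {
            $line =~ s/\r?\n\z//;        # drop LF or CRLF line endings
            next if $line eq '';
            $where{$line}{$file} = 1;    # a hash, so each file is listed once
        }
        close $fh;
    }
    my %result;
    for my $line ( keys %where ) {
        $result{$line} = [ sort keys %{ $where{$line} } ];
    }
    return \%result;
}

# Keep only the lines found in at least $min files.
sub shared_lines {
    my ( $map, $min ) = @_;
    my %shared;
    for my $line ( keys %$map ) {
        $shared{$line} = $map->{$line} if @{ $map->{$line} } >= $min;
    }
    return \%shared;
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper( shared_lines( lines_to_files(@ARGV), 2 ) ) if @ARGV;
```

Calling `shared_lines` with the number of files as the second argument instead of 2 would keep only the lines present in every file.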

        Looking forward to your update, BR

        Seeking for Perl wisdom...on the process of learning...not there...yet!
