http://www.perlmonks.org?node_id=1206364


in reply to find common data in multiple files

G'day mao9856,

I'd read through one file and store all of its data in a hash, then read through each of the remaining files, removing any hash entries that weren't common to it. Given test files named pm_1206312_in* that use data from your OP:

This code:

#!/usr/bin/env perl

use strict;
use warnings;
use autodie;

my @files = glob 'pm_1206312_in*';

my %uniq;

{
    open my $fh, '<', shift @files;

    while (<$fh>) {
        my ($k, $v) = split;
        $uniq{$k} = $v;
    }
}

for my $file (@files) {
    my %data;
    open my $fh, '<', $file;

    while (<$fh>) {
        my ($k, $v) = split;
        $data{$k} = $v;
    }

    for (keys %uniq) {
        delete $uniq{$_}
            unless exists $data{$_} and $uniq{$_} eq $data{$_};
    }
}

printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq;

Produces this output:

ID121 ABC14
ID122 EFG87
ID157 TSR11
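If you'd like to experiment without files on disk, here is a minimal sketch of the same "seed the hash from the first file, then prune it with each of the others" logic, wrapped in a subroutine. The name common_pairs, the made-up sample data (not the contents of your files) and the use of in-memory filehandles (open on a reference to a string) are assumptions for illustration only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not part of the original code): same algorithm,
# but it takes already-open filehandles instead of file names.
sub common_pairs {
    my @fhs = @_;
    my %uniq;

    # Seed the hash from the first handle.
    my $first = shift @fhs;
    while (<$first>) {
        my ($k, $v) = split;
        $uniq{$k} = $v;
    }

    # Keep only pairs present, with the same value, in every other handle.
    for my $fh (@fhs) {
        my %data;
        while (<$fh>) {
            my ($k, $v) = split;
            $data{$k} = $v;
        }
        for (keys %uniq) {
            delete $uniq{$_}
                unless exists $data{$_} and $uniq{$_} eq $data{$_};
        }
    }
    return \%uniq;
}

# Made-up sample data for illustration only.
my @samples = (
    "ID121 ABC14\nID122 EFG87\nID130 XYZ01\nID157 TSR11\n",
    "ID157 TSR11\nID121 ABC14\nID122 EFG87\n",
    "ID122 EFG87\nID121 ABC14\nID130 QRS99\nID157 TSR11\n",
);

# In-memory filehandles: open '<' on a reference to each string.
my @handles = map { open my $fh, '<', \$_ or die $!; $fh } @samples;

my $common = common_pairs(@handles);
printf "%s %s\n", $_, $common->{$_} for sort keys %$common;
```

With that sample data, ID130 is dropped (it's missing from the second string and has a different value in the third), so the three remaining pairs are printed: ID121 ABC14, ID122 EFG87 and ID157 TSR11.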

— Ken

Re^2: find common data in multiple files
by mao9856 (Sexton) on Dec 30, 2017 at 08:32 UTC

    Hi Ken. This code worked for me after I put last line: printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq; before closing parenthesis. Thanks a million :)

      "This code worked for me after I put last line ... before closing parenthesis. Thanks a million"

      Whilst I appreciate the thanks, it sounds like you've introduced a (possibly subtle) bug. The basic logic for my code is:

      Declare hash
      SINGLE BLOCK (reading one file):
          Populate hash
      LOOP BLOCK (reading all other files):
          Remove data that isn't common from hash
      Print hash data

      If you move the Print operation to LOOP BLOCK, you'll get multiple groups of output: one for each file after the first, so 24 groups with your 25 files. That's not what you want, and it would have been plainly obvious if you'd done that, so you've probably done something different to what you've described.
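To see that effect in miniature, this sketch deliberately puts the print inside the loop. It substitutes three in-memory strings (made-up data, opened as in-memory filehandles) for real files, so the loop body runs twice and two groups are printed; with 25 files it would run 24 times.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Made-up data: three in-memory "files" (not your real data).
my @contents = ("A 1\nB 2\n", "A 1\nB 2\n", "A 1\nB 3\n");
my @handles  = map { open my $fh, '<', \$_ or die $!; $fh } @contents;

my %uniq;
{
    my $fh = shift @handles;
    while (<$fh>) {
        my ($k, $v) = split;
        $uniq{$k} = $v;
    }
}

my $groups = 0;
for my $fh (@handles) {
    my %data;
    while (<$fh>) {
        my ($k, $v) = split;
        $data{$k} = $v;
    }
    for (keys %uniq) {
        delete $uniq{$_}
            unless exists $data{$_} and $uniq{$_} eq $data{$_};
    }

    # MISPLACED on purpose: this print now runs once per remaining file.
    $groups++;
    print "--- group $groups ---\n";
    printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq;
}
```

The first group shows "A 1" and "B 2"; the second shows only "A 1", because B's value differs in the third string. Only the last group is the answer you actually want.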

      You've said "I am very beginner of perl" in a couple of places. I suspect you haven't understood the anonymous block I used in SINGLE BLOCK and ended up with logic more like this:

      Declare hash
      start SINGLE BLOCK
          Populate hash
          LOOP BLOCK
          Print hash data
      end SINGLE BLOCK

      An anonymous block is just code wrapped in braces:

      {
          # code here
      }

      I've used it to provide a limited lexical scope. The variables ($fh, $k and $v) that I've declared in that block only exist in that block; they are quite different to, and cannot interfere in any way with, the similarly named variables elsewhere in the code. There's also an additional benefit: when $fh goes out of scope, Perl performs an implicit close.
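A small sketch of both points, assuming the core module File::Temp for a scratch file (the file, the strings and the outer $fh are made up for the demonstration). The $fh declared inside the braces doesn't touch the outer variable of the same name, and the text printed inside the block is on disk afterwards even though close is never called, because the handle is closed (and its buffer flushed) when it goes out of scope.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use File::Temp qw(tempfile);

# A scratch file; UNLINK => 1 asks File::Temp to delete it at exit.
my (undef, $path) = tempfile(UNLINK => 1);

my $fh = 'outer value';    # unrelated variable that happens to share the name

{
    # This $fh exists only inside the braces and hides the outer one.
    open my $fh, '>', $path;
    print {$fh} "written inside the block\n";
    # No explicit close: at the closing brace this $fh goes out of scope,
    # Perl closes the handle, and the buffered output is flushed to disk.
}

print "outer \$fh is still: $fh\n";

open my $in, '<', $path;
my $line = <$in>;
print "file contains: $line";
```

If the handle were not closed at the end of the block, the short write would typically still be sitting in Perl's output buffer and the read-back would find an empty file.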

      Anyway, while that's probably useful information you can add to your "beginner of perl" knowledgebase, it's very much guesswork on my part with respect to whatever modifications you made to my original code. If you post your changes, I can provide more concrete feedback.

      — Ken

        I am very grateful for all the useful explanations you have provided. As you know, I am very much a beginner in Perl; I tried to modify the code you provided because it didn't work for me. I can see the logic of your code, but please let me ask you something: the following code of yours isn't giving any output when I use it on 25 files. Please tell me how to fix it.

        #!/usr/bin/env perl

        use strict;
        use warnings;
        use autodie;

        my @files = glob 'pm_1206312_in*';

        my %uniq;

        {
            open my $fh, '<', shift @files;

            while (<$fh>) {
                my ($k, $v) = split;
                $uniq{$k} = $v;
            }
        }

        for my $file (@files) {
            my %data;
            open my $fh, '<', $file;

            while (<$fh>) {
                my ($k, $v) = split;
                $data{$k} = $v;
            }

            for (keys %uniq) {
                delete $uniq{$_}
                    unless exists $data{$_} and $uniq{$_} eq $data{$_};
            }
        }

        printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq;
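Without seeing the files this is guesswork, but two plausible causes of silent, empty output are that the glob 'pm_1206312_in*' (which was written for the test files) matches none of the 25 files, or that no ID/value pair is identical in every one of them. The sketch below is a hedged diagnostic, not a definitive fix: the subroutine name common_with_trace, taking file names from @ARGV, and skipping blank or one-field lines are assumptions that are not in the original code. It dies if fewer than two files are given and reports on STDERR how many pairs survive after each file, so you can see where the hash empties.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# Hypothetical helper: the same algorithm as above, plus a progress
# report on STDERR after each file is processed.
sub common_with_trace {
    my @files = @_;
    # Complain loudly instead of quietly printing nothing.
    die "Need at least two files; got ", scalar(@files), "\n" if @files < 2;

    my %uniq;
    my $first = shift @files;
    {
        open my $fh, '<', $first;
        while (<$fh>) {
            my ($k, $v) = split;
            next unless defined $v;    # ASSUMPTION: skip blank/one-field lines
            $uniq{$k} = $v;
        }
    }
    warn sprintf "%s: %d pairs to start\n", $first, scalar keys %uniq;

    for my $file (@files) {
        my %data;
        open my $fh, '<', $file;
        while (<$fh>) {
            my ($k, $v) = split;
            next unless defined $v;
            $data{$k} = $v;
        }
        for (keys %uniq) {
            delete $uniq{$_}
                unless exists $data{$_} and $uniq{$_} eq $data{$_};
        }
        warn sprintf "%s: %d pairs left\n", $file, scalar keys %uniq;
    }
    return \%uniq;
}

# ASSUMPTION: file names are passed on the command line (expanded by a
# Unix-like shell), e.g.  perl common.pl path/to/your/files*
# instead of the hard-coded glob 'pm_1206312_in*'.
if (@ARGV) {
    my $common = common_with_trace(@ARGV);
    printf "%s %s\n", $_, $common->{$_} for sort keys %$common;
}
```

If the count drops to zero after a particular file, look at that file for IDs that are missing or values that differ (for example in case or spelling).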