http://www.perlmonks.org?node_id=1206464


in reply to Re^2: find common data in multiple files
in thread find common data in multiple files

"This code worked for me after I put last line ... before closing parenthesis. Thanks a million"

Whilst I appreciate the thanks, it sounds like you've introduced a (possibly subtle) bug. The basic logic for my code is:

    Declare hash

    SINGLE BLOCK (reading one file):
        Populate hash

    LOOP BLOCK (reading all other files):
        Remove data that isn't common from hash

    Print hash data

If you move the Print operation to LOOP BLOCK, you'll get multiple (24) groups of output. That's not what you want, and it would have been plainly obvious if you'd done that, so you've probably done something different to what you've described.
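To make the effect of the print placement concrete, here is a minimal sketch of the same logic. It is not the original script: the three strings are made-up stand-ins for file contents, `parse` is a hypothetical helper, and `$groups_in_loop` only counts how often a print inside the loop would have run.

```perl
use strict;
use warnings;

# Made-up stand-ins for file contents: one "key value" pair per line.
my @texts = (
    "a 1\nb 2\nc 3\n",
    "a 1\nb 2\nc 9\n",
    "a 1\nb 7\nc 3\n",
);

# Hypothetical helper: parse one block of text into a key => value hash.
sub parse {
    my ($text) = @_;
    my %h;
    for my $line (split /\n/, $text) {
        my ($k, $v) = split ' ', $line;
        $h{$k} = $v;
    }
    return %h;
}

my %uniq = parse(shift @texts);     # SINGLE BLOCK: populate from the first input
my $groups_in_loop = 0;

for my $text (@texts) {             # LOOP BLOCK: every remaining input
    my %data = parse($text);
    for (keys %uniq) {
        delete $uniq{$_}
            unless exists $data{$_} and $uniq{$_} eq $data{$_};
    }
    # A print placed here would run once per remaining input,
    # showing intermediate results each time.
    $groups_in_loop++;
}

# Printing after the loop gives a single group: the pairs common to all inputs.
printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq;
print "a print inside the loop would have produced $groups_in_loop groups\n";
```

With three inputs the loop body runs twice, so a print inside it would give two groups; with 25 files it would give the 24 groups mentioned above.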

You've said "I am very beginner of perl" in a couple of places. I suspect you haven't understood the anonymous block I used in SINGLE BLOCK and ended up with logic more like this:

    Declare hash

    start SINGLE BLOCK
        Populate hash

        LOOP BLOCK

        Print hash data
    end SINGLE BLOCK

An anonymous block is just code wrapped in braces:

    {
        # code here
    }

I've used it to provide a limited lexical scope. The variables ($fh, $k and $v) that I've declared in that block exist only in that block; they are quite different to, and cannot interfere in any way with, the similarly named variables elsewhere in the code. There's also an additional benefit: when $fh goes out of scope, Perl performs an implicit close.
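The scoping described above can be seen in a small sketch. The data is made up, and an in-memory filehandle (opening a reference to a string) stands in for a real file so that it runs without anything on disk; the outer `$k` exists only to show that the inner one is a separate variable.

```perl
use strict;
use warnings;

my $text = "a 1\nb 2\n";   # stand-in for a file's contents (made-up data)
my %uniq;
my $k = 'outer';           # deliberately reuses a name from the block below

{
    # In-memory filehandle so the sketch runs without any files on disk.
    open my $fh, '<', \$text or die "Can't open in-memory file: $!";
    while (<$fh>) {
        my ($k, $v) = split;   # new variables, unrelated to the outer $k
        $uniq{$k} = $v;
    }
}   # $fh goes out of scope here, so Perl closes the handle implicitly

print "k is still '$k'\n";                     # the outer $k was never touched
print "$_ $uniq{$_}\n" for sort keys %uniq;    # %uniq was declared outside, so it survives
```

Note that %uniq is declared before the block precisely so that it outlives it; anything declared with my inside the braces is gone once the closing brace is reached.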

Anyway, while that's probably useful information you can add to your "beginner of perl" knowledgebase, it's very much guesswork on my part with respect to whatever modifications you made to my original code. If you post your changes, I can provide more concrete feedback.

— Ken

Re^4: find common data in multiple files
by mao9856 (Sexton) on Dec 31, 2017 at 06:09 UTC

    I am very grateful for all the useful explanations you have provided. As you know, I am very beginner of perl. I tried to modify the code you provided because it didn't work for me. I can see the logic of your code, but please let me ask you something: the following code of yours isn't giving any output when I use it for 25 files. Please tell me how to fix it.

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my @files = glob 'pm_1206312_in*';
    my %uniq;

    {
        open my $fh, '<', shift @files;
        while (<$fh>) {
            my ($k, $v) = split;
            $uniq{$k} = $v;
        }
    }

    for my $file (@files) {
        my %data;
        open my $fh, '<', $file;
        while (<$fh>) {
            my ($k, $v) = split;
            $data{$k} = $v;
        }
        for (keys %uniq) {
            delete $uniq{$_}
                unless exists $data{$_} and $uniq{$_} eq $data{$_};
        }
    }

    printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq;

      In my original response, I showed the test files I created with the data from your OP. You didn't say what your filenames were; I had to make up names for my files. The pm indicates it's a PerlMonks file; the 1206312 is the node ID of your OP; the in is for input. Those are fairly standard naming conventions that I use; I very much doubt you use these same conventions.

      My intention was to help you learn; not to do your school/job/whatever work for free. Spend some time understanding the techniques I've used, instead of blindly copying my code and expecting it to work as is. I probably have a different directory structure to you; names I've given to test files (as seen here) won't be the same as filenames on your system; I may have used a CPAN module which you'll first need to install; there could be differences between software versions which require you to write your code slightly differently; you may even have local coding standards that you need to follow.

      If you're genuinely interested in learning, then you'll need to put in some effort yourself and do some troubleshooting. Investigate how %uniq changes as the script runs: from initial (my) declaration to final (printf) output. Do the same with other variables: look at their values and see how those change over the life of the program. If, on the other hand, you just want your work done for free, you're in the wrong place: see "How (Not) To Ask A Question" and, in particular, its "Do Your Own Work" section.
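One way to follow how %uniq changes is to dump it at each stage with the core Data::Dumper module. The following is only a sketch of that troubleshooting approach: the hashes are made-up stand-ins for data read from files, and `@sizes` is a hypothetical extra that records the number of keys after each step.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order makes successive dumps easy to compare
$Data::Dumper::Indent   = 1;

my %uniq   = (a => 1, b => 2, c => 3);    # made-up starting data
my @others = (                            # made-up stand-ins for the other files
    { a => 1, b => 2 },
    { a => 1, b => 5 },
);

my @sizes = (scalar keys %uniq);
print STDERR "start: ", Dumper(\%uniq);

for my $i (0 .. $#others) {
    my %data = %{ $others[$i] };
    for (keys %uniq) {
        delete $uniq{$_}
            unless exists $data{$_} and $uniq{$_} eq $data{$_};
    }
    push @sizes, scalar keys %uniq;
    print STDERR "after input $i: ", Dumper(\%uniq);
}

print "sizes: @sizes\n";
```

The dumps go to STDERR so they stay separate from the script's real output. The same technique works for any other variable, such as the list of filenames the script is about to read.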

      All of the code that I provided is very straightforward and documented. You should be able to find information on everything I've used in http://perldoc.perl.org/perl.html: I have this link bookmarked; I recommend you do the same.

      — Ken

      Do all your 25 filenames start pm_1206312_in and if not what are they like ?

      poj

        "Do all your 25 filenames start pm_1206312_in and if not what are they like ?" All 25 files ends with .txt and starts with 4_os_ The middle part is different for all 25 files. as an example 4_os_abc.txt, 4_os_def.txt...so on.