
Re^2: find common data in multiple files

by mao9856 (Sexton)
on Dec 30, 2017 at 08:32 UTC (#1206454)

in reply to Re: find common data in multiple files
in thread find common data in multiple files

Hi Ken. This code worked for me after I put last line: printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq; before closing parenthesis. Thanks a million :)

Replies are listed 'Best First'.
Re^3: find common data in multiple files
by kcott (Chancellor) on Dec 31, 2017 at 01:51 UTC
    "This code worked for me after I put last line ... before closing parenthesis. Thanks a million"

    Whilst I appreciate the thanks, it sounds like you've introduced a (possibly subtle) bug. The basic logic for my code is:

    Declare hash
    SINGLE BLOCK (reading one file):
        Populate hash
    LOOP BLOCK (reading all other files):
        Remove data that isn't common from hash
    Print hash data

    If you move the Print operation to LOOP BLOCK, you'll get multiple (24) groups of output. That's not what you want, and it would have been plainly obvious if you'd done that, so you've probably done something different to what you've described.
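
    To make the difference concrete, here is a minimal sketch that uses in-memory hashes instead of files. The sample data and the helper names (report, filter_common) are made up for illustration and are not part of the original code; only the delete logic follows the code discussed in this thread.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: format a hash as sorted "key value" lines.
sub report {
    my ($uniq) = @_;
    return join '', map { sprintf "%s %s\n", $_, $uniq->{$_} } sort keys %$uniq;
}

# Hypothetical helper: returns one report per "Print" that would have run.
# $print_in_loop true  => Print sits inside LOOP BLOCK (one group per file).
# $print_in_loop false => Print sits after LOOP BLOCK (a single group).
sub filter_common {
    my ($print_in_loop, $first, @others) = @_;
    my %uniq = %$first;                       # SINGLE BLOCK: populate hash
    my @groups;
    for my $data (@others) {                  # LOOP BLOCK: all other files
        for (keys %uniq) {
            delete $uniq{$_}
                unless exists $data->{$_} and $uniq{$_} eq $data->{$_};
        }
        push @groups, report(\%uniq) if $print_in_loop;
    }
    push @groups, report(\%uniq) unless $print_in_loop;
    return @groups;
}

# Made-up sample data standing in for three input files.
my %first  = (A => 1, B => 2, C => 3);
my @others = ({ A => 1, B => 2 }, { A => 1, B => 9 });

my @final = filter_common(0, \%first, @others);
my @inner = filter_common(1, \%first, @others);

print "Print after LOOP BLOCK: ", scalar(@final), " group\n", @final;
print "Print inside LOOP BLOCK: ", scalar(@inner), " groups\n";
print "--- group ---\n$_" for @inner;
```

    With the Print inside the loop, each group shows an intermediate state of the hash; only the last group matches what the Print after the loop produces.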

    You've said "I am very beginner of perl" in a couple of places. I suspect you haven't understood the anonymous block I used in SINGLE BLOCK and ended up with logic more like this:

    Declare hash
    start SINGLE BLOCK
        Populate hash
        LOOP BLOCK
        Print hash data
    end SINGLE BLOCK

    An anonymous block is just code wrapped in braces:

    {
        # code here
    }

    I've used it to provide a limited lexical scope. The variables ($fh, $k and $v) that I've declared in that block exist only in that block; they are entirely separate from, and cannot interfere in any way with, the similarly named variables elsewhere in the code. There's also an additional benefit: when $fh goes out of scope, Perl performs an implicit close.
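
    A small sketch of both points, under stated assumptions: the sub names (scope_demo, block_roundtrip) and the scratch file are hypothetical and exist only for this demonstration. File::Temp is a core module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical demo: the same name declared inside and outside a bare block.
sub scope_demo {
    my $v = 'outer';
    my $seen_inside;
    {
        my $v = 'inner';        # a new lexical, visible only inside these braces
        $seen_inside = $v;
    }
    return ($seen_inside, $v);  # the outer $v was never touched
}

# Hypothetical demo: two unrelated $fh variables, each closed at its brace.
sub block_roundtrip {
    my ($path) = @_;
    {
        open my $fh, '>', $path or die "Cannot write '$path': $!";
        print {$fh} "A 1\n";
    }   # $fh goes out of scope: implicit close, buffered data is flushed
    my ($k, $v);
    {
        open my $fh, '<', $path or die "Cannot read '$path': $!";
        my $line = <$fh>;
        ($k, $v) = split ' ', $line;
    }   # this $fh is a different variable from the one above
    return ($k, $v);
}

my ($inside, $outside) = scope_demo();
print "inside: $inside, outside: $outside\n";

my (undef, $path) = tempfile(UNLINK => 1);   # scratch file, removed at exit
my ($key, $value) = block_roundtrip($path);
print "read back: $key $value\n";
```

    The second block can read what the first block wrote only because the first $fh was already closed, and therefore flushed, at its closing brace.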

    Anyway, while that's probably useful information you can add to your "beginner of perl" knowledgebase, it's very much guesswork on my part with respect to whatever modifications you made to my original code. If you post your changes, I can provide more concrete feedback.

    — Ken

      I am very grateful for all the useful explanations you have provided. As you know, I am very beginner of perl. I tried to modify the code you provided because it didn't work for me. I can see the logic of your code, but please let me ask you something: the following code of yours isn't giving any output when I use it for 25 files. Please tell me how to fix it.

      #!/usr/bin/env perl
      use strict;
      use warnings;
      use autodie;

      my @files = glob 'pm_1206312_in*';
      my %uniq;

      {
          open my $fh, '<', shift @files;
          while (<$fh>) {
              my ($k, $v) = split;
              $uniq{$k} = $v;
          }
      }

      for my $file (@files) {
          my %data;
          open my $fh, '<', $file;
          while (<$fh>) {
              my ($k, $v) = split;
              $data{$k} = $v;
          }
          for (keys %uniq) {
              delete $uniq{$_}
                  unless exists $data{$_} and $uniq{$_} eq $data{$_};
          }
      }

      printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq;

        In my original response, I showed the test files I created with the data from your OP. You didn't say what your filenames were; I had to make up names for my files. The pm indicates it's a PerlMonks file; the 1206312 is the node ID of your OP; the in is for input. Those are fairly standard naming conventions that I use; I very much doubt you use these same conventions.

        My intention was to help you learn; not to do your school/job/whatever work for free. Spend some time understanding the techniques I've used, instead of blindly copying my code and expecting it to work as is. I probably have a different directory structure to you; names I've given to test files (as seen here) won't be the same as filenames on your system; I may have used a CPAN module which you'll first need to install; there could be differences between software versions which require you to write your code slightly differently; you may even have local coding standards that you need to follow.

        If you're genuinely interested in learning, then you'll need to put in some effort yourself and do some troubleshooting. Investigate how %uniq changes as the script runs: from initial (my) declaration to final (printf) output. Do the same with other variables: look at their values and see how those change over the life of the program. If, on the other hand, you just want your work done for free, you're in the wrong place: see "How (Not) To Ask A Question" and, in particular, its "Do Your Own Work" section.
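
        One possible way to start that investigation, offered as a sketch and not as the fix: report how many files the glob actually matched, then dump %uniq after each file is processed. The helper names (read_pairs, common_pairs) are hypothetical; the glob pattern is the one from the code above and should be replaced with whatever matches your own files. Data::Dumper is a core module.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;    # stable key order makes successive dumps comparable

# Hypothetical helper: read "key value" lines from one file into a hash.
sub read_pairs {
    my ($file) = @_;
    my %data;
    open my $fh, '<', $file or die "Cannot open '$file': $!";
    while (<$fh>) {
        my ($k, $v) = split;
        next unless defined $v;    # skip blank or single-column lines
        $data{$k} = $v;
    }
    return %data;
}

# Hypothetical helper: keep only the pairs present in every file,
# tracing the state of %uniq to STDERR along the way.
sub common_pairs {
    my @files = @_;
    warn 'files found: ', scalar(@files), "\n";
    return () unless @files;       # nothing matched, so nothing to compare

    my $first = shift @files;
    my %uniq  = read_pairs($first);
    warn "after $first: ", Dumper(\%uniq);

    for my $file (@files) {
        my %data = read_pairs($file);
        for (keys %uniq) {
            delete $uniq{$_}
                unless exists $data{$_} and $uniq{$_} eq $data{$_};
        }
        warn "after $file: ", Dumper(\%uniq);
    }
    return %uniq;
}

my @files = glob 'pm_1206312_in*';    # pattern from the thread; substitute your own
my %uniq  = common_pairs(@files);
printf "%s %s\n", $_, $uniq{$_} for sort keys %uniq;
```

        If the first trace line reports 0 files, the glob pattern matches nothing in the current directory; if %uniq empties out partway through, the dump shows which file removed the last common pair.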

        All of the code that I provided is very straightforward and documented. You should be able to find information on everything I've used in I have this link bookmarked; I recommend you do the same.

        — Ken

        Do all 25 of your filenames start with pm_1206312_in? If not, what are they like?

