Finding a specific value in another file

by rajiyengar (Initiate)
on Oct 21, 2013 at 05:12 UTC (#1059063=perlquestion)
rajiyengar has asked for the wisdom of the Perl Monks concerning the following question:

Gurus, I find the examples I get from Google a little confusing. Here is the scenario: in one file I have a specific column (broker). Another file has 4 columns, where column 2 is the broker and column 4 is the ticker. I need to match each broker from file 1 against file 2 and pick up the ticker from column 4 of file 2. Neither file is very big (a few thousand rows). I would appreciate your help in defining the basic structure. Thanks a lot.

Re: Finding a specific value in another file
by marinersk (Priest) on Oct 21, 2013 at 05:46 UTC
    How you extract the columns depends on the file format. For CSV or TSV, I would recommend either the Text::CSV or the Text::CSV_XS module. If the columns are simply separated by whitespace, I'd use split.
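    The choice of split pattern matters when deciding between the two layouts. Below is a minimal sketch, assuming file 2 is either whitespace-separated or tab-separated; the helper name broker_and_ticker and the sample lines are hypothetical. For CSV with quoted fields, Text::CSV or Text::CSV_XS is still the safer choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return (broker, ticker) from one line of file 2,
# where broker is column 2 and ticker is column 4.
sub broker_and_ticker {
    my ($line, $is_tsv) = @_;
    chomp $line;
    my @cols = $is_tsv
        ? split /\t/, $line, -1    # TSV: split on tabs only, keep empty fields
        : split ' ', $line;        # whitespace: ignore leading blanks, split on runs of whitespace
    return @cols[1, 3];
}

# A broker name containing a space only survives intact in the TSV case.
my ($ws_broker,  $ws_ticker)  = broker_and_ticker("  2013 ACME  x  ACM\n", 0);
my ($tsv_broker, $tsv_ticker) = broker_and_ticker("2013\tBig Broker\t\tBBK\n", 1);
print "$ws_broker => $ws_ticker\n";     # ACME => ACM
print "$tsv_broker => $tsv_ticker\n";   # Big Broker => BBK
```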

    I would then read the broker file and store the brokers in an array.

    Then I would read the ticker file, and for each line, use grep to see if the broker in that file matches anything in the array. If it does, I would store the ticker in a hash, using the broker as the key and the ticker as the value.

    Useful snippets:

        # While reading the broker file (file 1):
        my @brokerElements = split ' ', $brokerLine;   # ' ' also skips leading whitespace
        push @brokers, $brokerElements[0];

        # ...

        # While reading the ticker file (file 2):
        my @tickerElements = split ' ', $tickerLine;
        my $broker = $tickerElements[1];   # column 2
        my $ticker = $tickerElements[3];   # column 4
        my $broker_regex = quotemeta $broker;
        if (grep /^$broker_regex$/i, @brokers) {
            $tickerInfo{$broker} = $ticker;
        }

        # ...

        foreach my $reportBroker (sort keys %tickerInfo) {
            print "Broker $reportBroker has ticker $tickerInfo{$reportBroker}\n";
        }
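    Put together, the array-plus-grep approach might look like the sketch below. It assumes whitespace-separated columns (broker in column 1 of file 1, broker and ticker in columns 2 and 4 of file 2). The sub name tickers_by_grep and the sample data are hypothetical, and the quotemeta'd regex has been swapped for an equivalent case-insensitive `lc ... eq lc ...` comparison. The demo reads in-memory strings; real code would open the two file paths instead.

```perl
use strict;
use warnings;

# Array-plus-grep approach: keep file 1's brokers in an array and
# scan that array once for every line of file 2.
sub tickers_by_grep {
    my ($broker_fh, $ticker_fh) = @_;
    my @brokers;
    while (my $line = <$broker_fh>) {
        my @fields = split ' ', $line;
        push @brokers, $fields[0] if @fields;    # skip blank lines
    }
    my %tickerInfo;
    while (my $line = <$ticker_fh>) {
        my @fields = split ' ', $line;
        next unless @fields >= 4;                # need columns 2 and 4
        my ($broker, $ticker) = @fields[1, 3];
        # Linear, case-insensitive scan of @brokers for each ticker line.
        if (grep { lc $_ eq lc $broker } @brokers) {
            $tickerInfo{$broker} = $ticker;
        }
    }
    return \%tickerInfo;
}

# Demo with in-memory "files" (hypothetical data).
my $file1 = "ACME\nBIGBRK\n";
my $file2 = "1 ACME x ACM\n2 OTHER y OTH\n3 BIGBRK z BBK\n";
open my $fh1, '<', \$file1 or die $!;
open my $fh2, '<', \$file2 or die $!;
my $info = tickers_by_grep($fh1, $fh2);
print "Broker $_ has ticker $info->{$_}\n" for sort keys %$info;
# Broker ACME has ticker ACM
# Broker BIGBRK has ticker BBK
```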

    Good luck with the project.

Re: Finding a specific value in another file
by Laurent_R (Canon) on Oct 21, 2013 at 06:07 UTC

    Read file 2 and store the relevant columns in a hash (broker as the key, ticker as the value), then read file 1 and look up each broker in the hash.
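    A minimal sketch of this hash approach, under the same assumed whitespace-separated layout as the question; the sub name tickers_by_hash and the demo data are hypothetical. Unlike the grep version above, this lookup is case-sensitive, and if a broker appears more than once in file 2, the last row wins.

```perl
use strict;
use warnings;

# Hash approach: load file 2 into %ticker_for (broker => ticker),
# then do one constant-time lookup per broker from file 1.
sub tickers_by_hash {
    my ($broker_fh, $ticker_fh) = @_;
    my %ticker_for;
    while (my $line = <$ticker_fh>) {
        my @fields = split ' ', $line;
        next unless @fields >= 4;
        $ticker_for{ $fields[1] } = $fields[3];    # later rows overwrite earlier ones
    }
    my %found;
    while (my $line = <$broker_fh>) {
        my ($broker) = split ' ', $line;           # first column only
        next unless defined $broker;               # blank line
        $found{$broker} = $ticker_for{$broker} if exists $ticker_for{$broker};
    }
    return \%found;
}

# Demo with in-memory "files" (hypothetical data).
my $file1 = "ACME\nBIGBRK\nNOSUCH\n";
my $file2 = "1 ACME x ACM\n2 OTHER y OTH\n3 BIGBRK z BBK\n";
open my $fh1, '<', \$file1 or die $!;
open my $fh2, '<', \$file2 or die $!;
my $found = tickers_by_hash($fh1, $fh2);
print "Broker $_ has ticker $found->{$_}\n" for sort keys %$found;
```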

      edit: Added teddy bear, removed foot from mouth

      That would also work, but I think it could use more memory than loading the first file into an array, since you might load hash entries for brokers you will never report on.

      However, because it's you, I naturally presume you have a reason for using that approach, and I'm concerned that I can't see why.

      So I guess I'll bite the bullet and ask: why do you recommend loading the ticker info into a hash first?

      However, I just asked my teddy bear and the answer came back clearly: my way trades away execution efficiency to gain space efficiency. That is silly, given that the OP specifically stated there would only be a few thousand lines. Running grep over the whole broker array for every line of the ticker file makes the work grow with (lines in file 1) × (lines in file 2), which is a complete waste here.
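      For comparison, here is a variant not proposed in the thread: keep only file 1 in memory, as in the array approach, but store it as a hash used as a set. Each ticker line then costs one hash lookup instead of a grep scan, so memory still depends only on file 1 while the work grows with (lines in file 1) + (lines in file 2). The sub name tickers_by_set is hypothetical, and the column layout is assumed as before.

```perl
use strict;
use warnings;

# Variant: file 1's brokers go into a hash used as a set (lowercased keys),
# so membership checks are constant-time and case-insensitive.
sub tickers_by_set {
    my ($broker_fh, $ticker_fh) = @_;
    my %is_broker;
    while (my $line = <$broker_fh>) {
        my ($broker) = split ' ', $line;
        $is_broker{ lc $broker } = 1 if defined $broker;
    }
    my %tickerInfo;
    while (my $line = <$ticker_fh>) {
        my @fields = split ' ', $line;
        next unless @fields >= 4;
        $tickerInfo{ $fields[1] } = $fields[3] if $is_broker{ lc $fields[1] };
    }
    return \%tickerInfo;
}

# Demo with in-memory "files" (hypothetical data).
my $file1 = "acme\nBIGBRK\n";
my $file2 = "1 ACME x ACM\n2 OTHER y OTH\n3 BIGBRK z BBK\n";
open my $fh1, '<', \$file1 or die $!;
open my $fh2, '<', \$file2 or die $!;
my $info = tickers_by_set($fh1, $fh2);
print "Broker $_ has ticker $info->{$_}\n" for sort keys %$info;
```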

      :: sigh ::

      I'll go and hide in my corner now.

        Yes, I guess you've got the idea: using a hash is by far the fastest (and also easiest) method, provided the data will fit into memory, which we know to be the case.

Node Type: perlquestion [id://1059063]
Approved by marinersk