PerlMonks  

correspondence between two arrays

by anasuya (Novice)
on Nov 24, 2011 at 09:35 UTC (#939831)
anasuya has asked for the wisdom of the Perl Monks concerning the following question:

i have a file which has the following lines:
1GEG
3RU7
1BXS
2JG7
3QWU
2BHP
3ABU
I also have another file as:
a.1.1.1 1GEG
a.4.3.5 1BXS
a.6.7.5 2JG7
a.8.7.9 2BHP
a.3.2.1 3RU7

I need to do the following task: for each value in file1, I have to check whether that value exists in the second field of file2, and if it does, I want to extract the corresponding first field of file2. In short, I want something like this:

1GEG => a.1.1.1
3RU7 => a.3.2.1
1BXS => a.4.3.5
2JG7 => a.6.7.5
3QWU => NIL
2BHP => a.8.7.9
3ABU => NIL

I have the logic and I know it will work, but I am not able to turn it into proper code. Here it is: read the elements of the first file into a hash; then, if a hash key exists for a line I read from the second file, that value obviously existed in the first file. How do I go about it?

Re: correspondence between two arrays
by marto (Chancellor) on Nov 24, 2011 at 09:40 UTC

    "bt i am not able to get it into proper code"

    Post the code you have which doesn't work the way you want.

      open AD, "file1.txt";
      @arr=<AD>;
      foreach $line(@arr)
      {
          $name{substr($line,0,4)}++;
      }
      close AD;
      #hash out of the contents of the first file=MADE
      #-----------
      open FH, "file2.txt";
      @arr2=<FH>;
      foreach $a(@arr2)
      {
          @val=split(' ',$a);
          foreach $v($#val)
          if (...what do i do here???..)
          {print.....
          }
      }
        You said:
            "for each value in file1, i have to check whether that value exists in second field of file2"
        But you iterate over @arr2 in your code. It might be better to hash the second file, and then go through the first one and print the corresponding hashed value, if it exists, or NIL otherwise.

        You might need to chomp the lines if you do not want the \n included in your strings (not needed if the key is always the last thing on any line).
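
A minimal sketch of the approach suggested above: hash the second file, then walk the first one and print the hashed value or NIL. The sample lines are copied from the question and held in arrays so the snippet runs on its own; the variable names (%first_field_for, @report) are made up for illustration and are not from any poster's code.

```perl
use strict;
use warnings;

# Sample data copied from the question; a real script would read
# these lines from the two files instead.
my @file1_lines = ("1GEG\n", "3RU7\n", "1BXS\n", "2JG7\n", "3QWU\n", "2BHP\n", "3ABU\n");
my @file2_lines = (
    "a.1.1.1 1GEG\n", "a.4.3.5 1BXS\n", "a.6.7.5 2JG7\n",
    "a.8.7.9 2BHP\n", "a.3.2.1 3RU7\n",
);

# Build the lookup from file2: second field (the ID) => first field.
my %first_field_for;
for my $line (@file2_lines) {
    chomp $line;
    my ($first, $id) = split ' ', $line;
    next unless defined $id;    # skip blank or malformed lines
    $first_field_for{$id} = $first;
}

# Walk file1 in its original order; report the match, or NIL if none.
my @report;
for my $line (@file1_lines) {
    chomp $line;
    my $match = exists $first_field_for{$line} ? $first_field_for{$line} : 'NIL';
    push @report, "$line => $match";
}
print "$_\n" for @report;
```

Hashing file2 rather than file1 keeps the output in file1's order and makes the NIL case a simple failed lookup.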

        For starters, you don't check if open() actually worked. That's not good. You should also use strict; and use warnings; too.

        You should also use a consistent coding and indenting style. Doing so will help you see the logic flow of your own code. perltidy can help you with that.

        This may sound like i'm picking out the "unimportant" parts of software design. Well, actually, no i'm not. First you have to get the basics right. Then, the tools you use can actually help you in developing your script by telling you about certain problems you might have overlooked.

        Don't use '#ff0000':
        use Acme::AutoColor; my $redcolor = RED();
        All colors subject to change without notice.
Re: correspondence between two arrays
by ansh batra (Friar) on Nov 24, 2011 at 10:46 UTC
    open(FILE,"< f1.txt");
    @linesf1=<FILE>;
    close(FILE);
    open(FILE,"< f2.txt");
    @linesf2=<FILE>;
    close(FILE);
    foreach $linef1(@linesf1)
    {
        $found=0;
        chomp($linef1);
        foreach $linef2(@linesf2)
        {
            chomp($linef2);
            if($linef2=~/ $linef1/)
            {
                print "$linef1--->$`\n";
                $found=1;
            }
        }
        if(!$found)
        {
            print "$linef1--->null\n";
        }
    }
    --output--
    1GEG--->a.1.1.1
    3RU7--->a.3.2.1
    1BXS--->a.4.3.5
    2JG7--->a.6.7.5
    3QWU--->null
    2BHP--->a.8.7.9
    3ABU--->null
    sorry i have done it without using hashes

      I see little to be 'sorry' about in your new code at Re: correspondence between two arrays, aside from your opens (see the first bullet below for a preferred form). Unless that's the basis upon which other (aka "wiser"?) monks have downvoted the new node, I think the downvotes are ill-justified: while 'XP is just a game,' some newcomers do still conflate rep and XP and misread a node's rep as a measure of its merit.

      In any case, your latest would benefit from:

      • syntactically-correct 3-argument opens with lexical handles and testing ( open(my $fh1, '<', $file1) or die "Can't open $file1: $!"; where $file1 and $file2 are declared and instantiated with your filenames)
      • use of strict and warnings (see The strictures, according to Seuss)
      • better indentation (and vertical white space - Perl doesn't charge for either)
              ...and
      • setting my $found to " " (e.g., for the sake of the reader/maintainer, an explicit string, rather than a number).
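
The first two bullets can be sketched roughly as follows. The temporary file exists only so the snippet runs by itself; a real script would put its own filename in $file1. All names here are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway input file so the sketch is self-contained.
my ($tmp_fh, $file1) = tempfile(UNLINK => 1);
print {$tmp_fh} "1GEG\n3RU7\n";
close $tmp_fh or die "Can't close $file1: $!";

# Three-argument open with a lexical handle, checked for failure.
open(my $fh1, '<', $file1) or die "Can't open $file1: $!";
chomp(my @lines = <$fh1>);    # read every line and strip the newlines
close $fh1;

print "Read ", scalar(@lines), " lines: @lines\n";
```

With strict enabled, a misspelled variable name in the die message becomes a compile-time error instead of a silently empty string.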

      Please note also that the comment at line 7 in your first code (while -- arguably -- "technically correct") uses "hash" in a manner that's potentially misleading. Although PHP (for example) uses "hash" to describe the constructs there, the Perl vernacular uses "array."

        I'd guess the downvotes (though I agree that may be a harsh response to a solution that could be much better but does work) are probably because he's looping through the entire second array for every item in the first array, instead of using a hash.

        Writing the above got me curious: Just how bad is looping through an array and comparing as opposed to checking against a hash's keys? Very, very bad, it turns out. I wrote the script below, based on this task, to benchmark the two methods with a variable number of items in the lists. I only used one iteration for Benchmark (and removed the warnings to save space), because the test takes long enough to do once with larger lists, and I didn't want it to take all day. But the differences are large enough to ignore whatever margin of error that more iterations would smooth out. The hash method also has the overhead of splitting the elements of the second array and creating the hash.

        With only 1000 items in each list, the array/array method takes about 0.37 seconds per run (a rate of 2.70/s) -- fast enough to live with, if you're not running it over and over all day. But the hash method is so fast that Benchmark seems unable to display the time elapsed sensibly. Bumping the lists up to 5000 items each, the array/array method goes up to almost 10 seconds, but the hash method is still down at .01 seconds, or almost 1000 times faster. At 10,000 items, array/array is up to about 46 secs, and the hash is still at .03, or about 1500 times faster. So not only is the array/array method much slower, but it scales much worse, since the number of loops and comparisons grows with the square of the list size. At 100,000 items in each array, the array/array method took well over an hour, and the hash method is still under a second! I knew the hash would be faster, but wow.

        $ perl 939870.pl 1000
        Building arrays of 1000 with unique 4-char keys... done.
        Arrays have 1000 items each. Benchmarking...
                                        Rate  arrays plus regex using a hash
        arrays plus regex             2.70/s                 --        -100%
        using a hash      1000000000000000/s 37000000000000000%           --

        $ perl 939870.pl 5000
        Building arrays of 5000 with unique 4-char keys... done.
        Arrays have 5000 items each. Benchmarking...
                            s/iter arrays plus regex using a hash
        arrays plus regex     9.40                --        -100%
        using a hash      1.00e-02            93900%           --

        $ perl 939870.pl 10000
        Building arrays of 10000 with unique 4-char keys... done.
        Arrays have 10000 items each. Benchmarking...
                            s/iter arrays plus regex using a hash
        arrays plus regex     46.3                --        -100%
        using a hash      3.00e-02           154267%           --

        $ perl 939870.pl 100000
        Building arrays of 100000 with unique 4-char keys... done.
        Arrays have 100000 items each. Benchmarking...
                          s/iter arrays plus regex using a hash
        arrays plus regex   4110                --        -100%
        using a hash       0.330          1245345%           --
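
The benchmark script itself is not reproduced in the thread. The following is a rough sketch of how such a comparison might be set up with the core Benchmark module. It is not the poster's 939870.pl: the key format, the list size of 300, and the sub names are all assumptions chosen so the run finishes quickly.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n = 300;    # hypothetical list size, kept small so the run is quick

# Made-up data: unique zero-padded keys standing in for the 4-char IDs.
my @ids   = map { sprintf 'K%05d', $_ } 1 .. $n;
my @pairs = map { "a.$_ $ids[$_ - 1]" } 1 .. $n;

# Nested loops: scan every pair for every ID, with no early exit,
# so the work grows as n * n.
sub lookup_nested {
    my @out;
    for my $id (@ids) {
        for my $pair (@pairs) {
            if ($pair =~ /^(\S+) \Q$id\E$/) {
                push @out, "$id => $1";
            }
        }
    }
    return \@out;
}

# Hash: one pass to build the lookup, then one probe per ID.
sub lookup_hash {
    my %first_for;
    for my $pair (@pairs) {
        my ($first, $id) = split ' ', $pair;
        $first_for{$id} = $first;
    }
    my @out;
    push @out, "$_ => $first_for{$_}" for @ids;
    return \@out;
}

# Sanity check that both approaches agree before timing them.
my $nested = lookup_nested();
my $hashed = lookup_hash();
die "results differ" unless "@$nested" eq "@$hashed";

# Run each candidate for at least one CPU second and print a comparison.
cmpthese(-1, {
    'nested loops' => \&lookup_nested,
    'using a hash' => \&lookup_hash,
});
```

Passing a negative count to cmpthese asks Benchmark to run each sub for at least that many CPU seconds, which avoids the "too few iterations" problem that a single iteration of a very fast sub can cause.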

        Aaron B.
        My Woefully Neglected Blog, where I occasionally mention Perl.

        Your node is in reply to ansh batra yet you seem to be replying to anasuya. Are you confusing two similar sounding user ids or are you implying they are the same person? anasuya is the OP with only a few posts whereas ansh batra has over 60 posts in four months. Maybe people are downvoting an eager beaver who is muddying the water with not great advice after good advice has already been given.

        I also don't like to downvote newcomers if they are making a good effort and actually listen to helpful suggestions.

      thanks a lot.
Re: correspondence between two arrays
by marto (Chancellor) on Nov 24, 2011 at 10:53 UTC
Re: correspondence between two arrays
by Marshall (Prior) on Nov 24, 2011 at 11:55 UTC
    #!/usr/bin/perl -w
    use strict;
    use 5.10.0;

    my @file1 = qw(1GEG 3RU7 1BXS 2JG7 3QWU 2BHP 3ABU);
    my @file2 = qw(a.1.1.1 1GEG
                   a.4.3.5 1BXS
                   a.6.7.5 2JG7
                   a.8.7.9 2BHP
                   a.3.2.1 3RU7);

    my %file2 = reverse @file2;   #swaps pairs! Whoa!

    foreach my $fourLetterAcronym (@file1)
    {
        my $value = $file2{$fourLetterAcronym};
        $value //= 'NIL';         #a Perl 5.10 feature
        print "$fourLetterAcronym => $value\n";
    }
    __END__
    Prints:
    1GEG => a.1.1.1
    3RU7 => a.3.2.1
    1BXS => a.4.3.5
    2JG7 => a.6.7.5
    3QWU => NIL
    2BHP => a.8.7.9
    3ABU => NIL
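    Update: $value //= 'NIL'; #a Perl 5.10 feature
    If $value is undefined, set it to "NIL". Unlike ||=, the //= operator tests only for definedness, so a defined but "false" value such as 0 or the empty string is left alone.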
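
A small sketch of the difference between //= and ||=, using a made-up %count hash purely for illustration:

```perl
use strict;
use warnings;
use 5.010;    # the // and //= operators need Perl 5.10 or later

my %count = (present => 0);

# //= assigns only when the left side is undefined.
my $defined_or = $count{present};
$defined_or //= 'NIL';    # stays 0, because 0 is defined

# ||= assigns whenever the left side is false, so it clobbers the 0.
my $logical_or = $count{present};
$logical_or ||= 'NIL';    # becomes 'NIL'

# A missing key yields undef, so //= falls back to 'NIL'.
my $missing = $count{absent};
$missing //= 'NIL';

say "defined-or: $defined_or, logical-or: $logical_or, missing: $missing";
```

For the lookup in this thread the stored values are never false, so either operator would print the same result; //= simply states the intent (a missing key) more precisely.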
