PerlMonks  

compare text files

by linseyr (Acolyte)
on Sep 12, 2012 at 01:01 UTC ( [id://993100]=perlquestion )

linseyr has asked for the wisdom of the Perl Monks concerning the following question:

Hi, This is my first day working with Perl, so I really don't know anything. It took me the whole day to come up with some code, but it still doesn't work :( I have two text files which are tab delimited like this:

file 1 (test)
chr1	30
chr2	20
chr2	50
chr3	80

file 2 (reference)
chr1	40
chr1	50
chr2	60
chr2	80
chr3	100

I want to compare file 1, which is the test file, against file 2, which is the reference file. When column 1 is the same in both files, I want to get the difference between the two files for column 2, and it must return the smallest value for that particular chr. So the output should look like this:

output
chr1	10
chr2	40
chr2	10
chr3	20

My code looks like this now:

#!/usr/bin/env perl
use strict;
use warnings;

my ( @cols, $p1, $p2, $p3, $p4, @sec, @cols2 );

@ARGV or die "No input file specified";
open my $first,  '<', $ARGV[0] or die "Unable to open input file: $!";
open my $second, '<', $ARGV[1] or die "Unable to open input file: $!";

print scalar <$first>;
<$second>;    #...throw away first line...

while (<$first>) {
    @cols = split /\s+/;
    $p1 = $cols[0];
    $p2 = $cols[1];
    #print $p1;
    while (<$second>) {
        @cols2 = split /\s+/;
        $p3 = $cols2[0];
        $p4 = $cols2[1];
        if ($p3 eq $p1) {
            print "yes";
        }
    }
}

But this doesn't work.. Could somebody please help me? Thanks!

Replies are listed 'Best First'.
Re: compare text files
by roboticus (Chancellor) on Sep 12, 2012 at 01:11 UTC

    linseyr:

    You've got two nested loops, and you're reading the *entire* second file before reading the next line of the first file. But since the second file has been read in its entirety, the second loop will fail from that point on.
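
The effect described above can be reproduced in a few lines. This is only a sketch: the two in-memory "files" and the names `$outer`, `$inner`, and `@inner_reads` are made up for illustration and are not part of the original script. It counts how many lines the inner loop manages to read for each line of the outer file.

```perl
use strict;
use warnings;

# Hypothetical in-memory data standing in for the two input files.
my $outer_text = "chr1\t30\nchr2\t20\n";
my $inner_text = "chr1\t40\nchr2\t60\n";

open my $outer, '<', \$outer_text or die "Cannot open outer data: $!";
open my $inner, '<', \$inner_text or die "Cannot open inner data: $!";

my @inner_reads;    # inner lines read for each outer line
while (my $outer_line = <$outer>) {
    my $count = 0;
    while (my $inner_line = <$inner>) {
        $count++;
    }
    # After the first pass $inner sits at end-of-file, so later passes read nothing.
    push @inner_reads, $count;
}

print "inner lines read per outer line: @inner_reads\n";
```

With this data the inner loop reads two lines on the first pass and none on the second, which is why the original program never finds a match after the first line of the first file.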

    There are a good few approaches you could use:

    • Read both files into an array and iterate over the arrays. (Easy, but "inelegant".)
    • Sort both input files by the matching column and then use a single loop, consuming the line having the lowest key value. (A *little* tricky.)
    • Rewind/reopen the second file each iteration. (Ugly.)
    • Read both files into hashes using your key column as the hash key, and then process the hashes. (Usually a good way, but your data has duplicate keys, so it wouldn't be very helpful in this case.)

    In this case, I'm thinking you'll probably be best off with the first suggestion.
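
As a side note on the hash option: one possible variation that copes with duplicate keys is to store a list of values per key instead of a single value. The sketch below is an illustration, not the approach recommended above. It inlines the sample data from the question, the names `%ref_values` and `@results` are invented here, and it assumes "difference" means reference value minus test value.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Sample data from the question, inlined so the sketch runs on its own.
my @reference = ( ['chr1', 40], ['chr1', 50], ['chr2', 60], ['chr2', 80], ['chr3', 100] );
my @test      = ( ['chr1', 30], ['chr2', 20], ['chr2', 50], ['chr3', 80] );

# Group reference values by key. Each hash value is an array reference,
# so duplicate keys accumulate instead of overwriting each other.
my %ref_values;
for my $row (@reference) {
    my ($chr, $val) = @$row;
    push @{ $ref_values{$chr} }, $val;
}

my @results;
for my $row (@test) {
    my ($chr, $val) = @$row;
    next unless exists $ref_values{$chr};    # skip keys with no reference rows
    # Assumption: difference = reference minus test, as in the sample output.
    my $smallest = min( map { $_ - $val } @{ $ref_values{$chr} } );
    push @results, "$chr\t$smallest";
}

print "$_\n" for @results;
```

If the smallest absolute distance is wanted instead, the `map` expression could be changed to `abs($_ - $val)`; the question does not say which is intended.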

    Update: You may want to put a few print statements in your program so you can see what's happening.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: compare text files
by Rudolf (Pilgrim) on Sep 12, 2012 at 01:37 UTC

    Hi linseyr, this is a script I came up with on the fly; I hope it is self-explanatory. The only messy part is the huge literal, because I wasn't smart enough to figure out something else... or I was too lazy? Anyway, I use file handles instead of references to them, gather the data in arrays first, and then do my comparing, using a variable $lesser_val to keep track of the least value found so far for the chrx. Best wishes, and feel free to ask any questions! Hope this helps. -Rudolf

    use v5.14;

    # Read each file into an array, one line per element.
    open(TEST,'<',"test.txt") or die $!;
    my @test_data = <TEST>;
    close(TEST);

    open(REFERENCE,'<',"reference.txt") or die $!;
    my @reference_data = <REFERENCE>;
    close(REFERENCE);

    my @results;
    for my $line (@test_data) {
        my $lesser_val = 9999999999999999;       # sentinel larger than any expected difference
        my ($chr, $val) = split(' ', $line);     # split on any whitespace, so tabs work too
        for my $ref_line (@reference_data) {
            my ($ref_chr, $ref_val) = split(' ', $ref_line);
            if ($ref_chr eq $chr) {
                $lesser_val = ($ref_val - $val) if ($ref_val - $val < $lesser_val);
            }
        }
        push(@results, "$chr $lesser_val");
    }

    say foreach @results;

      Hi Rudolf, Thank you so much for your answer! I have two questions though. 1. What do push and say mean? I thought that say is something like print, but I don't get any output now. Is it possible to print it to a file? 2. When I print $lesser_val it prints it multiple times. Should it be outside a loop or something? Thanks!

        Hey, no worries. The code does work fine on my computer. As for push: push(@array,$item); "pushes" a single value onto the end of the array, so it's just adding it to the list. At the end of the program, outside of all loops, I used:

        say foreach @results;

        say "hello world"; is like System.out.println() in Java: it prints the output with a newline. So when I write say foreach @results; it just prints out each result with a newline on the end.

        Now, in order to use some keywords in Perl you must first tell it which version you are using. I put use v5.14; at the top; say is available from 5.10 onwards. You can also write the printout like so if you want to print to a results file:

        use v5.14;

        open(RESULTS,'>','results.txt') or die $!;
        foreach (@results) {
            print RESULTS "$_\n";
        }
        close(RESULTS);

        The printing should only come after all the computing; it should not be in any loops. All results should be saved to the array first, and then you can use it at the end. Don't hesitate to ask more questions :) Good luck.
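
To make the push and say behaviour described above concrete, here is a tiny standalone sketch. The array contents are invented for illustration. It collects values inside a loop and prints only once the loop has finished, which is the pattern suggested above.

```perl
use strict;
use warnings;
use feature 'say';    # enables say; the feature exists since Perl 5.10

my @results;                      # starts empty
for my $n (1 .. 3) {
    push @results, "item $n";     # append one element per iteration
}

# Printing happens once, after the loop has finished collecting.
say for @results;                 # each element followed by a newline
say scalar @results;              # number of elements collected
```

Putting a print inside the inner loop instead would emit a line on every comparison, which matches the repeated output mentioned in the question about $lesser_val.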

Re: compare text files
by kcott (Archbishop) on Sep 12, 2012 at 10:28 UTC

    G'day linseyr,

    Welcome to the monastery.

    This works with the data you've provided.

    #!/usr/bin/env perl
    use strict;
    use warnings;

    my %least_ref;

    open my $reference, '<', $ARGV[0] or die "Can't open reference ($ARGV[0]): $!";
    while (<$reference>) {
        my ($key, $val) = split;
        if (not exists $least_ref{$key} or $val < $least_ref{$key}) {
            $least_ref{$key} = $val;
        }
    }
    close $reference;

    open my $testfile, '<', $ARGV[1] or die "Can't open testfile ($ARGV[1]): $!";
    while (<$testfile>) {
        my ($key, $val) = split;
        print "$key\t", $least_ref{$key} - $val, "\n";
    }
    close $testfile;

    Input files:

    $ cat > pm_least_diff.ref
    chr1	40
    chr1	50
    chr2	60
    chr2	80
    chr3	100
    $ cat > pm_least_diff.test
    chr1	30
    chr2	20
    chr2	50
    chr3	80

    Script output:

    $ pm_least_diff.pl pm_least_diff.ref pm_least_diff.test
    chr1	10
    chr2	40
    chr2	10
    chr3	20
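
A short check of why keeping only the least reference value per key is enough here. It assumes the wanted difference is reference minus test: subtracting the same test value from every reference value does not change which one is smallest. The variable names and the single-key data are illustrative only.

```perl
use strict;
use warnings;
use List::Util qw(min);

my @ref_values = (60, 80);    # reference values for one key (chr2 in the sample)
my $test_value = 20;

# Keep only the least reference value, then subtract.
my $via_least_ref = min(@ref_values) - $test_value;

# Compute every difference, then take the least.
my $via_all_diffs = min( map { $_ - $test_value } @ref_values );

print "least reference minus test: $via_least_ref\n";
print "least of all differences:   $via_all_diffs\n";
```

Both lines print 40 for this data. The shortcut would not hold if the absolute distance were wanted and a test value could exceed some reference values, and a test key that never appears in the reference file would need separate handling.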

    -- Ken
