compare text files

linseyr has asked for the wisdom of the Perl Monks concerning the following question:

Hi, This is my first day working with Perl, so I really don't know anything. It took me a whole day to come up with some code, but it still doesn't work :( I have two text files which are tab delimited like this:

file 1 (test)
chr1    30
chr2    20
chr2    50
chr3    80

file 2 (reference)
chr1    40
chr1    50
chr2    60
chr2    80
chr3    100

I want to compare file1, which is the test file, against file2, which is the reference file. For each line of file1, I want the difference in column 2 against every line of file2 that has the same column 1, and it must return the smallest of those differences for that particular chr. So the output should look like this:

output
chr1    10
chr2    40
chr2    10
chr3    20

My code looks like this now:

#!/usr/bin/env perl
use strict;
use warnings;

my ( @cols, $p1, $p2, $p3, $p4, @sec, @cols2 );

@ARGV or die "No input file specified";

open my $first , '<',$ARGV[0] or die "Unable to open input file: $!";
open my $second,'<', $ARGV[1] or die "Unable to open input file: $!";

print scalar <$first>;
<$second>; #...throw away first line...
while (<$first>) {
    @cols = split /\s+/;
    $p1   = $cols[0];
    $p2   = $cols[1];
    #print $p1;
    while (<$second>) {
        @cols2 = split /\s+/;
        $p3   = $cols2[0];
        $p4   = $cols2[1];
      
        if ($p3 eq $p1) {
            print "yes";
        }
    }
}

But this doesn't work.. Could somebody please help me? Thanks!

Replies are listed 'Best First'.
Re: compare text files by roboticus (Chancellor) on Sep 12, 2012 at 01:11 UTC
linseyr:

You've got two nested loops, and you're reading the entire second file before reading the next line of the first file. But since the second file has been read in its entirety, the second loop will fail from that point on.

There are a good few approaches you could use:

* Read both files into an array and iterate over the arrays. (Easy, but "inelegant".)
* Sort both input files by the matching column and then use a single loop, consuming the line having the lowest key value. (A little tricky.)
* Rewind/reopen the second file each iteration. (Ugly.)
* Read both files into hashes using your key column as the hash key, and then process the hashes. (Usually a good way, but your data has duplicate keys, so it wouldn't be very helpful in this case.)

In this case, I'm thinking you'll probably be best off with the first suggestion.

Update: You may want to put a few print statements in your program so you can see what's happening.

...roboticus

When your only tool is a hammer, all problems look like your thumb.
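The exhausted-filehandle problem described above, and the "rewind" option from the list, can be sketched as follows. This is only an illustration under assumptions: the two files are replaced by in-memory strings with made-up sample rows, and @matches is a hypothetical name used to collect what the inner loop finds.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# In-memory stand-ins for the two files (hypothetical sample data).
my $test_text = "chr1\t30\nchr2\t20\n";
my $ref_text  = "chr1\t40\nchr1\t50\nchr2\t60\n";

open my $first,  '<', \$test_text or die "Cannot open test data: $!";
open my $second, '<', \$ref_text  or die "Cannot open reference data: $!";

my @matches;
while (my $line = <$first>) {
    my ($chr) = split ' ', $line;

    # Without this rewind, $second is at end-of-file after the first pass
    # and the inner loop body never runs again.
    seek($second, 0, 0) or die "Cannot rewind reference data: $!";

    while (my $ref_line = <$second>) {
        my ($ref_chr, $ref_val) = split ' ', $ref_line;
        push @matches, "$chr:$ref_val" if $ref_chr eq $chr;
    }
}

print "$_\n" for @matches;
```

Commenting out the seek line leaves only the chr1 matches, which is the behaviour the original nested loops show.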
Re: compare text files by Rudolf (Pilgrim) on Sep 12, 2012 at 01:37 UTC
Hi linseyr, this is a script I came up with on the fly; I hope it is self-explanatory. The only messy part is using a huge literal, because I wasn't smart enough to figure out something else... or I was too lazy? But anyway, I use bareword file handles instead of references to them, gather the data in arrays first, and then do my comparing, using a variable $lesser_val to keep track of whether I have the least possible value for the chrx. Best wishes, and feel free to ask any questions! Hope this helps.

-Rudolf

use v5.14;

# Slurp both files into arrays, one line per element.
open(TEST,'<',"test.txt") or die $!;
my @test_data = <TEST>;
close(TEST);

open(REFERENCE,'<',"reference.txt") or die $!;
my @reference_data = <REFERENCE>;
close(REFERENCE);

my @results;

for my $line(@test_data){
    my $lesser_val = 9999999999999999;
    # split ' ' splits on any run of whitespace, so tab-delimited input works too
    my($chr,$val) = split(' ',$line);

    for my $ref_line(@reference_data){
        my($ref_chr,$ref_val) = split(' ',$ref_line);
        if($ref_chr eq $chr){
            $lesser_val = ($ref_val-$val) if ($ref_val-$val < $lesser_val);
        }
    }
    push(@results,"$chr $lesser_val");
}

say foreach @results;
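One common way to avoid the huge literal mentioned above is to start from undef and treat it as "nothing seen yet". A minimal sketch, assuming the differences for one chr have already been computed; @diffs and its values are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical differences for one chr.
my @diffs = (40, 10, 60);

my $lesser_val;    # undef means "no value seen yet"
for my $diff (@diffs) {
    # Take the first value unconditionally, then only smaller ones.
    $lesser_val = $diff
        if !defined $lesser_val or $diff < $lesser_val;
}

print defined $lesser_val ? "$lesser_val\n" : "no match\n";
```

This also makes it possible to tell "no matching reference line" apart from a genuinely large difference.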
Re^2: compare text files by linseyr (Acolyte) on Sep 12, 2012 at 03:13 UTC
Hi Rudolf, Thank you so much for your answer! I have two questions though.

1. What do push and say mean? I thought that say is something like print, but I don't get any output now. Is it possible to print it to a file?
2. When I print $lesser_val it prints it multiple times. Should it be outside a loop or something?

Thanks!
Re^3: compare text files by Rudolf (Pilgrim) on Sep 12, 2012 at 06:43 UTC
Hey, no worries. The code does work fine on my computer. push(@array,$item); "pushes" a scalar onto the end of the array, so it's just adding it to the list. At the end of the program, outside of all loops, yes, I used:

say foreach @results;

say "hello world"; is like System.out.println() in Java: it prints the output with a newline. So when I type foreach @results it just prints out each result with a newline on the end. Now, in order to use some keywords in Perl you must tell it you are using a specific version first. I put use v5.14; at the top, and say is available from 5.10 onward. But you can also write the printout like so if you want to print to a results file:

use v5.14;

open(RESULTS,'>','results.txt') or die $!;
foreach(@results){
    print RESULTS "$_\n";
}
close(RESULTS);

The printing should only come after all the computing; it should not be in any loops. All results should be saved to the array first, and then you can use it at the end. Don't hesitate with any more questions :) good luck.
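The push and say behaviour described above can be tried in isolation. This is a small sketch, not part of the original script: the result strings are made up, and the output goes to an in-memory string ($report, a hypothetical name) through a lexical filehandle so that nothing is written to disk.

```perl
use strict;
use warnings;
use feature 'say';    # say is available from Perl 5.10 onward

my @results;
push @results, "chr1 10";    # append one item to the end of the array
push @results, "chr2 40";

# Write to an in-memory string instead of a real results file.
my $report = '';
open my $out, '>', \$report or die "Cannot open in-memory file: $!";
say {$out} $_ for @results;  # say adds the newline after each item
close $out;

print $report;
```

Replacing \$report with a file name such as 'results.txt' would send the same lines to a file.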
Re: compare text files by kcott (Archbishop) on Sep 12, 2012 at 10:28 UTC
G'day linseyr,

Welcome to the monastery.

This works with the data you've provided.

#!/usr/bin/env perl

use strict;
use warnings;

my %least_ref;

open my $reference, '<', $ARGV[0] or die "Can't open reference ($ARGV[0]): $!";

while (<$reference>) {
    my ($key, $val) = split;
    if (not exists $least_ref{$key} or $val < $least_ref{$key}) {
        $least_ref{$key} = $val;
    }
}

close $reference;

open my $testfile, '<', $ARGV[1] or die "Can't open testfile ($ARGV[1]): $!";

while (<$testfile>) {
    my ($key, $val) = split;
    print "$key\t", $least_ref{$key} - $val, "\n";
}

close $testfile;

Input files:

$ cat > pm_least_diff.ref
chr1    40
chr1    50
chr2    60
chr2    80
chr3    100

$ cat > pm_least_diff.test
chr1    30
chr2    20
chr2    50
chr3    80

Script output:

$ pm_least_diff.pl pm_least_diff.ref pm_least_diff.test
chr1    10
chr2    40
chr2    10
chr3    20

-- Ken
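kcott's script keeps only the smallest reference value per chr, which gives the expected output for the sample data. A possible variant that compares against every reference value is sketched below. It assumes that "smallest difference" means the smallest absolute difference, which the original question does not state explicitly. The sample data is embedded as strings to keep the example self-contained, and %ref_vals and @output are illustrative names.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min);

# In-memory copies of the thread's sample files.
my $ref_text  = "chr1\t40\nchr1\t50\nchr2\t60\nchr2\t80\nchr3\t100\n";
my $test_text = "chr1\t30\nchr2\t20\nchr2\t50\nchr3\t80\n";

# Collect every reference value under its chr name (hash of arrays),
# so duplicate keys are kept rather than overwritten.
my %ref_vals;
for my $line (split /\n/, $ref_text) {
    my ($chr, $val) = split ' ', $line;
    push @{ $ref_vals{$chr} }, $val;
}

my @output;
for my $line (split /\n/, $test_text) {
    my ($chr, $val) = split ' ', $line;
    next unless $ref_vals{$chr};    # no reference entry for this chr

    # Assumption: "smallest" means smallest absolute difference.
    my $smallest = min map { abs($_ - $val) } @{ $ref_vals{$chr} };
    push @output, "$chr\t$smallest";
}

print "$_\n" for @output;
```

With the sample data this produces the same four lines as the requested output; the two approaches can differ when a test value is larger than the smallest reference value for its chr.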

