faozhi:
Others are assisting you with the question you asked. I'm going to fly off on a couple of tangents, instead, about basic coding practices.
Commenting
Generally, if you write your code clearly, the need for comments is greatly reduced. For example, in this section of your code, the variable names are clearly file names, so the comment is redundant.
#declare all filenames
$filename1 = 'one.txt';
$filename2 = 'two.txt';
$filename3 = 'three.txt';
In this section, the comment is pretty much a duplication of what the code says, so it's not helpful.
#open text file 1
open (FILE1, $filename1) or die "Unable to open $filename1 because $!\
+n";
If I felt a comment necessary, I would instead have stated in my comment the effect of what I was doing, like this:
# Store the contents of the first file into $hash{col1_col2}
open (FILE1, $filename1) or die "Unable to open $filename1 because $!\
+n";
while ($line = <FILE1>) {
chomp ($line);
($chrX, $chrpos, $value1, $value2) = split (/\t/, $line);
$key1 = join ("_", $chrX, $chrpos);
$hash{$key1}++;
};
close FILE1;
Indentation
The use of indentation is supposed to clarify the structure of the code, so you can see which statements are bundled together, and to make it simple to tell which code is associated with which control-flow structure. By having all your1 control-flow statements aligned to the left margin, you make it more difficult to see the logical structure of the program. You should change from this:
while ($line = <FILE2>) {
chomp ($line);
($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
$key2 = join ("_", $chrX, $chrpos);
if (exists $hash{$key2} > 0) {
$hash{key2} = $value11 + $value1;
$hash{key2} = $value22 + $value2;
$hash{key2}++
}
};
to this:
while ($line = <FILE2>) {
chomp ($line);
($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
$key2 = join ("_", $chrX, $chrpos);
if (exists $hash{$key2} > 0) {
$hash{key2} = $value11 + $value1;
$hash{key2} = $value22 + $value2;
$hash{key2}++
}
};
Obviously, it's not a problem in this particular program, as you don't have anything complicated going on. But when you have a page full of code with a lot of flow-control going on, you're going to find it difficult to maintain your code.
Semicolons
While not harmful, you're putting extra semicolons in your code (specifically at the end of your while loops. It doesn't hurt anything in this case, but since they're unexpected, it *does* make the code slightly harder to read.
Subroutines
When you start writing the same code repeatedly, you should start thinking about how you can use subroutines to simplify your task. For example, this code:
open (FILE2, $filename2) or die "Unable to open $filename2 because $!\
+n";
while ($line = <FILE2>) {
chomp ($line);
($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
$key2 = join ("_", $chrX, $chrpos);
if (exists $hash{$key2} > 0) {
$hash{key2} = $value11 + $value1;
$hash{key2} = $value22 + $value2;
$hash{key2}++
}
};
close FILE2;
is nearly identical to the code you use to process file 3. So you should think about using a subroutine to process the files. For example, you could create a subroutine like this:
sub process_file {
my $filename = shift or die "Missing filename!";
open (FILE, $filename) or die "Unable to open $filename because $!
+\n";
while ($line = <FILE2>) {
chomp ($line);
($chrX, $chrpos, $value11, $value22) = split (/\t/, $line);
$key2 = join ("_", $chrX, $chrpos);
if (exists $hash{$key2} > 0) {
$hash{key2} = $value11 + $value1;
$hash{key2} = $value22 + $value2;
$hash{key2}++
}
}
close FILE;
}
then, in your code, you can process your second and third files like2:
process_file($filename2);
process_file($filename3);
Due to correctness issues in your code, I can't tell whether it's possible or not, but frequently in programs like this, you can use the same subroutine for your first file as well--the if statements will degenerate to a single case, and only be used for the successive files. Once you clean up the other bits of your code, you might be able to take advantage of it.
use strict; use warnings;
I haven't checked to see whether the strict or warnings modules would help in this case or not, but it would be to your advantage to put them into your program before anything else. They will catch many programming errors for you. You may even find "use diagnostics" helpful. (I generally only put in "use diagnostics" when I don't understand what the error message is trying to tell me.)
I hope you find some of this useful.
...roboticus
Updates: (marked by superscripts in the above text)
- Changed 'you' to 'your'
- In the next code snippet, I corrected the second line, changing 'process-file' to 'process_file'
|