genome has asked for the wisdom of the Perl Monks concerning the following question:

I have two input files :-

File 1

CTSC chr11 - 650 E RAB38 chr11 - 87883123 E + 12 2 INTRACHR-SS-OGO-0GAP inframe-shift

File 2

chr11 602 63889087 HWI-ST216_106:3:67:14628:181851 -
chr11 613 69889087 HWI-ST216_106:3:24:16406:176388 +
chr11 614 80889087 HWI-ST216_106:3:21:13731:105239 +
chr11 643 94888888 HWI-ST216_111:5:22:3149:116167 +
chr11 678 98889079 HWI-ST216_106:3:5:18058:57952 +
chr11 612 108888887 HWI-ST216_106:3:8:5578:44855 +
chr5 612 63889087 HWI-ST216_106:3:67:14628:181851 -
chr3 88033200 69889087 HWI-ST216_106:3:24:16406:176388 +
chr2 88033345 80889087 HWI-ST216_106:3:21:13731:105239 +
chr1 88033376 94888888 HWI-ST216_111:5:22:3149:116167 +
chr6 88034000 98889079 HWI-ST216_106:3:5:18058:57952 +
chr7 88034123 108888887 HWI-ST216_106:3:8:5578:44855 +

I want to count the number of entries in the second column of the second file that fall within +500 (up) and -500 (down) of the number mentioned in the second column of the first file.

For this, I wrote a script, but it does not give the desired result:

#!/usr/bin/perl -w
$file1=$ARGV[0]; #First file
$file2=$ARGV[1]; #second file
open(TR,$file1);
while ($line1=<TR>) {
    chomp($line1);
    @ar1 = split(/\t/,$line1);
    chomp($ar1[1]);chomp($ar1[3]);
    $up = $ar1[3]-500;
    $dn = $ar1[3]+500;
    open(SC,$file2);
    while ($line2=<SC>) {
        chomp($line2);
        @array2 = split (/\t/, $line2);
        if ($ar1[1] eq $array2[0]) {
            for ($mm=$up;$mm<=$dn;$mm+=100) {
                $gt = $up+100;
                $c ='0';
                if (($array2[1]> $up) && ($array2[1] < $gt)) {
                    print "$up\t$gt\t$array2[1]\n";
                    $c++;
                }
                $up=$up+50;
            }
        }
    }
}

Result produced is

550 650 602
600 700 602

But where are the counts for 612, 613, 614, etc.? These numbers are in the range.

Replies are listed 'Best First'.
Re: Sliding window perl program
by GotToBTru (Prior) on Oct 20, 2015 at 19:51 UTC

    You've been here before. Using <c> tags is not that hard.

    Update: You are incrementing $up within the for loop within the while loop, so after the first line of $file2 you are no longer comparing against the range you think you are. You probably want to reset the value of $up before the for loop.

    Further Update: much more readable! I see my guesses about the code were correct. If you can properly indent it as well, the structure of the code will become clearer, and that's to your benefit; it will be easier to see why it does not work.

    BTW: variable $c has no apparent purpose. You set it to zero, and then conditionally increment it, only to zero it again next range.
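
A minimal sketch of the two fixes suggested above, assuming the windows are 100 wide, slide by 50, and cover 650-500 to 650+500. The names `$pivot`, `$margin`, `$size`, `$step` and `%count` are made up for this illustration, and the chr11 positions from file 2 are hard-coded rather than read from disk. The window start is recomputed for every position, and the count lives in a hash that is never reset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed parameters: pivot from file 1, window layout from the question.
my ($pivot, $margin, $size, $step) = (650, 500, 100, 50);

# Second-column values of the chr11 rows in file 2 (hard-coded for the sketch).
my @file2_chr11 = (602, 613, 614, 643, 678, 612);

my %count;    # "start\tend" => number of positions inside that window
for my $pos (@file2_chr11) {
    # $up starts from scratch for every position, so a value modified
    # while scanning one line never leaks into the next line.
    for (my $up = $pivot - $margin; $up + $size <= $pivot + $margin; $up += $step) {
        my $gt = $up + $size;
        # Same strict comparison as the original script.
        $count{"$up\t$gt"}++ if $pos > $up && $pos < $gt;
    }
}

# Print the non-empty windows in ascending order of their start.
for my $window (sort { (split /\t/, $a)[0] <=> (split /\t/, $b)[0] } keys %count) {
    print "$window\t$count{$window}\n";
}
```

With the sample data this prints three windows: 550-650 with 5 entries, 600-700 with 6, and 650-750 with 1. Because the windows overlap by 50, a position such as 612 is counted in two of them.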

    Dum Spiro Spero
      Please have a look now.

        Just for fun, I did. Here's what happened:

        $ perl -cw 1145471.pl
        Unquoted string "p" may clash with future reserved word at 1145471.pl line 6.
        Scalar found where operator expected at 1145471.pl line 11, near "$up"
                (Missing semicolon on previous line?)
        Scalar found where operator expected at 1145471.pl line 12, near "$dn"
                (Missing semicolon on previous line?)
        Scalar found where operator expected at 1145471.pl line 23, near "$c"
                (Missing semicolon on previous line?)
        Unquoted string "p" may clash with future reserved word at 1145471.pl line 24.
        Scalar found where operator expected at 1145471.pl line 31, near "$up"
                (Missing semicolon on previous line?)
        syntax error at 1145471.pl line 6, near "p>"
        syntax error at 1145471.pl line 10, near "chomp"
        syntax error at 1145471.pl line 11, near "$up "
        syntax error at 1145471.pl line 15, near ") {"
        syntax error at 1145471.pl line 19, near ") {"
        syntax error at 1145471.pl line 21, near "100) "
        syntax error at 1145471.pl line 24, near ") <"
        syntax error at 1145471.pl line 29, near "}"
        syntax error at 1145471.pl line 33, near "}"
        1145471.pl had compilation errors.

        Do you see the problem?

Re: Sliding window perl program
by Laurent_R (Canon) on Oct 20, 2015 at 20:57 UTC
    Untested code illustrating the algorithm described in my other post in this thread:

        # opening file 1
        # ...
        my $margin = 500;    # +/- range around the pivot, taken from the question
        my %hash;
        while (my $line1=<TR>) {
            chomp($line1);
            my @ar = split(/\t/,$line1);
            $hash{$ar[1]} = $ar[3];    # ID (e.g. chr11) => pivot value (e.g. 650)
        }
        close TR;
        open my $SC, "<", $file2 or die "Error blah blah... $!";
        while (my $line2 = <$SC>) {
            my ($id, $val) = split /\t/, $line2;
            my $val_file1 = $hash{$id};
            next unless defined $val_file1;    # skip IDs that do not occur in file 1
            if ( $val > $val_file1 - $margin and $val < $val_file1 + $margin) {
                # print out something
            }
        }
        close $SC;

    BTW, I kept your syntax almost unchanged for the first part and tried to improve it in accordance with good practices in the second part. Take a look at the differences.
      Hi, this program is for a sliding window. I want to count the number of entries in file 2 that lie within each window (window size 100, slide 50) as it moves from -500 to +500 around 650. That is why I used a for loop with an increment of 100 and then tried to increase $up by 50, i.e.:

      150-250
      200-300
      250-350
      ...
      1050-1150

      I hope you understand the question.
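
The window layout described here can be written out explicitly. This is a minimal sketch, assuming every window is exactly 100 wide and only full windows inside 150..1150 are wanted; `sliding_windows` and its parameter names are hypothetical, not taken from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every full window [start, end] of width $size, advancing by $step,
# that fits inside pivot - margin .. pivot + margin.
sub sliding_windows {
    my ($pivot, $margin, $size, $step) = @_;
    my @windows;
    my $last_start = $pivot + $margin - $size;    # last start that still fits
    for (my $start = $pivot - $margin; $start <= $last_start; $start += $step) {
        push @windows, [ $start, $start + $size ];
    }
    return @windows;
}

my @windows = sliding_windows(650, 500, 100, 50);
print scalar(@windows), " windows\n";
print join("-", @$_), "\n" for @windows;
```

Under these assumptions the list runs from 150-250 to 1050-1150, which is 19 windows. A loop that steps by 100 while the window start moves by 50 visits only 11 of them.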

        Well, since you insist on sliding windows, it appears that I missed your requirement, sorry about that. But I still really don't understand what you want.

        Perhaps a more detailed example of your input and desired output would help.

Re: Sliding window perl program
by Laurent_R (Canon) on Oct 20, 2015 at 20:37 UTC
    Your file 1 has only one line. Is this going to be the case with your real data? If yes, you only need to pick up the lines of file 2 where column two falls between the range defined by file 1.

    But if file 1 has several lines, then you need to explain how to do the match between the two files.

    From looking at your data, maybe you want this: for each line in file 1, check the ID (e.g. chr11) in the second field, pick up the pivot value in the 4th field (650), and grab in file 2 all lines whose ID is chr11 and whose second field is within the range (650-500..650+500).

    If this is what you want, I suggest that you should read the full file 1 and store ID/4th field of each line in a hash. Close File 1. Read file 2, for each line in file 2, check the ID, lookup the pivot value for this ID in the hash, and check if field 2 falls within the range.

Re: Sliding window perl program
by LanX (Cardinal) on Oct 20, 2015 at 19:51 UTC
      It's not the question. I have corrected the code. Please have a look now.

        I'm afraid your code corrections are incomplete:

        • spurious "<c>" and "<p>" tags all over the place inside the main code block
        • badly formatted code, e.g., inconsistent indentation (perhaps caused by the spurious tags)