in reply to Re^4: Comparing adjacent lines
in thread Comparing adjacent lines
Right you are, my apologies. The ex values other than "ex1" (ex1 is produced when the comparison finds a mismatch) revert to the default value of ex# (for example, ex18 or ex30) instead of counting up from 1. What I think is happening is that ex1 is not carried over into the else branch, because the ex value gets clobbered on every iteration of the loop. I've been trying variations with an ex variable declared outside the loop, but can't seem to figure it out.
Re^6: Comparing adjacent lines
by aaron_baugher (Curate) on Jul 13, 2012 at 02:02 UTC
I'm still having trouble understanding what you mean. Please post your code, along with about a dozen lines of your input data that demonstrate the problem, as well as what you would like the output from those lines to be.
Aaron B.
by daccame (Initiate) on Jul 13, 2012 at 16:02 UTC
Remember I am new at Perl, so my approach is probably pretty brutish. Input:
Here's 25 lines of the output (I'll bold the lines where the problem appears and simplify the data for formatting by removing array items 10 and 12):

chr1 815 4692 gene_id:"LOC653635"_exon1
chr1 4833 4901 gene_id:"LOC653635"_exon2
chr1 5659 5810 gene_id:"LOC653635"_exon3
chr1 6470 6628 gene_id:"LOC653635"_exon4
chr1 6717 6918 gene_id:"LOC653635"_exon5
chr1 7096 7231 gene_id:"LOC653635"_exon6
chr1 7469 7605 gene_id:"LOC653635"_exon7
chr1 7778 7924 gene_id:"LOC653635"_exon8
chr1 8131 8229 gene_id:"LOC653635"_exon9
chr1 8776 8938 gene_id:"LOC653635"_exon10
chr1 14601 14754 gene_id:"LOC653635"_exon11
chr1 19184 19919 gene_id:"LOC653635"_exon12
chr1 58954 59871 gene_id:"OR4F5"_exon1
chr1 77385 80096 gene_id:"LOC100132632"_exon1
chr1 110381 110795 gene_id:"LOC643670"_exon1
chr1 114643 114701 gene_id:"LOC729737"_exon1
chr1 118918 119086 gene_id:"LOC643670"_exon1
chr1 123237 130714 gene_id:"LOC643670"_exon18
chr1 123324 130714 gene_id:"LOC653340"_exon1
chr1 129326 129559 gene_id:"LOC729737"_exon1
chr1 129653 129710 gene_id:"LOC729737"_exon21
chr1 132810 132874 gene_id:"LOC729737"_exon22
chr1 133719 134341 gene_id:"LOC729737"_exon23
chr1 217633 218641 gene_id:"LOC728481"_exon1
chr1 313133 319752 gene_id:"LOC100132287"_exon1

The bolded lines should count up from 1: for example, "exon18" should read "exon2", and exon21, exon22, exon23 should read exon2, exon3, exon4.
by aaron_baugher (Curate) on Jul 16, 2012 at 19:52 UTC
The problem here is that you're confusing your $exon variable with $gtf[2]. You use $exon to hold the counter that you increment on matching lines, but then you clobber it by assigning $gtf[2] to it, and you never use it after that. Also, if you're going to use a counter, you don't need a regex to increment the digit part of the field. Finally, there's no need for an elsif when there are only two possibilities (eq and ne); a plain else covers the second case.
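A minimal sketch of the counter approach described above. This is not the script from the thread: it assumes each record has already been reduced to chromosome, start, end and gene id, with the lines for one gene adjacent to each other. The names number_exons and @sample are made up for illustration, and the sample rows are taken from the output posted earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): takes a list of
# [chr, start, end, gene_id] records and returns the labelled lines.
sub number_exons {
    my @records = @_;
    my @out;
    my $prev_gene = '';    # gene id seen on the previous line
    my $count     = 0;     # exon counter, declared outside the loop so it survives iterations

    for my $rec (@records) {
        my ($chr, $start, $end, $gene) = @$rec;
        if ($gene eq $prev_gene) {
            $count++;      # same gene as the previous line: keep counting
        }
        else {
            $count = 1;    # different gene: restart the numbering
        }
        push @out, qq{$chr\t$start\t$end\tgene_id:"$gene"_exon$count};
        $prev_gene = $gene;    # remember this gene for the next comparison
    }
    return @out;
}

# Assumed sample data: the rows around the problem lines in the posted output.
my @sample = (
    [ 'chr1', 118918, 119086, 'LOC643670' ],
    [ 'chr1', 123237, 130714, 'LOC643670' ],
    [ 'chr1', 123324, 130714, 'LOC653340' ],
    [ 'chr1', 129326, 129559, 'LOC729737' ],
    [ 'chr1', 129653, 129710, 'LOC729737' ],
    [ 'chr1', 132810, 132874, 'LOC729737' ],
    [ 'chr1', 133719, 134341, 'LOC729737' ],
);

print "$_\n" for number_exons(@sample);
```

The counter is only ever touched in two places, the if and the else, so nothing else in the loop can overwrite it. With these sample rows the second LOC643670 line is labelled exon2, and the LOC729737 lines run exon1 through exon4.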
Aaron B.
by daccame (Initiate) on Jul 16, 2012 at 22:06 UTC |