Debug help

ashnator has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I wrote a script for parsing two files, but it is giving
me many warning and error messages, and I don't know how to
debug it. Can somebody help?

My File 1 looks like this:- 
155369268 300
169212695 200

My File 2 looks like this:-
>gi|155369268|ref|NM_802300917.1| Homo sapiens ewqeqwaspanin 19 (USUAN23), mRNA
CCTGCTCTCGATTCTAATGTGATGCGAACGCAGCATTTCAGGGACTGGATGAGGAGCTTACGGTTTTTT
ACAGAATCATCAATATCTTGGAAGAAAAAGAATGTTAAGAAATAACAAAACAATAATTATTAAGTACTTT
CTTAATCTCATTAATGGAGCTTTCTTGGTTCTTGGACTTTTATTCATGGGATTTGGTGCATGGCTCTTAT
TAGATAGAAATAATTTTTTAACAGCTTTTGATGAAAATAATCACTTCATAGTACCTATTTCTCAAATTTT
GATTGGAATGGGATCTTCTACTGTTCTTTTTTGTCTATTGGGTTATATAGGAATTCACAACGAAATCAGA
TGGCTCCTAATTGTGTATGCAGTATTGATAACATGGACCTTTGCTGTTCAGGTTGTACTTTCAGCATTCA
TCATCACAAAGAAAGAGGAGGTTCAGCAACTATGGCATGACAAAATTGATTTTGTCATTTCTGAGTATGG
ATCTAAAGATAAGCCTGAAGATATAACCAAGTGGACTATTCTGAATGCCTTACAGAAAACATTACAGTGT
TGTGGCCAACATAATTACACAGACTGGATAAAGAATAAGAACAAAGAAAATTCAGGACAGGTGCCATGTT
CTTGCACAAAGTCAACTTTAAGAAAATGGTTTTGTGATGAGCCACTGAATGCAACTTACCTTGAGGGTTG
TGAAAATAAAATCAGTGCATGGTATAATGTTAATGTGTTAACCTTAATCGGAATTAACTTTGGACTTTTA
ACTTCAGAGGTTTTCCAAGTCTCATTAACAGTTTGTTTCTTCAAAAACATCAAGAATATAATCCATGCAG
AAATGTGACCTTTGGATTTCAATTTGTTCAGAAGAAACCAGTTAATTCTTAAAAAATCACATTA
>gi|169212695|ref|XM_96216884.1| PREDICTED: Homo sapiens hypothetical protein UJI1001326087 (YHC1001326768), mRNA
ATGTGTGTATATATATATATGCATATATATGTGTGTGTATATATATATACACATATATATGTGTGTGTAT
ATATATACACATATATATGTGTGTGTATATATATATACACATATATATGTGTGTATATATATATACACAC
ACATATATATGTATACATATACATGTATATGTATATATGTATACATATACATATATATGTATATATGTAT
ACATATACATATATACGTATATATACATATATGTATATATGTAAGTATACGTATATATACATATATGTAT
ATGTATGTACATATATATGCATGCACATATATATGTATTTATATATATGCATGTATATGTATATGCATGT
ACATATGGATGTATATATGCACGCATGTCTGTACATATGCATGTATGTATGTACATATAAATGTATATAT
ATGTATACATACATGTGTATATATACATGTATATGTATGTATACGTACATACATATGTATGTATACGTGT
ATGTATACATACATATGTATGTATGCGTACATACATATGTATACGTACATACATATGTATGCTTACACAC
ATGTATGCTTACACACATATGTATGTACGTGTACATACATATGTACACGTACATACATATGTACACGTAC
ATACATATGTTCCAGAGGAAGAAGAAACAAGTGTCTGGTGCCCAGAGACGACCAGATGCCCCACCAGTTC
TGATCCATAGGAGAATGATCGTTCCACATGGCCAACTCCATCCTCATGCAGCAATTCCTCCACAAGCACA
AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC
CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT
TGTGGCCTAACCATGGTCGTGTTGTGCTGGACGTCACAGCAGAGGAGGAGGCGCAGAACAAAGGCTGC

The error and warning messages are:-
Substr outside of the string at line 42. <$fh> line 2.
Use of Uninitialized value in concatenation or string at line 42. <$fh> line 4.
Use of uninitialized value in hash element at line 34. <$fh> line 4. 
...................................

My code is :-

#!/usr/bin/perl

use strict;
use warnings;

my $qfn1 = "File1.txt";
my $qfn2 = "File2.txt";

my %positions;
{
   open(my $fh, '<', $qfn1)
      or die("Cannot open file \"$qfn1\": $!\n");

   while (<$fh>) {
      my ($key, $pos) = split /\s+/;
      $positions{$key} = $pos;
   }
}

{
   open(my $fh, '<', $qfn2)
      or die("Cannot open file \"$qfn2\": $!\n");

   for (;;) {
      defined( my $key = <$fh> )
         or last;
      defined( my $text = <$fh> )
         or last;
      chomp($key);
      chomp($text);

      $key = (split(/\|/,$key,3))[1];

      defined( my $pos = $positions{$key} ) ##line 34
         or next;

      my $index = rindex($text, "ATG", $pos);
      next if ( $index < 0 );

      $index += 3 while ( ($index + 3) < $pos);

      print "$key $pos " . substr($text, $index, 3) . "\n";  ##line 42

   }
}

20081224 Janitored by Corion: Restored content

Re: Debug help by ikegami (Patriarch) on Dec 21, 2008 at 06:39 UTC
`Substr outside of the string at line 42. <$fh> line 2.`
`$index` (stepped up to just under 300) is greater than the length of `$text` (69).

`Use of Uninitialized value in concatenation or string at line 42. <$fh> line 4.`
As a result of the previous problem, `substr` returned undef.

`Use of uninitialized value in hash element at line 34. <$fh> line 4.`
`$key` is undefined, which means that `split` returned fewer than two values, which means `$key` didn't contain a "|" before the split. I wonder what `$key` contained...
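For anyone who wants to see those two diagnoses in isolation, here is a minimal sketch. The 69-character string and the sequence fragment are made-up stand-ins for one line of File2, not the OP's actual data, and the `substr` warning is silenced only so the example runs quietly.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a single 69-character line of File2.
my $text = 'A' x 69;

# An offset beyond the end of the string makes substr return undef
# (and, with warnings on, emit "substr outside of string").
my $piece = do { no warnings 'substr'; substr($text, 300, 3) };
print defined $piece ? "got '$piece'\n" : "substr returned undef\n";

# A sequence line contains no '|', so split yields only one field
# and element [1] of the resulting list is undef.
my $key = (split /\|/, 'CCTGCTCTCGATTC', 3)[1];
print defined $key ? "key is '$key'\n" : "key is undef\n";
```

Reading File2 two lines at a time means that every pair after the first treats a sequence line as if it were a header, which is where the undefined `$key` comes from.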
Re^2: Debug help by ashnator (Sexton) on Dec 21, 2008 at 08:56 UTC
Actually, I was doing this for a single line,
ACTAGTACGTACGATCAGTAC
but now there are multiple lines like
ACGTGACGTACGTACGTACGTA
AGTACGATCACCCCCGTAGACG
ACGTAGACATCAGATCGATAGT
How do I take care of this in my code?
Re^3: Debug help by planetscape (Chancellor) on Dec 21, 2008 at 09:39 UTC
"Actually I was doing for a single line ACTAGTACGTACGATCAGTAC but now there are multiple lines like ACGTGACGTACGTACGTACGTA AGTACGATCACCCCCGTAGACG ACGTAGACATCAGATCGATAGT How to take care of this in my code?"

Wild guess... some sort of looping construct?

HTH,
planetscape
Re^3: Debug help by AnomalousMonk (Archbishop) on Dec 21, 2008 at 15:22 UTC
"... multiple lines like ACGTGACGTACGTACGTACGTA AGTACGATCACCCCCGTAGACG ACGTAGACATCAGATCGATAGT ..."

From looking at your example data, my guess is that you are working with data records composed of multiple lines of data. Also from your example, it seems that each record begins with a '>' character.

If these guesses are accurate, it may be useful to set the input record separator variable `$/` (see perlvar) to '>', then read each record with a single statement and then remove all newlines from the record with a substitution. Note, however, that because the first record of the file begins with a '>', the first record read from the file (any stuff before the first '>') is junk and must be ignored.

E.g. (untested):

$/ = '>';
# throw away first (junk) record
defined(<$fh>) or ($! and die "reading: $!");
while (defined(my $record = <$fh>) or ($! and die "reading: $!")) {
    $record =~ s{ \n }{}xmsg;
    process($record);
}

Update: Since a `<$fh>` record read reads up to and including the input record separator string specified in `$/` (or to the end of the file), there should probably be a chomp in that while loop to remove the extraneous '>' character at the end of each record:

while (defined(my $record = <$fh>) or ($! and die "reading: $!")) {
    chomp $record;  # remove $/ string
    $record =~ s{ \n }{}xmsg;
    process($record);
}
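A runnable variation on the same idea, as a sketch only: `read_records` is a hypothetical helper name, and the two records are made-up data held in memory so the example needs no files. It differs from the snippets above in one respect: it splits the header line off before joining the sequence lines, since stripping every newline first would glue the header onto the sequence. It also assumes that '>' never appears inside a header's description text.

```perl
use strict;
use warnings;

# Made-up, in-memory stand-in for File2.
my $fasta = <<'END';
>gi|111|ref|X.1| first made-up record
ACGTAC
GTTTGA
>gi|222|ref|Y.1| second made-up record
ATGCCC
END

# Illustrative helper: returns a hash ref of ID => joined sequence.
sub read_records {
    my ($fh) = @_;
    local $/ = '>';              # each read returns one '>'-delimited record
    my %seq_for;
    while (defined(my $record = <$fh>)) {
        chomp $record;           # strips the trailing '>' (current $/)
        next if $record eq '';   # whatever preceded the first '>'
        my ($header, @lines) = split /\n/, $record;
        my $id = (split /\|/, $header)[1];
        next unless defined $id; # skip headers without a second field
        $seq_for{$id} = join '', @lines;
    }
    return \%seq_for;
}

open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my $seq_for = read_records($fh);
print "$_ $seq_for->{$_}\n" for sort keys %$seq_for;
```

Using `local $/` inside the helper keeps the changed record separator from leaking into the rest of the program, for example into the loop that reads File1 line by line.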
Re: Howto correct my program -- help by MidLifeXis (Monsignor) on Dec 21, 2008 at 11:50 UTC
Update: Code and other data have been added to the original post since this was written, so it doesn't necessarily fit any more.

What ikegami and planetscape are telling you is that this is a prime example of How (Not) To Ask A Question. Please provide what you have tried (code), the exact error messages you get, the input used, etc.

Many monks (present company included) do not like to do someone else's work for them. Helping them with errors, sticky spots, or understanding is the target. Show that you have put some work into this, and you will receive better responses.

--MidLifeXis
Re: Howto correct my program -- help by graff (Chancellor) on Dec 21, 2008 at 23:12 UTC
I gather that the perl code was added to the OP after the previous replies were posted. Now that we have both the data and the code (and the intent is sort of clear, maybe), it looks like you've got the wrong logic for the task.

Your "for" loop on the contents of File2 is taking only one line at a time, but it looks like the data is supposed to be handled one record at a time, where a record is the concatenation of all the sequence lines under one header. You can't use an index value of 200 or 300 (characters) when your loop is only seeing 60 or 70 characters (one line) at a time.

What you should be doing instead is reading all of File2 into another hash, also keyed by the ID numbers (just like the hash loaded from File1); then you can loop over the keys from File1, and do what needs to be done with the full-record data from File2.

Since the actual specs for your task are not entirely clear to me, I have no way of knowing whether the following version of your code will do what needs to be done, but at least it will show you how you should be handling your second input file, and how you should be checking for (and reporting) conditions that go against expectations.

#!/usr/bin/perl

use strict;
use warnings;

my $qfn1 = "File1.txt";
my $qfn2 = "File2.txt";

my %positions;
{
   open(my $fh, '<', $qfn1)
      or die("Cannot open file \"$qfn1\": $!\n");

   while (<$fh>) {
      my ($key, $pos) = split /\s+/;
      $positions{$key} = $pos;
   }
}

my %sequences;
{
   open(my $fh, '<', $qfn2)
      or die("Cannot open file \"$qfn2\": $!\n");

   my $key;
   while (<$fh>) {
      if ( s/^>// ) {
         # header line: the ID is the second '|'-separated field
         $key = ( split /\|/ )[1];
      }
      else {
         # sequence line: append it to the current record
         chomp;
         $sequences{$key} .= $_;
      }
   }
}

for my $key ( sort {$a<=>$b} keys %positions ) {
   if ( ! exists( $sequences{$key} )) {
      warn "KEY $key in File1 not found in File2\n";
      next;
   }
   if ( length( $sequences{$key} ) < $positions{$key} ) {
      warn "KEY $key: File2 string length too short for File1 position value\n";
      next;
   }
   my $index = rindex( $sequences{$key}, "ATG", $positions{$key} );
   if ( $index < 0 ) {
      warn sprintf( "KEY %s: No ATG in File2 string prior to position %d\n",
                    $key, $positions{$key} );
      next;
   }
   $index += 3 while ( ($index + 3) < $positions{$key} );

   print "$key $positions{$key} " . substr($sequences{$key}, $index, 3) . "\n";
}

For the data files shown in the OP, that version prints:

155369268 300 ACT
169212695 200 TAT

How would you confirm that this is what it should be printing?
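One possible way to approach that last question, offered as a sketch rather than a definitive check: run the same `rindex` and stepping logic on a sequence short enough to verify by hand. `codon_before` is a hypothetical helper name and the 16-letter sequence is made up; with "ATG" at offset 2, the triplets that follow are ATG (2-4), AAA (5-7) and CCC (8-10).

```perl
use strict;
use warnings;

# Hypothetical helper repeating the thread's logic: find the last "ATG"
# that starts at or before $pos, then step forward three letters at a
# time until the next step would reach or pass $pos.
sub codon_before {
    my ($seq, $pos) = @_;
    my $start = rindex($seq, 'ATG', $pos);
    return if $start < 0;                 # no ATG at or before $pos
    my $index = $start;
    $index += 3 while ($index + 3) < $pos;
    return ($start, $index, substr($seq, $index, 3));
}

# Made-up sequence, small enough to count offsets by hand.
my $seq = 'CCATGAAACCCGGGTT';
my ($start, $index, $codon) = codon_before($seq, 10);

# The reported triplet should sit a whole number of triplets after ATG.
die "index is not in frame with the ATG" if ($index - $start) % 3;

print "start=$start index=$index codon=$codon\n";   # start=2 index=8 codon=CCC
```

If the hand-counted answer and the helper agree on a few small cases like this, that gives some confidence in the output for the real records, provided the OP confirms this is the intended definition of the task.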
Re^2: Howto correct my program -- help by ashnator (Sexton) on Dec 22, 2008 at 00:49 UTC
Thanks a lot. I promise I will do better next time.

