PerlMonks  

Debug help

by ashnator (Sexton)
on Dec 21, 2008 at 05:43 UTC ( [id://731852]=perlquestion )

ashnator has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,
I came up with a script for parsing 2 files, but it is giving
me many warning and error messages. I don't understand how to
debug it. Can somebody help?
My File 1 looks like this:

    155369268 300
    169212695 200

My File 2 looks like this:

    >gi|155369268|ref|NM_802300917.1| Homo sapiens ewqeqwaspanin 19 (USUAN23), mRNA
    CCTGCTCTCGATTCTAATGTGATGCGAACGCAGCATTTCAGGGACTGGATGAGGAGCTTACGGTTTTTT
    ACAGAATCATCAATATCTTGGAAGAAAAAGAATGTTAAGAAATAACAAAACAATAATTATTAAGTACTTT
    CTTAATCTCATTAATGGAGCTTTCTTGGTTCTTGGACTTTTATTCATGGGATTTGGTGCATGGCTCTTAT
    TAGATAGAAATAATTTTTTAACAGCTTTTGATGAAAATAATCACTTCATAGTACCTATTTCTCAAATTTT
    GATTGGAATGGGATCTTCTACTGTTCTTTTTTGTCTATTGGGTTATATAGGAATTCACAACGAAATCAGA
    TGGCTCCTAATTGTGTATGCAGTATTGATAACATGGACCTTTGCTGTTCAGGTTGTACTTTCAGCATTCA
    TCATCACAAAGAAAGAGGAGGTTCAGCAACTATGGCATGACAAAATTGATTTTGTCATTTCTGAGTATGG
    ATCTAAAGATAAGCCTGAAGATATAACCAAGTGGACTATTCTGAATGCCTTACAGAAAACATTACAGTGT
    TGTGGCCAACATAATTACACAGACTGGATAAAGAATAAGAACAAAGAAAATTCAGGACAGGTGCCATGTT
    CTTGCACAAAGTCAACTTTAAGAAAATGGTTTTGTGATGAGCCACTGAATGCAACTTACCTTGAGGGTTG
    TGAAAATAAAATCAGTGCATGGTATAATGTTAATGTGTTAACCTTAATCGGAATTAACTTTGGACTTTTA
    ACTTCAGAGGTTTTCCAAGTCTCATTAACAGTTTGTTTCTTCAAAAACATCAAGAATATAATCCATGCAG
    AAATGTGACCTTTGGATTTCAATTTGTTCAGAAGAAACCAGTTAATTCTTAAAAAATCACATTA
    >gi|169212695|ref|XM_96216884.1| PREDICTED: Homo sapiens hypothetical protein UJI1001326087 (YHC1001326768), mRNA
    ATGTGTGTATATATATATATGCATATATATGTGTGTGTATATATATATACACATATATATGTGTGTGTAT
    ATATATACACATATATATGTGTGTGTATATATATATACACATATATATGTGTGTATATATATATACACAC
    ACATATATATGTATACATATACATGTATATGTATATATGTATACATATACATATATATGTATATATGTAT
    ACATATACATATATACGTATATATACATATATGTATATATGTAAGTATACGTATATATACATATATGTAT
    ATGTATGTACATATATATGCATGCACATATATATGTATTTATATATATGCATGTATATGTATATGCATGT
    ACATATGGATGTATATATGCACGCATGTCTGTACATATGCATGTATGTATGTACATATAAATGTATATAT
    ATGTATACATACATGTGTATATATACATGTATATGTATGTATACGTACATACATATGTATGTATACGTGT
    ATGTATACATACATATGTATGTATGCGTACATACATATGTATACGTACATACATATGTATGCTTACACAC
    ATGTATGCTTACACACATATGTATGTACGTGTACATACATATGTACACGTACATACATATGTACACGTAC
    ATACATATGTTCCAGAGGAAGAAGAAACAAGTGTCTGGTGCCCAGAGACGACCAGATGCCCCACCAGTTC
    TGATCCATAGGAGAATGATCGTTCCACATGGCCAACTCCATCCTCATGCAGCAATTCCTCCACAAGCACA
    AGACAAGCTTGTCCTGATGTTCCTTGCCCTGGCAGATGTTCAGGACCTTCCTTTGATTCAACCCCTCCAC
    CTAAATGGCCCAAGCTTTCGGGGCTGTCATTGTCTGTTTGTCATTCAAGGGCCCAAGCTGAAGAGGGGGT
    TGTGGCCTAACCATGGTCGTGTTGTGCTGGACGTCACAGCAGAGGAGGAGGCGCAGAACAAAGGCTGC
The error and warning messages are:

    Substr outside of the string at line 42. <$fh> line 2.
    Use of Uninitialized value in concatenation or string at line 42. <$fh> line 4.
    Use of uninitialized value in hash element at line 34. <$fh> line 4.
    ...................................
My code is:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $qfn1 = "File1.txt";
    my $qfn2 = "File2.txt";

    my %positions;
    {
        open(my $fh, '<', $qfn1)
            or die("Cannot open file \"$qfn1\": $!\n");

        while (<$fh>) {
            my ($key, $pos) = split /\s+/;
            $positions{$key} = $pos;
        }
    }

    {
        open(my $fh, '<', $qfn2)
            or die("Cannot open file \"$qfn2\": $!\n");

        for (;;) {
            defined( my $key = <$fh> )
                or last;
            defined( my $text = <$fh> )
                or last;

            chomp($key);
            chomp($text);
            $key = (split(/\|/,$key,3))[1];

            defined( my $pos = $positions{$key} ) ##line 34
                or next;

            my $index = rindex($text, "ATG", $pos);

            next if ( $index < 0 );

            $index += 3 while ( ($index + 3) < $pos);

            print "$key $pos " . substr($text, $index, 3) . "\n"; ##line 42
        }
    }

20081224 Janitored by Corion: Restored content

Replies are listed 'Best First'.
Re: Debug help
by ikegami (Patriarch) on Dec 21, 2008 at 06:39 UTC

    Substr outside of the string at line 42. <$fh> line 2.

    $index (300) is greater than the length of $text (69).

    Use of Uninitialized value in concatenation or string at line 42. <$fh> line 4.

    As a result of the previous problem, substr returned undef.
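The first two messages can be reproduced in isolation. This is only a minimal sketch: the 69-character string and the offset of 300 are stand-ins matching the numbers quoted above, and the warning handler is added here purely so the message can be printed alongside the result.

```perl
use strict;
use warnings;

# One 69-character line, the same length as the first sequence line of File 2.
my $text  = 'A' x 69;
my $index = 300;    # assumed offset, well past the end of $text

# Collect warnings so they can be inspected instead of going to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# substr with a start offset beyond the end of the string returns undef
# and, under "use warnings", emits "substr outside of string".
my $codon = substr($text, $index, 3);

print defined $codon ? "got '$codon'\n" : "substr returned undef\n";
print "warning: $_" for @warnings;
```

Concatenating that undef into the print string is what then triggers the "uninitialized value" warning on the same line.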

    Use of uninitialized value in hash element at line 34. <$fh> line 4.

    $key is undefined, which means that split returned less than two values, which means $key didn't contain a "|" before the split. I wonder what $key contained...
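One plausible answer, sketched under the assumption that the loop's "key" read landed on a sequence line instead of a '>' header line. The helper name extract_id is hypothetical; it wraps the same split the script uses, and the sample strings are shortened from File 2.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the same split the script uses.
sub extract_id {
    my ($line) = @_;
    my $id = (split(/\|/, $line, 3))[1];
    return $id;
}

my $header   = '>gi|155369268|ref|NM_802300917.1|';
my $sequence = 'ACAGAATCATCAATATCTTGGAAGAAAAAGAATG';

# A header line yields the ID; a sequence line has no "|" and yields undef.
for my $line ($header, $sequence) {
    my $id = extract_id($line);
    printf "%-35s => %s\n", $line, defined $id ? $id : 'undef';
}
```

Since each record in File 2 spans many lines, reading "one key line, one text line" in pairs drifts out of step with the headers almost immediately.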

      Actually I was doing this for a single line, ACTAGTACGTACGATCAGTAC, but now there are multiple lines like

          ACGTGACGTACGTACGTACGTA
          AGTACGATCACCCCCGTAGACG
          ACGTAGACATCAGATCGATAGT

      How do I take care of this in my code?

        Actually I was doing this for a single line, ACTAGTACGTACGATCAGTAC, but now there are multiple lines like

            ACGTGACGTACGTACGTACGTA
            AGTACGATCACCCCCGTAGACG
            ACGTAGACATCAGATCGATAGT

        How do I take care of this in my code?

        Wild guess... some sort of looping construct?

        HTH,

        planetscape
        ... multiple lines like ACGTGACGTACGTACGTACGTA AGTACGATCACCCCCGTAGACG ACGTAGACATCAGATCGATAGT ...
        From looking at your example data, my guess is that you are working with data records composed of multiple lines of data. Also from your example, it seems that each record begins with a '>' character.

        If these guesses are accurate, it may be useful to set the input record separator variable $/ (see perlvar) to '>', then read each record with a single statement and then remove all newlines from the record with a substitution.

        Note, however, that because the first record of the file begins with a '>', the first record read from the file (any stuff before the first '>') is junk and must be ignored.

        E.g. (untested):

            $/ = '>';

            # throw away first (junk) record
            defined(<$fh>) or ($! and die "reading: $!");

            while (defined(my $record = <$fh>) or ($! and die "reading: $!")) {
                $record =~ s{ \n }{}xmsg;
                process($record);
            }

        Update: Since a <$fh> record read reads up to and including the input record separator string specified in $/ (or to the end of the file), there should probably be a chomp in that while loop to remove the extraneous '>' character at the end of each record:

            while (defined(my $record = <$fh>) or ($! and die "reading: $!")) {
                chomp $record;              # remove $/ string
                $record =~ s{ \n }{}xmsg;
                process($record);
            }

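A self-contained variation on the same idea, reading from an in-memory filehandle so it can be run without any files. Everything here is illustrative: the two-record sample data, the read_records helper and the %seq_for hash are made up for the sketch. Unlike the snippet above, it splits the header line from the sequence lines instead of removing every newline, on the assumption that the ID is needed separately.

```perl
use strict;
use warnings;

# Hypothetical two-record sample in the same '>'-delimited layout as File 2.
my $data = ">gi|111|ref|X1| first\nACGT\nTTGA\n>gi|222|ref|X2| second\nGGCC\nAATT\n";

sub read_records {
    my ($fh) = @_;
    local $/ = '>';                 # one read now returns one whole record
    my %seq_for;
    while (defined(my $record = <$fh>)) {
        chomp $record;              # strips the trailing '>' (the $/ string)
        next if $record eq '';      # the empty chunk before the first '>'
        my ($header, @lines) = split /\n/, $record;
        my $id = (split /\|/, $header, 3)[1];
        $seq_for{$id} = join '', @lines;
    }
    return \%seq_for;
}

open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!\n";
my $seq_for = read_records($fh);
print "$_ $seq_for->{$_}\n" for sort keys %$seq_for;
```

Using `local $/` inside the sub keeps the changed separator from leaking into other reads in the program. Note that this approach assumes '>' never appears inside a header's description text.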
Re: Howto correct my program -- help
by MidLifeXis (Monsignor) on Dec 21, 2008 at 11:50 UTC

    Update: Code and other data has been added to the original post since this was written, so it doesn't necessarily fit any more.

    What ikegami and planetscape are telling you is that this is a prime example of How (Not) To Ask A Question. Please provide what you have tried (code), exact error messages you get, input used, etc. Many monks (present company included) do not like to do someone else's work for them. Helping them with errors, sticky spots, or understanding is the target. Show that you have put some work into this, and you will receive better responses.

    --MidLifeXis

Re: Howto correct my program -- help
by graff (Chancellor) on Dec 21, 2008 at 23:12 UTC
    I gather that the perl code was added to the OP after the previous replies were posted. Now that we have both the data and the code (and the intent is sort of clear, maybe), it looks like you've got the wrong logic for the task.

    Your "for" loop on the contents of File2 is taking only one line at a time, but it looks like the data is supposed to be handled one record at a time, where a record is the concatenation of lines of nucleotide letters. You can't use an index value of 200 or 300 (characters) when your loop is only seeing 60 or 70 characters (one line) at a time.

    What you should be doing instead is reading all of File2 into another hash, also keyed by the ID numbers (just like the hash loaded from File1); then you can loop over the keys from File1, and do what needs to be done with the full-record data from File2.

    Since the actual specs for your task are not entirely clear to me, I have no way of knowing whether the following version of your code will do what needs to be done, but at least it will show you how you should be handling your second input file, and how you should be checking for (and reporting) conditions that go against expectations.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $qfn1 = "File1.txt";
        my $qfn2 = "File2.txt";

        my %positions;
        {
            open(my $fh, '<', $qfn1)
                or die("Cannot open file \"$qfn1\": $!\n");

            while (<$fh>) {
                my ($key, $pos) = split /\s+/;
                $positions{$key} = $pos;
            }
        }

        my %sequences;
        {
            open(my $fh, '<', $qfn2)
                or die("Cannot open file \"$qfn2\": $!\n");

            my $key;
            while (<$fh>) {
                if ( s/^>// ) {
                    $key = ( split /\|/ )[1];
                }
                else {
                    chomp;
                    $sequences{$key} .= $_;
                }
            }
        }

        for my $key ( sort {$a<=>$b} keys %positions ) {
            if ( ! exists( $sequences{$key} )) {
                warn "KEY $key in File1 not found in File2\n";
                next;
            }
            if ( length( $sequences{$key} ) < $positions{$key} ) {
                warn "KEY $key: File2 string length too short for File1 position value\n";
                next;
            }
            my $index = rindex( $sequences{$key}, "ATG", $positions{$key} );
            if ( $index < 0 ) {
                warn sprintf( "KEY %s: No ATG in File2 string prior to position %d\n",
                              $key, $positions{$key} );
                next;
            }
            $index += 3 while ( ($index + 3) < $positions{$key} );
            print "$key $positions{$key} " . substr($sequences{$key}, $index, 3) . "\n";
        }

    For the data files shown in the OP, that version prints:

        155369268 300 ACT
        169212695 200 TAT

    How would you confirm that this is what it should be printing?
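One way to approach that question is to run the same index arithmetic on a sequence short enough to check by hand. The sketch below is only an illustration: codon_before is a hypothetical helper that copies the rindex/while/substr steps from the loop above, and the 12-letter sequence is made up.

```perl
use strict;
use warnings;

# Same index arithmetic as the loop above, wrapped in a hypothetical helper.
sub codon_before {
    my ($seq, $pos) = @_;
    my $index = rindex($seq, "ATG", $pos);
    return if $index < 0;                       # no ATG at or before $pos
    $index += 3 while ( ($index + 3) < $pos );  # step forward codon by codon
    return substr($seq, $index, 3);
}

# Made-up sequence: ATG AAA CCC GGG, codons starting at offsets 0, 3, 6, 9.
my $seq = "ATGAAACCCGGG";
print "$_ ", codon_before($seq, $_), "\n" for 4, 7, 10;
```

With the codon boundaries known in advance, it is easy to see whether a given position maps to the codon you expect, and to probe edge cases such as a position that falls exactly on a boundary.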
      Thanks a lot. I promise I will do better next time.
