PerlMonks  

Parsing problem

by MB123 (Initiate)
on Nov 07, 2012 at 21:48 UTC (#1002755=perlquestion)
MB123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I have this large text file in the format shown below -
    ID SNP
    FT SNP 433
    FT /note="refAllele: T SNPstrains: 7083_1#5=C 7414_8#8=C 7480_8#49=C "
    FT /colour=1
    FT SNP 442
    FT /note="refAllele: T SNPstrains: 7065_8#2=C 7065_8#94=C 7083_1#2=C 7083_1#3=C 7083_1#41=C 7083_1#42=C 7083_1#43=C "
    FT /colour=1
    FT SNP 460
    FT /note="refAllele: T SNPstrains: 7564_8#14=C "
    FT /colour=1
    FT SNP 703
    FT /note="refAllele: G SNPstrains: 7521_5#39=A (non-synonymous) (AA Ala->Thr) "
    FT /colour=2
    FT SNP 937
    FT /note="refAllele: G SNPstrains: 7414_8#30=T (non-synonymous) (AA Val->Leu) "
    FT /colour=2
    FT SNP 1269
    FT /note="refAllele: G SNPstrains: 7480_7#22=A (synonymous) 7480_7#62=A (synonymous) "
    FT /colour=3
    FT SNP 1804
    FT /note="refAllele: T SNPstrains: 7414_7#66=A (non-synonymous) (AA Ser->Thr) 7414_8#44=A (non-synonymous) (AA Ser->Thr) 7521_6#54=A (non-synonymous) (AA Ser->Thr) "
    FT /colour=2
    etc etc...
And this code-
    use strict;
    use warnings;
    use feature qw(say);

    my $file = "BSAC.pl";
    my %cod = ( 1 => "red", 2 => "non", 3 => "green" );

    open my $in, "<", "$file";
    open my $out, ">", "output.txt";

    say $out "Coordinate No of Strains AA Change";

    my $SNP;
    my $count;
    my $change;

    while ( my $line = <$in> ) {
        chomp $line;
        say qq(DEBUG: Line = "$line");
        if ( $line =~ /^FT\s+SNP\s+(\d+)/ ){
            $SNP = $1;
            say qq(\$SNP = $1;);
        }
        elsif ( $line =~ /^FT\s+\/note="(.*)"/) {
            my $note = $1;
            say qq(my \$note = $1);
            $count = ($note =~ tr/=/=/);
            $note =~ /\((AA \w+->\w+)\)\s*$/;
            $change = $1 || "";
        }
        elsif ( $line =~ /^FT\s+\/colour=(\d+)/ ) {
            say qq(Code = $1);
            if ( $cod{$1} eq "non" ) {
                printf $out "%-12.12s %-15.15s %s\n", $SNP, $count, $change;
            }
        }
    }

However, when I run the above code I receive a "Use of uninitialised value ($count or $change) in printf at Script.txt line 33" error. This occurs at any part of the text file that contains a non-synonymous mutation.

This code works on another text file I have, and the only difference I can see is that in this example file, strain numbers have a format such as 7521_5#39=A, whereas in the file this code worked for they are written as 7521_5_39=A, i.e. the '#' is replaced with a second '_'.

The ideal output from this code would look like this-

    Coordinates  No of Strains   AA Change
    703          1               AA Ala->Thr
    937          1               AA Val->Leu
    1804         3               AA Ser->Thr

Any help would be much appreciated, but please be advised I am very new to Perl and programming in general. This code is also not my own work; code that I had written myself suffered from the same error.

Many thanks in advance!

MB

Re: Parsing problem
by Anonymous Monk on Nov 07, 2012 at 22:17 UTC
    It is a warning, don't worry about it :)
      Ya, and while we're at it, might as well get rid of that pesky use strict;, too. It's always breaking my code.

      Edit: Don't do this. I was using a satirical approach to point out that anonymonk's suggestion to ignore warnings is a bad idea.

        The sample input and the program provided produce no warnings and produce the wanted output, so yeah, get rid of both strict and warnings and nothing will change.
Re: Parsing problem
by kcott (Abbot) on Nov 07, 2012 at 23:18 UTC

    G'day MB123,

    Welcome to the monastery.

    I am unable to reproduce the stated problem. Furthermore, after removing the debug print statements (e.g. say qq(DEBUG: Line = "$line");), the output I get is:

        Coordinate   No of Strains   AA Change
        703          1               AA Ala->Thr
        937          1               AA Val->Leu
        1804         3               AA Ser->Thr

    This appears to be exactly what you were hoping for!

    Please check that the code and data you've posted are the same as those which generate the warnings you described.

    -- Ken

      Hi Ken,

      Thank you for your reply. I have just double checked the whole file and the sample I put up, with and without the debug statements and I am still getting the same result. My output file just prints the coordinates and leaves the 'number of strains' and 'AA Change' columns blank.

        Here's exactly what I used to test your code and data:

            #!/usr/bin/env perl

            use strict;
            use warnings;
            use feature qw(say);

            #my $file = "BSAC.pl";
            my %cod = ( 1 => "red", 2 => "non", 3 => "green" );

            #open my $in, "<", "$file";
            #open my $out, ">", "output.txt";

            say "Coordinate No of Strains AA Change";

            my $SNP;
            my $count;
            my $change;

            while ( my $line = <DATA> ) {
                chomp $line;
                #say qq(DEBUG: Line = "$line");
                if ( $line =~ /^FT\s+SNP\s+(\d+)/ ){
                    $SNP = $1;
                    #say qq(\$SNP = $1;);
                }
                elsif ( $line =~ /^FT\s+\/note="(.*)"/) {
                    my $note = $1;
                    #say qq(my \$note = $1);
                    $count = ($note =~ tr/=/=/);
                    $note =~ /\((AA \w+->\w+)\)\s*$/;
                    $change = $1 || "";
                }
                elsif ( $line =~ /^FT\s+\/colour=(\d+)/ ) {
                    #say qq(Code = $1);
                    if ( $cod{$1} eq "non" ) {
                        printf "%-12.12s %-15.15s %s\n", $SNP, $count, $change;
                    }
                }
            }

            __DATA__
            ID SNP
            FT SNP 433
            FT /note="refAllele: T SNPstrains: 7083_1#5=C 7414_8#8=C 7480_8#49=C "
            FT /colour=1
            FT SNP 442
            FT /note="refAllele: T SNPstrains: 7065_8#2=C 7065_8#94=C 7083_1#2=C 7083_1#3=C 7083_1#41=C 7083_1#42=C 7083_1#43=C "
            FT /colour=1
            FT SNP 460
            FT /note="refAllele: T SNPstrains: 7564_8#14=C "
            FT /colour=1
            FT SNP 703
            FT /note="refAllele: G SNPstrains: 7521_5#39=A (non-synonymous) (AA Ala->Thr) "
            FT /colour=2
            FT SNP 937
            FT /note="refAllele: G SNPstrains: 7414_8#30=T (non-synonymous) (AA Val->Leu) "
            FT /colour=2
            FT SNP 1269
            FT /note="refAllele: G SNPstrains: 7480_7#22=A (synonymous) 7480_7#62=A (synonymous) "
            FT /colour=3
            FT SNP 1804
            FT /note="refAllele: T SNPstrains: 7414_7#66=A (non-synonymous) (AA Ser->Thr) 7414_8#44=A (non-synonymous) (AA Ser->Thr) 7521_6#54=A (non-synonymous) (AA Ser->Thr) "
            FT /colour=2

        Try running this. Assuming it works, try changing the data to something you know will generate the warnings - keeping the data to an absolute minimum.

        If you are also unable to reproduce the warnings, then the problem may lie in your input file. There might be embedded characters that aren't showing up when the text is copied and pasted. Anyway, I'm jumping the gun a bit here - see how you go with the above code first.
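One way to look for embedded characters of that kind is to print each line with everything outside printable ASCII made visible. The following is only a sketch: show_hidden is a made-up helper name, and the sample line (with a non-breaking space after FT) is invented for illustration, not taken from the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the line in which every character
# outside printable ASCII (tabs, carriage returns, non-breaking spaces, ...)
# is shown as \x{..} so it can no longer hide in the output.
sub show_hidden {
    my ($line) = @_;
    ( my $visible = $line ) =~ s/([^\x20-\x7E])/sprintf '\\x{%02X}', ord $1/ge;
    return $visible;
}

# Invented sample: a /note line whose first "space" is really a
# non-breaking space (0xA0).
my $sample = qq{FT\xA0/note="refAllele: G SNPstrains: 7521_5#39=A "};
print show_hidden($sample), "\n";
```

To check a real file, the same helper could be called on each line read from the input file handle, ideally on a cut-down copy that still triggers the warning.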

        -- Ken

Re: Parsing problem
by frozenwithjoy (Curate) on Nov 07, 2012 at 23:45 UTC
    It means that you are trying to print $count and $change, but the contents of those scalars are undefined. They only get values in the first elsif (when a /note line matches), but they are printed in the second elsif (on the /colour line). I think you need to figure out where you actually want to have the print statement.
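A minimal sketch of that point: if a /note line fails to match for any reason, $count and $change are still undefined when the /colour line is reached. The regexes and %cod are taken from the posted code; parse_snps is a made-up helper name, and the second sample record (a note split over two lines) is an assumption used only to provoke the failure, not a claim about the real file.

```perl
use strict;
use warnings;

my %cod = ( 1 => "red", 2 => "non", 3 => "green" );

# Hypothetical helper: takes the input lines, returns the formatted rows.
sub parse_snps {
    my @lines = @_;
    my ( $SNP, $count, $change );
    my @rows;
    for my $line (@lines) {
        if ( $line =~ /^FT\s+SNP\s+(\d+)/ ) {
            # New record: forget anything left over from the previous one.
            ( $SNP, $count, $change ) = ( $1, undef, undef );
        }
        elsif ( $line =~ /^FT\s+\/note="(.*)"/ ) {
            my $note = $1;
            $count = ( $note =~ tr/=/=/ );
            # Use $1 only if this match succeeded; after a failed match
            # $1 would still hold the value from the previous match.
            $change = $note =~ /\((AA \w+->\w+)\)\s*$/ ? $1 : "";
        }
        elsif ( $line =~ /^FT\s+\/colour=(\d+)/ ) {
            next unless ( $cod{$1} // "" ) eq "non";
            if ( defined $SNP and defined $count ) {
                push @rows, sprintf "%-12.12s %-15.15s %s", $SNP, $count, $change;
            }
            else {
                # Report the record instead of printing undefined values.
                warn "No /note line matched before /colour at SNP ", $SNP // "?", "\n";
            }
        }
    }
    return @rows;
}

my @sample = (
    'FT SNP 703',
    'FT /note="refAllele: G SNPstrains: 7521_5#39=A (non-synonymous) (AA Ala->Thr) "',
    'FT /colour=2',
    # Assumed failure case: the note is split, so the /note regex never matches.
    'FT SNP 937',
    'FT /note="refAllele: G SNPstrains: 7414_8#30=T (non-synonymous) (',
    'FT AA Val->Leu) "',
    'FT /colour=2',
);
print "$_\n" for parse_snps(@sample);
```

With this sample, the first record yields a row and the second produces a message naming coordinate 937 rather than an "uninitialized value" warning from the print, which narrows down which input lines to inspect.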

Node Type: perlquestion [id://1002755]
Approved by tobyink