<?xml version="1.0" encoding="windows-1252"?>
<node id="1002755" title="Parsing problem" created="2012-11-07 16:48:52" updated="2012-11-07 16:48:52">
<type id="115">
perlquestion</type>
<author id="1002751">
MB123</author>
<data>
<field name="doctext">
&lt;p&gt;Hi all,&lt;/p&gt;

&lt;p&gt;I have a large text file in the format shown below:&lt;/p&gt;

&lt;code&gt; ID  SNP
FT  SNP 433
FT      /note="refAllele: T SNPstrains: 7083_1#5=C 7414_8#8=C 7480_8#49=C "
FT      /colour=1
FT  SNP 442
FT      /note="refAllele: T SNPstrains: 7065_8#2=C 7065_8#94=C 7083_1#2=C 7083_1#3=C 7083_1#41=C 7083_1#42=C 7083_1#43=C "
FT      /colour=1
FT  SNP 460
FT      /note="refAllele: T SNPstrains: 7564_8#14=C "
FT      /colour=1
FT  SNP 703
FT      /note="refAllele: G SNPstrains: 7521_5#39=A (non-synonymous) (AA Ala-&gt;Thr) "
FT      /colour=2
FT  SNP 937
FT      /note="refAllele: G SNPstrains: 7414_8#30=T (non-synonymous) (AA Val-&gt;Leu) "
FT      /colour=2
FT  SNP 1269
FT      /note="refAllele: G SNPstrains: 7480_7#22=A (synonymous) 7480_7#62=A (synonymous) "
FT      /colour=3
FT  SNP 1804
FT      /note="refAllele: T SNPstrains: 7414_7#66=A (non-synonymous) (AA Ser-&gt;Thr) 7414_8#44=A (non-synonymous) (AA Ser-&gt;Thr) 7521_6#54=A (non-synonymous) (AA Ser-&gt;Thr) "
FT      /colour=2
etc etc...&lt;/code&gt;

&lt;p&gt;And this code:&lt;/p&gt;&lt;readmore&gt;

&lt;code&gt;use strict;
use warnings;
use feature qw(say);

my $file = "BSAC.pl";
my %cod = ( 1 =&gt; "red", 2 =&gt; "non", 3 =&gt; "green" );
open my $in, "&lt;", $file or die "Cannot open $file: $!";
open my $out, "&gt;", "output.txt" or die "Cannot open output.txt: $!";
say $out "Coordinate   No of Strains   AA Change";

my $SNP;
my $count;
my $change;
while ( my $line = &lt;$in&gt; ) {
    chomp $line;
    say qq(DEBUG: Line = "$line");
    if ( $line =~ /^FT\s+SNP\s+(\d+)/ ){
        $SNP = $1;        
        say qq(\$SNP = $1;);
    } 
    elsif ( $line =~ /^FT\s+\/note="(.*)"/) {
        my $note = $1;
        say qq(my \$note = $1);
        $count = ($note =~ tr/=/=/);
        # Only trust $1 when this match succeeds; a failed match leaves the old $1 in place.
        $change = $note =~ /\((AA \w+-&gt;\w+)\)\s*$/ ? $1 : "";
    }
    elsif ( $line =~ /^FT\s+\/colour=(\d+)/ ) {
        say qq(Code = $1);
        if ( $cod{$1} eq "non" ) {
            printf $out "%-12.12s %-15.15s %s\n",  $SNP, $count, $change;
        }
    }
}&lt;/code&gt;

&lt;p&gt;However, when I run the above code I receive a "Use of uninitialised value ($count or $change) in printf at Script.txt line 33" error. This occurs wherever the text file contains a non-synonymous mutation.&lt;/p&gt;

&lt;p&gt;This code works on another text file I have, and the only difference I can see is that in this example file the strain numbers have a format such as 7521_5#39=A, whereas in the file the code worked for they are written as 7521_5_39=A, i.e. the '#' is replaced with a second '_'.&lt;/p&gt;
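
&lt;p&gt;A minimal sketch for isolating the problem, assuming the sample lines above are representative of the real file: it runs the same three regexes over a few records held in a string, and warns with the line number whenever a /colour line arrives before a /note line has matched. The names $sample and parse_lines are made up for this example and are not part of the original script.&lt;/p&gt;

&lt;code&gt;use strict;
use warnings;

# Sample records copied from the data above (held in a string so no file is needed).
my $sample = join "\n",
    'FT  SNP 703',
    'FT      /note="refAllele: G SNPstrains: 7521_5#39=A (non-synonymous) (AA Ala->Thr) "',
    'FT      /colour=2',
    'FT  SNP 1269',
    'FT      /note="refAllele: G SNPstrains: 7480_7#22=A (synonymous) 7480_7#62=A (synonymous) "',
    'FT      /colour=3',
    'FT  SNP 1804',
    'FT      /note="refAllele: T SNPstrains: 7414_7#66=A (non-synonymous) (AA Ser->Thr) 7414_8#44=A (non-synonymous) (AA Ser->Thr) 7521_6#54=A (non-synonymous) (AA Ser->Thr) "',
    'FT      /colour=2';

# parse_lines is a hypothetical helper: it returns one [SNP, count, change]
# triple for every record whose colour code is 2 (non-synonymous).
sub parse_lines {
    my @lines = @_;
    my ( $snp, $count, $change, @rows );
    my $n = 0;
    for my $line (@lines) {
        $n++;
        if ( $line =~ /^FT\s+SNP\s+(\d+)/ ) {
            # New record: forget values left over from the previous one.
            ( $snp, $count, $change ) = ( $1, undef, undef );
        }
        elsif ( $line =~ /^FT\s+\/note="(.*)"/ ) {
            my $note = $1;
            $count = ( $note =~ tr/=// );    # one '=' per strain
            # Use $1 only if this match succeeded.
            $change = $note =~ /\((AA \w+->\w+)\)\s*$/ ? $1 : "";
        }
        elsif ( $line =~ /^FT\s+\/colour=(\d+)/ ) {
            next unless $1 == 2;
            if ( !defined $count or !defined $change ) {
                warn "line $n: /colour seen before a matching /note line\n";
                next;
            }
            push @rows, [ $snp, $count, $change ];
        }
    }
    return @rows;
}

printf "%-12s %-15s %s\n", @$_ for parse_lines( split /\n/, $sample );&lt;/code&gt;

&lt;p&gt;Resetting $count and $change at every SNP line is a deliberate choice in this sketch: a value left over from the previous record can then no longer hide a /note line that failed to match, and the warning points at the offending input line rather than at the printf.&lt;/p&gt;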

&lt;p&gt;The ideal output from this code would look like this:&lt;/p&gt;

&lt;code&gt;Coordinates No of Strains AA Change
703         1             AA Ala-&gt;Thr
937         1             AA Val-&gt;Leu
1804        3             AA Ser-&gt;Thr&lt;/code&gt;

&lt;p&gt;Any help would be much appreciated, but please be advised that I am very new to Perl and to programming in general. This code is also not my own work; code that I had written myself suffered from the same error.&lt;/p&gt;

&lt;p&gt;Many thanks in advance!&lt;/p&gt;

MB

&lt;/readmore&gt;</field>
</data>
</node>
