
Re^7: Parsing problem

by kcott (Canon)
on Nov 08, 2012 at 22:01 UTC (#1002998)

in reply to Re^6: Parsing problem
in thread Parsing problem

In the code I posted above, everything between the text "Here's exactly what I used to test your code and data:" and the [download] link is part of the one script. I suspect you tried to separate that into two files (script and data). Also, on this site, when code lines don't fit on one line they wrap to the next line - this is indicated by the addition of a plus (+) sign (which is coloured red by default). You don't want these additional plusses in your code; so, use the [download] link to get a plain text version. Save this plain text to a file - the usual filename extension for Perl scripts is ".pl" - the filename I used was (the following assumes you've used the same name). You can now run


and, hopefully, you'll now get the expected output (displayed on the screen - not in a file).

Lines in Perl code that start with a hash (#) sign are comments: they are ignored by Perl when the script is run. [Exception: if the first line begins with #! it's not actually a comment - I think you can safely ignore that for now - see perlrun - #! and quoting on non-Unix systems for more information.] Adding a # to the start of a line of code is referred to as commenting out that line of code. Where I earlier referred to "removing the debug print statements", I actually commented them out, e.g.

#say qq(DEBUG: Line = "$line");

I also commented out all the lines relating to external files:

#my $file = "";
...
#open my $in, "<", "$file";
#open my $out, ">", "output.txt";

and also changed these three lines:

say $out "Coordinate No of Strains AA Change";
...
while ( my $line = <$in> ) {
...
printf $out "%-12.12s %-15.15s %s\n", $SNP, $count, $change;

to:

say "Coordinate No of Strains AA Change";
...
while ( my $line = <DATA> ) {
...
printf "%-12.12s %-15.15s %s\n", $SNP, $count, $change;

Removing $out from the say and printf statements means output now goes to the screen instead of a file (that's an oversimplification but will suffice for this discussion). Changing $in to DATA means the input is now everything following __DATA__ - see perldata (under Special Literals) for more details about this. Assuming that you did put everything after __DATA__ in a separate file, all of this explains "I get "Coordinates No of Strains AA Change" printed in my command line, and no output file created.".
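To make the two changes concrete, here is a minimal, self-contained sketch (not the script from this thread) that reads from DATA and prints to the screen. The helper name format_row and the single sample data line are assumptions made up for illustration; only the printf format string is taken from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: build one fixed-width row using the thread's format string.
sub format_row {
    my ( $snp, $count, $change ) = @_;
    return sprintf "%-12.12s %-15.15s %s\n", $snp, $count, $change;
}

# No filehandle is named, so print writes to STDOUT (normally the screen).
print format_row( 'Coordinate', 'No of Strains', 'AA Change' );

# <DATA> reads the lines that follow the __DATA__ token in this same file.
while ( my $line = <DATA> ) {
    chomp $line;

    # Split on whitespace into at most three fields.
    my ( $snp, $count, $change ) = split ' ', $line, 3;
    next unless defined $change;    # skip blank or short lines

    print format_row( $snp, $count, $change );
}

# Everything below this token is input, not code (sample line is made up).
__DATA__
2811491 1 Gln->STOP
```

If everything after __DATA__ is moved into a separate file, the loop has nothing to read, so only the header line is printed.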

That should get us back to: "Try running this. Assuming it works, try changing the data to something you know will generate the warnings - keeping the data to an absolute minimum.".

Going all the way back to your original posting, you have an input file called and a script called Script.txt. I don't have a Perl running under MSWin to test this; however, I expect MSWin will interpret your input file as a Perl script and your actual Perl script as a plain text file. It's possible one (or both) of these may be related to your original problem. Try renaming your input file to BSAC.txt and your Perl script to and see if you get better results.

I'd also recommend you read perlintro (a brief introduction and overview of Perl) and bookmark perl (which has links to all the Perl documentation).

-- Ken

Re^8: Parsing problem
by MB123 (Initiate) on Nov 09, 2012 at 12:31 UTC

    Thank you for the very informative post.

    I followed your instructions and the output printed perfectly into the command line. I changed the file extensions around (to BSAC.txt and and obtained the same result - uninitialised value errors and the Number of Strains and AA Change columns were left blank.

    I decided to try pasting the rest of my file under the __DATA__ part of your code. For the most part it worked as expected, but with interspersed Use of uninitialised value within %cod in string eq at line 32, <DATA> line XXX.. These lines are referring to non-synonymous mutations that code for a STOP codon, like the one below-

    FT SNP 2811491
    FT /note="refAllele: G SNPstrains: 7414_8#89=A (SNP codon is STOP) (AA Gln->STOP) "
    FT /colour=4

    I assume this is because the code has not matched 'non' in the line. Could this also be the cause of my uninitialised value error in the original post?

    Many Thanks

      Your problem here is that the number after 'FT        /colour=' (in this case "4"), is used as a key for %cod but you've only assigned @cod{qw{1 2 3}}. Using the following line solved the problem for me (you'll probably want a different value):

      my %cod = ( 1 => "red", 2 => "non", 3 => "green", 4 => 'value for key 4' );

      Alternatively, you can check $cod{$1} before using it:

      if ( exists $cod{$1} and $cod{$1} eq "non" ) { printf ...
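As a rough, standalone illustration of the second approach (a sketch only, not your actual script): the helper name is_non and the two sample lines are hypothetical, and %cod holds just the three keys from the line above.

```perl
use strict;
use warnings;

# Only keys 1..3 are defined; key 4 is deliberately missing.
my %cod = ( 1 => "red", 2 => "non", 3 => "green" );

# Hypothetical helper: true only when the key exists AND maps to "non".
# Checking exists first avoids comparing an undefined value with eq.
sub is_non {
    my ($key) = @_;
    return exists $cod{$key} && $cod{$key} eq "non" ? 1 : 0;
}

# Made-up sample lines: one known colour key and one unknown key.
for my $line ( 'FT /colour=2', 'FT /colour=4' ) {
    if ( $line =~ m{/colour=(\d+)} ) {
        printf "colour %s: %s\n", $1, is_non($1) ? 'non' : 'skipped';
    }
}
```

With the exists test in place, a line carrying /colour=4 is skipped quietly rather than triggering the "uninitialized value" warning.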

      -- Ken

      I changed all the (SNP codon is STOP) to 'non-synonymous' and have not achieved better results.
