http://www.perlmonks.org?node_id=1002960


in reply to Re^5: Parsing problem
in thread Parsing problem

I suspect I am missing a lot of fundamental knowledge! Here is my output -
This is perl 5, version 16, subversion 1 (v5.16.1) built for MSWin32-x64-multi-thread

Copyright 1987-2012, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on this system using "man perl" or "perldoc perl". If you have access to the Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Re^7: Parsing problem
by kcott (Archbishop) on Nov 08, 2012 at 22:01 UTC

    In the code I posted above, everything between the text "Here's exactly what I used to test your code and data:" and the [download] link is part of the one script. I suspect you tried to separate that into two files (script and data). Also, on this site, when code lines don't fit on one line they wrap to the next line - this is indicated by the addition of a plus (+) sign (which is coloured red by default). You don't want these additional plusses in your code; so, use the [download] link to get a plain text version. Save this plain text to a file - the usual filename extension for Perl scripts is ".pl" - the filename I used was pm_gene_uninit.pl (the following assumes you've used the same name). You can now run

    perl pm_gene_uninit.pl

    and, hopefully, you'll now get the expected output (displayed on the screen - not in a file).

    Lines in Perl code that start with a hash (#) sign are comments: they are ignored by Perl when the script is run. [Exception: if the first line begins with #! it's not actually a comment - I think you can safely ignore that for now - see perlrun - #! and quoting on non-Unix systems for more information.] Adding a # to the start of a line of code is referred to as commenting out that line of code. Where I earlier referred to "removing the debug print statements", I actually commented them out, e.g.

    #say qq(DEBUG: Line = "$line");

    I also commented out all the lines relating to external files:

    #my $file = "BSAC.pl";
    ...
    #open my $in, "<", "$file";
    #open my $out, ">", "output.txt";

    and also changed these three lines:

    say $out "Coordinate No of Strains AA Change";
    ...
    while ( my $line = <$in> ) {
    ...
    printf $out "%-12.12s %-15.15s %s\n", $SNP, $count, $change;

    to

    say "Coordinate No of Strains AA Change";
    ...
    while ( my $line = <DATA> ) {
    ...
    printf "%-12.12s %-15.15s %s\n", $SNP, $count, $change;

    Removing $out from the say and printf statements means output now goes to the screen instead of a file (that's an oversimplification but will suffice for this discussion). Changing $in to DATA means the input is now everything following __DATA__ - see perldata (under Special Literals) for more details about this. Assuming that you did put everything after __DATA__ in a separate file, all of this explains "I get "Coordinates No of Strains AA Change" printed in my command line, and no output file created.".
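
    To illustrate the difference between printing to a lexical filehandle and printing to the screen, here is a minimal sketch. The sub name copy_lines and the sample text are made up for illustration (they are not from your script), and an in-memory string stands in for the input so the example is self-contained. Passing \*DATA as the first argument would read whatever follows __DATA__ in exactly the same way.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the original script): reads every line
    # from $in and prints it, numbered, to $out. Returns the line count.
    sub copy_lines {
        my ($in, $out) = @_;
        my $count = 0;
        while ( my $line = <$in> ) {
            chomp $line;
            $count++;
            printf {$out} "%d: %s\n", $count, $line;
        }
        return $count;
    }

    # Made-up sample input; the in-memory filehandle stands in for <DATA> or a file.
    my $text = "first line\nsecond line\n";
    open my $in, "<", \$text or die "Can't open in-memory handle: $!";

    # Passing \*STDOUT sends the output to the screen. A handle opened with
    #   open my $out, ">", "output.txt"
    # would send the same output to that file instead.
    my $lines_read = copy_lines($in, \*STDOUT);
    close $in;
    ```

    The loop itself doesn't care where the lines come from or go to. Only the handles change, which is why swapping $in for DATA and dropping $out was enough to turn the script into a self-contained test.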

    That should get us back to: "Try running this. Assuming it works, try changing the data to something you know will generate the warnings - keeping the data to an absolute minimum.".

    Going all the way back to your original posting, you have an input file called BSAC.pl and a script called Script.txt. I don't have a Perl running under MSWin to test this; however, I expect MSWin will interpret your input file as a Perl script and your actual Perl script as a plain text file. It's possible one (or both) of these may be related to your original problem. Try renaming your input file to BSAC.txt and your Perl script to Script.pl and see if you get better results.

    I'd also recommend you read perlintro (a brief introduction and overview of Perl) and bookmark perl (which has links to all the Perl documentation).

    -- Ken

      Thank you for the very informative post.

      I followed your instructions and the output printed perfectly into the command line. I changed the file extensions around (to BSAC.txt and Script.pl) and obtained the same result - uninitialised value errors and the Number of Strains and AA Change columns were left blank.

      I decided to try pasting the rest of my file under the __DATA__ part of your code. For the most part it worked as expected, but with interspersed warnings: "Use of uninitialized value within %cod in string eq at Script.pl line 32, <DATA> line XXX." These lines refer to non-synonymous mutations that code for a STOP codon, like the one below:

      FT SNP 2811491
      FT /note="refAllele: G SNPstrains: 7414_8#89=A (SNP codon is STOP) (AA Gln->STOP) "
      FT /colour=4

      I assume this is because the code has not matched 'non' in the line. Could this also be the cause of my uninitialised value error in the original post?

      Many Thanks

        Your problem here is that the number after 'FT        /colour=' (in this case "4"), is used as a key for %cod but you've only assigned @cod{qw{1 2 3}}. Using the following line solved the problem for me (you'll probably want a different value):

        my %cod = ( 1 => "red", 2 => "non", 3 => "green", 4 => 'value for key 4' );

        Alternatively, you can check $cod{$1} before using it:

        if ( exists $cod{$1} and $cod{$1} eq "non" ) { printf ...
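
        For what it's worth, here is a small self-contained sketch of that second approach. The sub name classify and the /colour=(\d+) regex are assumptions for illustration (a guess at how the number gets captured in your script). Only the three %cod entries come from the code discussed above.

        ```perl
        use strict;
        use warnings;

        # Only keys 1, 2 and 3 are defined, as in the script under discussion.
        my %cod = ( 1 => "red", 2 => "non", 3 => "green" );

        # Hypothetical helper: returns the %cod value for a "/colour=N" line,
        # or undef when the line has no colour number or N is not a key of %cod.
        sub classify {
            my ($line) = @_;
            return undef unless $line =~ m{/colour=(\d+)};
            my $key = $1;
            return exists $cod{$key} ? $cod{$key} : undef;
        }

        for my $line ( 'FT /colour=2', 'FT /colour=4' ) {
            my $type = classify($line);
            # Testing defined() first avoids the "uninitialized value" warning
            # that a bare  $type eq "non"  would give for colour 4.
            if ( defined $type and $type eq "non" ) {
                print "$line: non-synonymous\n";
            }
            else {
                print "$line: skipped\n";
            }
        }
        ```

        Copying $1 into a named variable straight after the match is a deliberate choice here: $1 is overwritten by the next successful match, so it's safer not to rely on it for long.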

        -- Ken

        I changed all the (SNP codon is STOP) to 'non-synonymous' and have not achieved better results.