Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

fasta file

by PrincePerl (Novice)
on Sep 26, 2011 at 21:44 UTC ( #927958=perlquestion: print w/ replies, xml ) Need Help??
PrincePerl has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

New to perl, started coding 2 months ago. I have a slight problem.

my $filename; my $accession_id; my $sequence_id = ""; my %accession; my $seq; print "Please enter in the filename: "; chomp($filename = <STDIN>); print "Please enter in the accession_ID: "; chomp($accession_id = <STDIN>); open(FILENAME, $filename) or die "cannot open file: $! "; while($_ = <FILENAME>){ if($_ =~ />/) { $seq = $_; } else { $accession{$seq} .= $_; } } foreach my $g (keys %accession) { if ($accession{$g} =~ m/$accession_id/i){ print "$g\n $accession{$g}\n"; } }

I am parsing a fasta file looking for a specific accession#, if the user enters in an accession #, the script finds it and spits out the sequence attached to it.

>gi|YP_8384858.1

ashthfhthfhthghtht

ehshehfhfhghghhhghg

hfhfhfhghghghghghg

I am trying to store the id accesion # in the $seq variable and store the sequence in the hash. I keep getting the use of uninitialized value.

Any help would be greatly appreciated. I am not looking for folks to solve it for me. its just frustrating that i can't get it to work. Thanks in advance.

Comment on fasta file
Download Code
Re: fasta file
by BrowserUk (Pope) on Sep 26, 2011 at 22:51 UTC

    There are many problems with your script.

    When you read the lines in from the file

    while($_ = <FILENAME>){

    They have newlines attached to their ends. And when you store the sequence ID in teh key of the hash:

    $accession{$seq} .= $_;

    You have not removed it. But when you read the accession iD from the user:

    print "Please enter in the accession_ID: "; chomp($accession_id = <STDIN>);

    You did remove the newline. So when you try to compare those two:

    if ($accession{$g} =~ m/$accession_id/i){

    They will not match.

    BUT ... when you are trying to compare the id entered by the user with the one read from the file, instead of comparing it with the key (ID) of the hash element:

    if( $g =~ m/$accession_id/i){

    You are comparing it with the sequence (value) of that hash entry:

    if ($accession{$g} =~ m/$accession_id/i){

    Which obviously wouldn't work, even if you had removed the newlines from both.

    But then, even if you corrected all those problems, there are many other problems with your program.

    First you read all the sequences into the hash -- despite that you know you are only looking for one and could stop reading as soon as you've seen that one.

    And then you iterate over the hash checking if each of them matches the ID the user entered -- but the whole point of a hash is that you do not need to iterate over it. You can simply look up whether the key you need exists.

    Finally, you should take more care over how you format your code. Think of your code as your workbench. If you keep it neat and tidy, you'll find it a lot easier to find what you are looking for.

    Also, if you are intending using Perl to do work, rather than as a hobby, I strongly recommend that you take a basic programmers course, or at least work your way through one of the Learning Perl type books available. It would pay you dividends over and over in the long term.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Hello BrowerUk, I really appreicate the feedback, I was doing the wrong thing all along by not looking through the code. i finally figured it out. This was my revised code. Thanks again.

      #!/usr/bin/perl -w use strict; my $filename; my $accession_id; my %accession; my $gene_Name; print "Please enter in the filename: "; chomp($filename = <STDIN>); print "Please enter in the accession_ID: "; chomp($accession_id = <STDIN>); open(FILENAME, $filename) or die "cannot open file: $! "; while($_ = <FILENAME>) { if ($_ =~ />gi/) { $gene_Name = $_; } else { } $accession{$gene_Name} .= $_; } foreach $gene_Name (keys %accession) { if ($gene_Name =~ /$accession_id/i) { print "$accession{$gene_Name}\n"; } }
Re: fasta file
by molecules (Monk) on Sep 26, 2011 at 23:03 UTC

    You need to check that $accession{$g} exists before using it. Something like changing the line

     if ($accession{$g} =~ m/$accession_id/i){
    to
     if (exists $accession{$g} && $accession{$g} =~ m/$accession_id/i){

    should do the trick.


    EDIT: BrowserUK posted an answer while I was composing mine. His is more comprehensive.

Re: fasta file
by tospo (Hermit) on Sep 27, 2011 at 08:20 UTC
    I guess you are doing this to learn Perl, in which case BrowserUK has already given you all the advice you need to work on this script. However, if you actually want to use this script for real work then I would strongly recommend you look into BioPerl, a lrage collection of modules that help you read loads of file formats and run and analyse results from lots of external tools. There is excellent documentation for beginners here.
Re: fasta file
by pvaldes (Chaplain) on Sep 27, 2011 at 08:59 UTC

    You could also wrote this

    if(/^>/) {

    instead this:

    if($_ =~ />/) {

    That's a little more precise for your case probably

    personally I prefer this more compact form also, but this is a matter of style only

    if(/^>/) {$seq = $_} else {$accession{$seq} .= $_}
    if you want in any case to write the same like this, I suggest that you consider indent your code
    if($_ =~ />/) { $seq = $_; } else { $accession{$seq} .= $_; }

    AND last but not least, don't try to reinvent the wheel, I strongly suggest you to take a look to the bioperl modules, specially the modules related with managing fasta files

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://927958]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (16)
As of 2014-07-22 13:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (113 votes), past polls