PerlMonks  

PROTEIN FILE help me pleaseee

by serafinososi
on Jan 30, 2013 at 10:52 UTC
serafinososi has asked for the wisdom of the Perl Monks concerning the following question:

I need some help, please. The question is:

Develop a Perl program that asks the user to type the name of a file in the working directory (containing one protein sequence in FASTA format) and prints the number of Phenylalanine amino acids in the sequence.

- It prints a message prompting the user to insert the file name.
- It reads the file name, assigning it to one variable.
- It opens the file with that name for reading. If the file does not exist, it gives an error message.
- A scalar variable that will contain the sequence is initialized to the empty string "".
- The program reads the lines of the file one after the other (with a while loop), assigning each line read to a variable.
- If the line starts with ">" (it is the first line of a FASTA file), the line is not considered. Otherwise all the white space is removed from the line and the line is appended to the sequence.
- When all the lines have been read, the while loop terminates and the program continues by closing the file.
- Using the translate command, the program counts the number of F in the sequence, assigning it to a variable.
- It prints the message: "The aminoacid sequence " followed by the variable containing the read sequence, " contains " followed by the variable with the number of F, followed by "Phenylalanine aminoacids", followed by a new line.

This is the "protein file":

    >gi|403369491|gb|EJY84591.1| Transcriptional regulator, Sir2 family protein Oxytricha trifallax
    MMKQLIKHNKNTPLFNFLRVKFSSTAATIQTQQTVNKPIESKFKEEKLDNYHDIYEKSKRLAEQISQSKS
    FICFTGAGLSTSTGIPDYRSTSNTLAQTGAGAYELEISEEDKKSKTRQIRSQVQRAKPSISHMALHALME
    NGYLKHLISQNTDGLHLKSGIPYQNLTELHGNTTVEYCKSCSKIYFRDFRCRSSEDPYHHLTGRQCEDLK
    CGGELADEIVHFGESIPKDKLVEALTAASQSDLCLTMGTSLRVKPANQIPIQTIKNKGQLAIVNLQYTPF
    DEIAQIRMHSFTDQVLEIVCQELNIKIPEYQMKRRIHIIRNAETNEIVVYGSYGNHKNIKLSFMQRMEYI
    DNKNHVYLALDKEPFHIIPDYFNFQNINTDQEEVEFRIHFYGHNSEPYFQLTLPRQSILELQAGEHLICD
    ITFDYDKLEWK

I wrote this, but it doesn't work:

    #!/usr/bin/perl -w
    print "Please type the file name of the protein sequence data: ";
    $proteinfilename = <STDIN>;
    unless ( open (PROTEINFILE, $proteinfilename) ) {
    print "File \"$proteinfilename\" doesn\'t seem to exist!!\n";
    }
    $protein = <PROTEINFILE> ;
    $empty = " " , (<PROTEINFILE>);
    while ( $protein = <PROTEINFILE> ) {
    if ( $protein =~ /^>/ ) {
    next $protein;
    } else {
    $protein =~ s/$empty//g ;
    $protein = join ($protein, @protein);
    print $protein;
    }
    }

Update: Thx

Re: PROTEIN FILE help me pleaseee (homework)
by Anonymous Monk on Jan 30, 2013 at 11:08 UTC
Re: PROTEIN FILE help me pleaseee
by choroba (Abbot) on Jan 30, 2013 at 12:19 UTC
    What do you mean by "doesn't work"? Is the number of F's too big? Does the program output any error messages?

    Glancing at your script, I can see that

    • you do not chomp the $proteinfilename after reading it from STDIN. The real file name probably does not contain a newline at the end.
    • if the open is not successful, your program continues. Are you really interested in the results, if the file cannot be found? Use the idiom
      open my $FH, '<', $filename or die "Cannot open: $!";
    • next takes a label as a parameter. There are no labels in your code.
    • you do not indent the code. Once you are advanced enough to use loops, please also learn to indent the code. Without indentation, your code becomes a write-only, unreadable mess for humans.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
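The first three points in the list above can be sketched together as follows. This is only an illustration, not choroba's own code: the helper name open_fasta is made up for the example, and the file name is passed in as a string so the effect of chomp can be seen without typing at a prompt.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): clean up a file name as read
# from STDIN and open it, dying with the system error if that fails.
sub open_fasta {
    my ($filename) = @_;
    chomp $filename;    # drop the trailing newline left by <STDIN>
    open my $fh, '<', $filename
        or die "Cannot open '$filename': $!\n";
    return $fh;
}

# Typical use (commented out so the sketch does not block on input):
# print "Please type the file name of the protein sequence data: ";
# my $fh = open_fasta( scalar <STDIN> );
# while ( my $line = <$fh> ) {
#     next if $line =~ /^>/;    # bare next: no label needed
#     ...
# }
```

Because the helper dies on failure, the rest of the program never runs against a filehandle that was not opened.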
Re: PROTEIN FILE help me pleaseee
by 2teez (Priest) on Jan 30, 2013 at 14:58 UTC

    Hi serafinososi,

    ...If the line starts with “>” (it is the first line of a FASTA file) the line is not considered...

    What happens if more than one line in the file starts with ">"?

    If I understand the OP's question correctly then, adding to what others have said and using the data provided, Perl's split function may do the job, like so:

    use warnings;
    use strict;

    my $protein = '';
    while (<DATA>) {
        if (/^>/) {
            next;    # skip the FASTA header line
        }
        else {
            # split on whitespace, rejoin, and append to the sequence
            $protein .= join '', split;
        }
    }

    # count the single characters that are an F
    my $number_of_F = grep { /F/ } split //, $protein;

    print "The aminoacid sequence: ", $protein, " contains ", $number_of_F,
        " Phenylalanine aminoacids", $/;

    __DATA__
    >gi|403369491|gb|EJY84591.1| Transcriptional regulator, Sir2 family protein Oxytricha trifallax
    MMKQLIKHNKNTPLFNFLRVKFSSTAATIQTQQTVNKPIESKFKEEKLDNYHDIYEKSKRLAEQISQSKS
    FICFTGAGLSTSTGIPDYRSTSNTLAQTGAGAYELEISEEDKKSKTRQIRSQVQRAKPSISHMALHALME
    NGYLKHLISQNTDGLHLKSGIPYQNLTELHGNTTVEYCKSCSKIYFRDFRCRSSEDPYHHLTGRQCEDLK
    CGGELADEIVHFGESIPKDKLVEALTAASQSDLCLTMGTSLRVKPANQIPIQTIKNKGQLAIVNLQYTPF
    DEIAQIRMHSFTDQVLEIVCQELNIKIPEYQMKRRIHIIRNAETNEIVVYGSYGNHKNIKLSFMQRMEYI
    DNKNHVYLALDKEPFHIIPDYFNFQNINTDQEEVEFRIHFYGHNSEPYFQLTLPRQSILELQAGEHLICD
    ITFDYDKLEWK

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: PROTEIN FILE help me pleaseee
by pvaldes (Hermit) on Jan 30, 2013 at 15:13 UTC

    Deliberately incomplete, after all this is (your) homework, but you have some of your problems solved here... only because Oxytricha is a cute green thing

    #!/usr/bin/perl -w
    use strict;

    my @protein = ();
    my $phenylcounter = 0;

    open (my $PROTEFILE, '<', $ARGV[0]) or die $!;
    # If the file is not existing it gives an error message
    # yup, I avoid the <STDIN> idea, solved by you.

    while (my $line = <$PROTEFILE>) {    # We read the lines of the file
        next if $line =~ m/^>/;    # If the line starts with ">" is not considered.
        $line =~ s/\s//g;          # all white spaces removed from the line
        # Using the translate command, the program counts the number of F in the sequence, assigning it to a variable.
        # left for you... tr/F//... $phenylcounter++}
    }    # the while loop terminates
    close $PROTEFILE;    # and the program continues closing the file.

    print "The aminoacid sequence ", $ARGV[0], " contains ", $phenylcounter, " Phenylalanine aminoacids";
    print "\n";    # followed by new line

    Updated: ($phenilcounter != $phenylcounter), fixed now
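For the counting step that the code above leaves open, one possible sketch uses tr/F// on the collected sequence. It is not pvaldes's solution: the sub name count_phe and the toy sequence are made up for illustration, and the count is case-sensitive, so a lowercase f would need tr/Ff// instead.

```perl
use strict;
use warnings;

# Count occurrences of F in a sequence string with tr///.
# With an empty replacement list and no /d modifier, tr leaves the
# string unchanged and simply returns how many characters matched.
sub count_phe {
    my ($sequence) = @_;
    my $count = ( $sequence =~ tr/F// );
    return $count;
}

my $toy = 'MFAFKF';    # made-up toy sequence, not the Oxytricha protein
print "The aminoacid sequence $toy contains ", count_phe($toy),
    " Phenylalanine aminoacids\n";
```

Counting once on the whole sequence after the loop, rather than line by line inside it, matches the order of steps in the assignment.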

Re: PROTEIN FILE help me pleaseee
by Kenosis (Priest) on Jan 30, 2013 at 16:59 UTC

    You've been given excellent suggestions (!):

    • choroba shows the three-argument open, including using lexically-scoped variables (don't use global variables for file opens).
    • 2teez shows a crucial first step: always use strict; use warnings;
    • pvaldes imparted a hint on counting the number of "F"s in the sequence

    Not to confuse matters here, but for your future reference (since you appear to be on a bioinformatics path), consider becoming well acquainted with Bio::SeqIO and its set of related modules.

    Just as there are well-developed modules to parse HTML, XML, and CSV files, Bio::SeqIO lives to do the same for Fasta and other such formats.

    For example, to retrieve and process each sequence within a Fasta file, you can do the following:

    use strict;
    use warnings;
    use Bio::SeqIO;

    print "Please type the file name of the protein sequence data: ";
    chomp( my $proteinfilename = <STDIN> );
    print "\n";

    my $fastaIN = Bio::SeqIO->new( -file => $proteinfilename, -format => 'Fasta' );

    while ( my $seq = $fastaIN->next_seq() ) {
        print $seq->seq, "\n";
    }

    Each sequence in the Fasta file is accessible using the $seq->seq notation. The first part before the arrow operator is the sequence object; the part after the arrow operator is the method. These methods are covered in detail in the Bio::Seq module's documentation. In the example above, the sequence is printed, but a character count could be done there, too.
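As a rough sketch of the character count mentioned above, assuming BioPerl is installed: the sub names count_residue and report_counts and the file name protein.fasta are placeholders invented here, not part of Bio::SeqIO. The module is loaded with require inside the sub, so the counting helper can be tried on its own without BioPerl.

```perl
use strict;
use warnings;

# Count one residue letter in a plain sequence string.
sub count_residue {
    my ( $sequence, $residue ) = @_;
    my $count = () = $sequence =~ /\Q$residue\E/g;
    return $count;
}

# Hypothetical wrapper: report the count for every record in a Fasta file.
# Bio::SeqIO is loaded on demand, so BioPerl is only needed when this runs.
sub report_counts {
    my ( $filename, $residue ) = @_;
    require Bio::SeqIO;
    my $fastaIN = Bio::SeqIO->new( -file => $filename, -format => 'Fasta' );
    while ( my $seq = $fastaIN->next_seq() ) {
        printf "%s: %d x %s\n", $seq->id,
            count_residue( $seq->seq, $residue ), $residue;
    }
    return;
}

# report_counts( 'protein.fasta', 'F' );    # file name is a placeholder
```

Since next_seq returns one record at a time, this also handles a file with several ">" headers, which a hand-rolled loop that skips header lines would merge into one sequence.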

    Hope this helps!

Re: PROTEIN FILE help me pleaseee
by Anonymous Monk on Jan 30, 2013 at 23:06 UTC
    Please don't remove the original text of a comment.

      Please don't remove the original text of a comment.

      But if he keeps it, then teacher will know he cheated

        hi! I have the same problem and i'm going crazy. HELP ME! My perl program is:

        #!/usr/bin/perl -w
        print "Please type the file name of the protein sequence data: ";
        $proteinfilename = <STDIN>;
        chomp $proteinfilename;
        unless ( open (PROTEINFILE, $proteinfilename) ) {
        print "File \"$proteinfilename\" doesn\'t seem to exist!!\n";
        }
        $protein = <PROTEINFILE> ;
        $empty = " " , (<PROTEINFILE>);
        while ( $protein = <PROTEINFILE> ) {
        if ( $protein =~ /^>/ ) {
        next $protein;
        } else {
        $protein =~ s/\s//g ;
        $union = join ($empty, $protein);
        }
        };
        close PROTEINFILE ;
        $quantif = $union;
        $count = ($quantif =~ tr/F//);
        print "The aminoacid sequence:\n$union\ncontains $count Tryptophan aminoacids\n\n";

        the problem is the count!!!! thank you Alessandra
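Judging only from the code as posted, a likely cause of the wrong count is that $union is overwritten on every pass: join with a single element just returns that element, so after the loop $union holds only the last line read. The reads placed before the loop also consume input before the loop starts, and the final message says Tryptophan where the assignment asks for Phenylalanine. A minimal sketch of the accumulation, assuming a made-up two-line record and a hypothetical sub name read_sequence:

```perl
use strict;
use warnings;

# Read FASTA text from an open filehandle and return the sequence as one
# string: header lines are skipped, whitespace removed, lines appended.
sub read_sequence {
    my ($fh) = @_;
    my $sequence = '';
    while ( my $line = <$fh> ) {
        next if $line =~ /^>/;    # skip the FASTA header
        $line =~ s/\s+//g;        # strip spaces and the newline
        $sequence .= $line;       # append, do not overwrite
    }
    return $sequence;
}

# Made-up two-line record, read through an in-memory filehandle so the
# sketch runs without a file on disk.
my $fasta = ">toy record\nMFKF\nAAFW\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my $sequence = read_sequence($fh);
close $fh;

my $count = ( $sequence =~ tr/F// );
print "The aminoacid sequence $sequence contains $count Phenylalanine aminoacids\n";
```

With a real file, the in-memory open would be replaced by a three-argument open on the chomped file name.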
