PROTEIN FILE help me pleaseee

http://www.perlmonks.org?node_id=1016025

serafinososi has asked for the wisdom of the Perl Monks concerning the following question:

This node falls below the community's threshold of quality. You may see it by logging in.


Replies are listed 'Best First'.
Re: PROTEIN FILE help me pleaseee by choroba (Cardinal) on Jan 30, 2013 at 12:19 UTC
What do you mean by "doesn't work"? Is the number of F's too big? Does the program output any error messages?
Looking at your script, I can see several problems:
You do not chomp $proteinfilename after reading it from STDIN. The real file name probably does not contain a newline at the end.
If the open is not successful, your program continues anyway. Are you really interested in the results if the file cannot be found? Use the idiom `open my $FH, '<', $filename or die "Cannot open: $!";`
next takes a label as a parameter. There are no labels in your code.
You do not indent the code. Once you are advanced enough to use loops, please also learn to indent the code. Without indentation, your code becomes a write-only mess, unreadable for humans.
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
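The chomp and checked-open advice can be sketched as follows. This is only an illustration, not the original poster's script: the helper name `open_checked` and the temporary demo file are assumptions made up for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the thread): strip the newline, then open or die.
sub open_checked {
    my ($name) = @_;
    chomp $name;    # remove the trailing newline that <STDIN> leaves on the input
    open my $fh, '<', $name or die "Cannot open '$name': $!\n";
    return $fh;
}

# Create a small throwaway file so the example is self-contained.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} ">demo\nMFFK\n";
close $tmp_fh or die "Cannot close '$tmp_name': $!\n";

# Simulate a file name typed at the prompt: it arrives with a newline attached.
my $fh     = open_checked("$tmp_name\n");
my $header = <$fh>;
close $fh;
print $header;
```

Without the chomp, the open would look for a file whose name ends in a newline and fail; with the `or die`, that failure stops the program instead of letting it read from an unopened handle.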
Re: PROTEIN FILE help me pleaseee (homework) by Anonymous Monk on Jan 30, 2013 at 11:08 UTC
Surely you realize you're not the first to ask this homework question; see Cannot open file in Strawberry Perl and The Perl Monks Guide to the Monastery.
Re: PROTEIN FILE help me pleaseee by Kenosis (Priest) on Jan 30, 2013 at 16:59 UTC
You've been given excellent suggestions (!):
choroba shows the three-argument open, including using lexically-scoped variables (don't use global variables for file opens).
2teez shows a crucial first step: always `use strict; use warnings;`
pvaldes imparted a hint on counting the number of "F"s in the sequence.
Not to confuse matters here, but for your future reference (since you appear to be on a bioinformatics path), consider becoming well acquainted with Bio::SeqIO and its set of related modules. Just as there are well-developed modules to parse HTML, XML, and CSV files, Bio::SeqIO lives to do the same for Fasta and other such formats. For example, to retrieve and process each sequence within a Fasta file, you can do the following:
    use strict;
    use warnings;
    use Bio::SeqIO;

    print "Please type the file name of the protein sequence data: ";
    chomp( my $proteinfilename = <STDIN> );
    print "\n";

    my $fastaIN = Bio::SeqIO->new( -file => $proteinfilename, -format => 'Fasta' );

    while ( my $seq = $fastaIN->next_seq() ) {
        print $seq->seq, "\n";
    }
Each sequence in the Fasta file is accessible using the `$seq->seq` notation. The part before the arrow operator is the sequence object; the part after the arrow operator is the method. These methods are covered in detail in the Bio::Seq module's documentation. In the example above, the sequence is printed, but a character count could be done there, too.
Hope this helps!
Re: PROTEIN FILE help me pleaseee by pvaldes (Chaplain) on Jan 30, 2013 at 15:13 UTC
Deliberately incomplete, after all this is (your) homework, but you have some of your problems solved here... only because Oxytricha is a cute green thing
    #!/usr/bin/perl -w
    use strict;
    my @protein = ();
    my $phenylcounter = 0;
    open (my $PROTEFILE, '<', $ARGV[0]) or die $!; # If the file does not exist, this gives an error message
    # yup, I avoid the <STDIN> idea, solved by you.
    while (my $line = <$PROTEFILE>) { # We read the lines of the file
        next if $line =~ m/^>/; # If the line starts with ">" it is not considered.
        $line =~ s/\s//g; # all white space removed from the line
        # Using the translate command, the program counts the number of F in the sequence, assigning it to a variable.
        # left for you... tr/F//... $phenylcounter++}
    } # the while loop terminates
    close $PROTEFILE; # and the program continues closing the file.
    print "The aminoacid sequence ", $ARGV[0], " contains ", $phenylcounter, " Phenylalanine aminoacids";
    print "\n"; # followed by new line
Updated: ($phenilcounter != $phenylcounter), fixed now
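For readers who want to see the step that was left open, here is one possible way to fill it in, offered as a sketch rather than as pvaldes's intended solution. The helper name `count_phe` and the two-line sample record are made up for illustration, and the data is read from an in-memory string so the example runs without a file.

```perl
use strict;
use warnings;

# Hypothetical helper: count 'F' characters in the sequence lines of FASTA input.
sub count_phe {
    my ($fh) = @_;
    my $count = 0;
    while ( my $line = <$fh> ) {
        next if $line =~ /^>/;            # skip header lines
        $count += ( $line =~ tr/F// );    # tr/// returns how many characters matched
    }
    return $count;
}

# Made-up sample data, opened as an in-memory filehandle.
my $fasta = ">demo made-up record\nMFFK\nAFGF\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory data: $!";
my $count = count_phe($fh);
close $fh;

print "Found $count F residues\n";    # prints "Found 4 F residues"
```

Adding the per-line result of `tr/F//` to a running total avoids having to join the lines first, since whitespace and newlines never match `F`.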
Re: PROTEIN FILE help me pleaseee by 2teez (Vicar) on Jan 30, 2013 at 14:58 UTC
Hi serafinososi,
...If the line starts with “>” (it is the first line of a FASTA file) the line is not considered...
What happens if more than one line in the file starts with ">"?
If I understand the OP's question, using the data provided, if I may suggest (adding to what others have said), using Perl's split function might do, like so:
    use warnings;
    use strict;

    my $protein = '';
    while (<DATA>) {
        if (/^>/) {
            next;
        }
        else {
            $protein .= join '', split;    # append, so earlier lines are kept
        }
    }

    my $number_of_F = grep { /F/ } split //, $protein;
    print "The aminoacid sequence: ", $protein, " contains ", $number_of_F, " Phenylalanine aminoacids", $/;

    __DATA__
    >gi|403369491|gb|EJY84591.1| Transcriptional regulator, Sir2 family protein Oxytricha trifallax
    MMKQLIKHNKNTPLFNFLRVKFSSTAATIQTQQTVNKPIESKFKEEKLDNYHDIYEKSKRLAEQISQSKS
    FICFTGAGLSTSTGIPDYRSTSNTLAQTGAGAYELEISEEDKKSKTRQIRSQVQRAKPSISHMALHALME
    NGYLKHLISQNTDGLHLKSGIPYQNLTELHGNTTVEYCKSCSKIYFRDFRCRSSEDPYHHLTGRQCEDLK
    CGGELADEIVHFGESIPKDKLVEALTAASQSDLCLTMGTSLRVKPANQIPIQTIKNKGQLAIVNLQYTPF
    DEIAQIRMHSFTDQVLEIVCQELNIKIPEYQMKRRIHIIRNAETNEIVVYGSYGNHKNIKLSFMQRMEYI
    DNKNHVYLALDKEPFHIIPDYFNFQNINTDQEEVEFRIHFYGHNSEPYFQLTLPRQSILELQAGEHLICD
    ITFDYDKLEWK
If you tell me, I'll forget. If you show me, I'll remember. If you involve me, I'll understand. --- Author unknown to me
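On the question of files with more than one ">" line, one option is to keep a separate count per record instead of a single total. The following is a rough sketch under assumptions: the helper name `count_f_per_record` and the two tiny records are invented for this example, and the record ID is taken to be the first whitespace-delimited word after ">".

```perl
use strict;
use warnings;

# Hypothetical helper: returns [id, count] pairs in the order the records appear.
sub count_f_per_record {
    my ($fh) = @_;
    my ( @ids, %count, $id );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^>(\S*)/ ) {
            $id = $1;              # a header line starts a new record
            push @ids, $id;
            $count{$id} = 0;
        }
        elsif ( defined $id ) {
            $count{$id} += ( $line =~ tr/F// );    # count F's on this sequence line
        }
    }
    return map { [ $_, $count{$_} ] } @ids;
}

# Made-up two-record sample, read from an in-memory filehandle.
my $fasta = ">seq1 made-up\nMFFK\nAF\n>seq2 made-up\nGGGF\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory data: $!";
my @results = count_f_per_record($fh);
close $fh;

print "$_->[0]: $_->[1]\n" for @results;
```

For the sample data this prints `seq1: 3` and `seq2: 1`. The sketch assumes record IDs are unique; a file with repeated IDs would need a different key.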
Re: PROTEIN FILE help me pleaseee by Anonymous Monk on Jan 30, 2013 at 23:06 UTC
Please don't remove the original text of a comment.
Re^2: PROTEIN FILE help me pleaseee by Anonymous Monk on Jan 31, 2013 at 04:21 UTC
"Please don't remove the original text of a comment."
But if he keeps it, then teacher will know he cheated
Re^3: PROTEIN FILE help me pleaseee by alessandra (Initiate) on Feb 09, 2013 at 11:16 UTC
hi! I have the same problem and i'm going crazy. HELP ME! My perl program is:
    #!/usr/bin/perl -w
    print "Please type the file name of the protein sequence data: ";
    $proteinfilename = <STDIN>;
    chomp $proteinfilename;
    unless ( open (PROTEINFILE, $proteinfilename) ) {
        print "File \"$proteinfilename\" doesn\'t seem to exist!!\n";
    }
    $protein = <PROTEINFILE> ;
    $empty = " " , (<PROTEINFILE>);
    while ( $protein = <PROTEINFILE> ) {
        if ( $protein =~ /^>/ ) {
            next $protein;
        } else {
            $protein =~ s/\s//g ;
            $union = join ($empty, $protein);
        }
    };
    close PROTEINFILE ;
    $quantif = $union;
    $count = ($quantif =~ tr/F//);
    print "The aminoacid sequence:\n$union\ncontains $count Tryptophan aminoacids\n\n";
the problem is the count!!!! thank you Alessandra
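One likely contributor to the wrong count, reading the posted code: `join` given a single item simply returns that item, so `$union` is overwritten on every pass through the loop and ends up holding only the last line. The reads from PROTEINFILE before the loop also consume input that the loop then never sees. The sketch below, using made-up sequence lines and invented variable names, contrasts overwriting with appending.

```perl
use strict;
use warnings;

my @lines = ( "MFFK", "AFGF", "KKKF" );    # made-up sequence lines

my $overwritten = '';
my $accumulated = '';
for my $line (@lines) {
    $overwritten = join( ' ', $line );    # one item: join returns it unchanged
    $accumulated .= $line;                # appending keeps the earlier lines
}

my $count_last = ( $overwritten =~ tr/F// );
my $count_all  = ( $accumulated =~ tr/F// );
print "last line only: $count_last, whole sequence: $count_all\n";
# prints "last line only: 1, whole sequence: 5"
```

Replacing the assignment with `.=` (and dropping the reads before the loop) would let the count cover the whole sequence.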
Re^4: PROTEIN FILE help me pleaseee by Anonymous Monk on Feb 09, 2013 at 21:27 UTC
