Listing occurrence of all residues

by sreya (Initiate)
on Mar 01, 2015 at 13:08 UTC

sreya has asked for the wisdom of the Perl Monks concerning the following question:

Dear Perl experts, I would like to ask for your help. I have a list of sequences in FASTA format, like this:

>header1
aaaaabbbb
ccccddd
>header2
ggggg
jjj
kkkk

and so on. I want to count the frequency of every residue for each protein ID given in the header, i.e. I also need to count occurrences of X (the "not sure" residue) and of unusual amino acids. If a residue does not occur at all, I want a placeholder instead of a number, like this:

ID    A  B  C  D  G  U  X  J  K
ID_1  5  4  4  3  -  -  -  -  -
ID_2  -  -  -  -  5  -  -  3  4
...........

I have tried, but my code neither prints the output in the required format nor counts per header; instead, it keeps accumulating the count of each residue until the end of the file. I am a biology student and not very good at programming. I am still learning.

My code is below. Please tell me where it goes wrong and what I need to change to get the exact output. Thank you all for reading and considering my question.

open(FILE1,"e.txt")or die "can't open file for reading\n";
while (<FILE1>) {
    chomp;
    next if(/^\s*$/);
    my $FastaLine= $_;
    if ( $FastaLine =~ /^>sp\|(\w+\S+)\|/ ) {
        $header = $FastaLine;
    }
    else
    #storing the sequences and appending the sequence lines that come after each header
    #and storing the sequence as values of $header
    {
        $Fasta_split{$header} .= $FastaLine;
    }
    if ( $header =~ /^>sp\|(\w+\S+)\|/ ) {
        my $name = $1;
        #print "$name\t";
    }
}
while (($header,$Fasta_split{$header})=each(%Fasta_split) ){
    if ( $header =~ /^>sp\|(\w+\S+)\|/ ) {
        my $name = $1;
        #print "$name.txt\n";
        #print"$name\n$Fasta_split{$header}\n";
        my @words= split"", $Fasta_split{$header};
        foreach my $w(@words){
            $count{$w}++;
        }
        while (my($w,$c)=each(%count)){
            print "$w:$c\t";
        }
        print "\n";
    }
}

Replies are listed 'Best First'.
Re: Listing occurrence of all residues
by Anonymous Monk on Mar 01, 2015 at 13:23 UTC

    Welcome! Please see How do I post a question effectively? and kindly provide some short but representative sample input data along with the expected output for that sample input (both formatted inside <code> tags), with which one can reproduce the problem.

      Thank you. I have updated the question. I think it is clearer now.

        Hi sreya, could you attach a sample 'e.txt'?
Re: Listing occurrence of all residues
by 2teez (Vicar) on Mar 01, 2015 at 15:07 UTC

    Hi sreya,

    I would have loved to see the result your code was giving. However, if I may offer some advice: ALWAYS use warnings and strict in your Perl code.
    There are also several more modern ways of doing what you want. For example, the three-argument form of open is preferred.
    All that being said, you could get around the issue in your code like so:

    use warnings;
    use strict;
    use Data::Dumper;

    my %data;
    my $header;
    while (<DATA>) {
        chomp;
        next if /^$/;    # skip blank lines
        if (/^>\D+?(\d+?)$/) {
            $header = $1;
        }
        else {
            $data{$header}{$_}++ for split //, $_;
        }
    }
    print Dumper \%data;

    __DATA__
    >header1
    aaaaabbbb
    ccccddd
    >header2
    ggggg
    jjj
    kkkk
    OUTPUT:
    $VAR1 = {
              '1' => {
                       'a' => 5,
                       'b' => 4,
                       'c' => 4,
                       'd' => 3
                     },
              '2' => {
                       'g' => 5,
                       'j' => 3,
                       'k' => 4
                     }
            };
    How to display the output as desired, that is for the OP! :)
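
For readers who want to see that last step, here is a minimal sketch, not the reply author's code. It assumes the nested hash has the same shape as the Dumper output above, that the columns are simply every residue seen under any ID (so residues that never occur, such as U or X in the sample, get no column), and that rows are labeled with a hypothetical `ID_` prefix. The helper name `format_table` is made up for illustration, and the `//` operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Counts in the same shape as the Dumper output: id => { residue => count }.
my %data = (
    1 => { a => 5, b => 4, c => 4, d => 3 },
    2 => { g => 5, j => 3, k => 4 },
);

# Render the nested hash as a table (hypothetical helper).
# Residues that are missing for an ID are shown as '-'.
sub format_table {
    my ($counts) = @_;

    # Column set: every residue seen under any ID, sorted alphabetically.
    my %seen;
    for my $id (keys %$counts) {
        $seen{$_} = 1 for keys %{ $counts->{$id} };
    }
    my @residues = sort keys %seen;

    # Header row first, then one row per ID.
    my @lines = join ' ', 'ID', map { uc $_ } @residues;
    for my $id (sort keys %$counts) {
        push @lines, join ' ', "ID_$id",
            map { $counts->{$id}{$_} // '-' } @residues;
    }
    return join "\n", @lines;
}

print format_table(\%data), "\n";
```

To force a fixed column order that includes residues with no occurrences, `@residues` could be replaced with an explicit list.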

    If you tell me, I'll forget.
    If you show me, I'll remember.
    if you involve me, I'll understand.
    --- Author unknown to me
Re: Listing occurrence of all residues
by Anonymous Monk on Mar 01, 2015 at 15:48 UTC

    Thanks for the update; it would be good if you could put your sample input and output inside <code> tags as mentioned previously. Note that the regular expression /^>sp\|(\w+\S+)\|/ does not match the sample data you provided, but I'm guessing that's just because the sample data is oversimplified. Some general things you should do:

    1. Use strict and warnings - important!
    2. Use perltidy
    3. Have a look at the Basic debugging checklist
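
As a rough illustration of point 1, combined with the three-argument open mentioned earlier in the thread, the reading step might look like the sketch below. It is not the original poster's code: the sub name `read_fasta` and the header pattern `/^>(\S+)/` are assumptions chosen so that the simplified sample headers match, and the sample is read from an in-memory filehandle so the snippet runs without a file on disk.

```perl
use strict;
use warnings;

# Read FASTA records from an open filehandle into id => sequence.
# The header pattern /^>(\S+)/ is an assumption; adjust it to the real headers.
sub read_fasta {
    my ($fh) = @_;
    my (%sequence_for, $id);
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>(\S+)/) {
            $id = $1;                      # start of a new record
        }
        elsif (defined $id) {
            $sequence_for{$id} .= $line;   # append to the current record
        }
    }
    return \%sequence_for;
}

# In-memory stand-in for the file; for a real file one would write
#   open(my $fh, '<', 'e.txt') or die "Can't open e.txt: $!";
my $sample = ">header1\naaaaabbbb\nccccddd\n>header2\nggggg\njjj\nkkkk\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!";
my $sequence_for = read_fasta($fh);
close $fh;

print "$_ => $sequence_for->{$_}\n" for sort keys %$sequence_for;
```

With strict enabled, every variable has to be declared with `my`, which makes accidental reuse of a variable such as `$header` much easier to spot.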

    The code still has a couple of smaller issues, but the two things that are preventing it from working correctly are:

    1. When you write while ( ($header, $Fasta_split{$header}) = each(%Fasta_split) ) {, you are assigning to / re-using variables that you shouldn't re-use. You could improve this loop a bit using for, sort and keys: for my $header (sort keys %Fasta_split) { (but note you actually shouldn't re-use the variable name $header here!)
    2. You are re-using %count without clearing it first. The easiest way to fix it is to use a %count local to the loop, i.e. put my %count; inside the second loop.
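
Both fixes can be sketched together as follows. This is an illustration under assumptions rather than a drop-in patch: `%Fasta_split` is filled with hand-written sample data, the loop variable is renamed to `$id`, and the counting is moved into a hypothetical helper `count_residues` so that a fresh `%count` is created for each sequence.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %Fasta_split: header => concatenated sequence.
my %Fasta_split = (
    '>header1' => 'aaaaabbbbccccddd',
    '>header2' => 'gggggjjjkkkk',
);

# Count residues for one sequence; %count is created fresh on every call,
# so counts from one header never leak into the next.
sub count_residues {
    my ($sequence) = @_;
    my %count;
    $count{$_}++ for split //, $sequence;
    return \%count;
}

# Iterate with for/sort/keys instead of assigning into the hash via each().
for my $id (sort keys %Fasta_split) {
    my $count = count_residues($Fasta_split{$id});
    print join("\t", $id, map { "$_:$count->{$_}" } sort keys %$count), "\n";
}
```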

    The output can also be made a bit more organized using the same method as above: for my $w (sort keys %count) { print "$w:$count{$w}\t"; }. Of course, it's possible to make the output look even more like what you want. The following code loops over the letters you want to output and prints either the count, or a dash if that count is zero (or undefined). The list of letters could be simplified with Perl's qw//.

    for my $w ("a","b","c","d","g","u","x","j","k") { print $count{$w}||"-", " "; }

    Which gives:

    5 4 4 3 - - - - -
    - - - - 5 - - 3 4

    Customizing that is left as an exercise to the reader :-) (one tip: see uc and lc)
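
One possible way to do that customization, offered as a sketch rather than the definitive answer: the column list is written with qw// in the order used in the question, residues are folded to upper case with uc before counting, and the sub name `format_row` and the `ID_1`/`ID_2` labels are assumptions taken from the desired output.

```perl
use strict;
use warnings;

# Column order taken from the table in the question.
my @columns = qw(A B C D G U X J K);

# Build one output row: the ID followed by a count or '-' per column.
sub format_row {
    my ($id, $sequence) = @_;
    my %count;
    $count{ uc $_ }++ for split //, $sequence;   # fold case so 'a' and 'A' share a column
    return join ' ', $id, map { $count{$_} || '-' } @columns;
}

print join(' ', 'ID', @columns), "\n";
print format_row('ID_1', 'aaaaabbbbccccddd'), "\n";
print format_row('ID_2', 'gggggjjjkkkk'), "\n";
```

Residues that are not in `@columns` are counted but never printed, so the list has to name every residue of interest.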
