Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re: How to get non-redundant DNA sequences from a FASTA file?

by Athanasius (Archbishop)
on Sep 13, 2014 at 09:08 UTC ( [id://1100477]=note: print w/replies, xml ) Need Help??


in reply to How to get non-redundant DNA sequences from a FASTA file?

Hello supriyoch_2008,

Same approach as choroba, except that I read the data into a hash mapping headers to sequences, and then reverse that hash to get the non-redundant sequences:

#! perl use strict; use warnings; my $fasta = '>gi1 cds ATG fun >gi2 cds ATG fun >gi3 cds GGG fun'; my %hdrs; $hdrs{$1} = $2 while $fasta =~ / > (.+) \s+ cds \s+ (.*) \s+ fun /g +x; print " A. Header & sequences are:\n"; printf ">%s cds\n%s\n", $_, $hdrs{$_} for sort keys %hdrs; my %seqs; while (my ($key, $value) = each %hdrs) { push @{$seqs{$value}}, $key; } print " B. Only sequences are:\n"; printf "$_\n" for sort keys %seqs; print " C. Non-redundant sequences are:\n"; printf ">%s cds\n%s\n", ( sort @{$seqs{$_}} )[0], $_ for sort keys %se +qs;

Output:

18:47 >perl 1009_SoPW.pl A. Header & sequences are: >gi1 cds ATG >gi2 cds ATG >gi3 cds GGG B. Only sequences are: ATG GGG C. Non-redundant sequences are: >gi1 cds ATG >gi3 cds GGG 18:47 >

Note: on hash reversal, see How-do-I-look-up-a-hash-element-by-value of perlfaq4.

If you really don’t care which header is output when there are redundant sequences, you can just say:

my %seqs = reverse %hdrs; ... print " C. Non-redundant sequences are:\n"; printf ">%s cds\n%s\n", $seqs{$_}, $_ for sort keys %seqs;

When your code starts to get too complicated, it’s usually a good idea to step back and look for a simpler approach. Remember, less is more. ;-)

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Replies are listed 'Best First'.
Re^2: How to get non-redundant DNA sequences from a FASTA file?
by supriyoch_2008 (Monk) on Sep 13, 2014 at 12:54 UTC

    Hi Athanasius,

    Thank you for your suggestions and fixing the problem. Your script has worked nicely. I am grateful to you.

    With regards,

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1100477]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others perusing the Monastery: (3)
As of 2024-04-25 17:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found