PerlMonks  

Re: How to get non-redundant DNA sequences from a FASTA file?

by choroba (Cardinal)
on Sep 13, 2014 at 08:49 UTC ( [id://1100471] )


in reply to How to get non-redundant DNA sequences from a FASTA file?

When you hear "unique", think "hash". In this case, you need to hash headers by sequences:
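#!/usr/bin/perl
use warnings;
use strict;

my $fasta = << '__FASTA__';
>gi1 cds
ATG fun
>gi2 cds
ATG fun
>gi3 cds
GGG fun
__FASTA__

# Split into records; each record is a header line followed by its sequence.
my @seq_with_hdr = split /\n>/, $fasta;
$seq_with_hdr[0] =~ s/^>//;

# Key the hash by sequence, so duplicates collapse into a single entry
# (the last header seen for a sequence wins).
my %hdr_by_seq;
for (@seq_with_hdr) {
    my ($hdr, $seq) = split /\n/;
    $hdr_by_seq{$seq} = $hdr;
}

for my $seq (keys %hdr_by_seq) {
    print ">$hdr_by_seq{$seq}\n$seq\n"
}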

Note that whitespace is not ignored in the data. There was a space after one of the "ATG fun" sequences, which makes it different from the same sequence without the trailing space. I removed the space in my code.
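If you would rather not edit the data by hand, the trailing whitespace can be stripped before the sequence is used as a hash key. The following is only a sketch under a few assumptions that are not in the original post: `unique_records` is a hypothetical helper name, sequences may span several lines, and the first header seen for a sequence is the one to keep, in input order.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (not from the original post): returns the unique
# records as [header, sequence] pairs, in order of first appearance.
sub unique_records {
    my ($fasta) = @_;
    my (%seen, @unique);
    for my $record (split /^>/m, $fasta) {
        next unless $record =~ /\S/;         # skip the empty leading field
        my ($hdr, @lines) = split /\n/, $record;
        s/\s+$// for $hdr, @lines;           # drop trailing whitespace
        my $seq = join '', @lines;           # join multi-line sequences
        next if $seen{$seq}++;               # keep only the first header
        push @unique, [$hdr, $seq];
    }
    return @unique;
}

# Sample data with a trailing space after the first "ATG fun".
my $fasta = ">gi1 cds\nATG fun \n>gi2 cds\nATG fun\n>gi3 cds\nGGG fun\n";
print ">$_->[0]\n$_->[1]\n" for unique_records($fasta);
```

With the sample data this prints the gi1 and gi3 records. Unlike iterating over `keys`, the array keeps the output in input order.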

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^2: How to get non-redundant DNA sequences from a FASTA file?
by supriyoch_2008 (Monk) on Sep 13, 2014 at 12:48 UTC

    Hi Choroba,

    Thank you very much for fixing the problem and for your valuable suggestions about uniqueness (hashes rather than arrays) and whitespace. I shall follow them. I searched Google for a Perl solution to this problem but did not find one. I did find a Java-based script, but I do not know Java. The URL for the Java solution is http://seqanswers.com/forums/showthread.php?t=4442

    So I wrote to PerlMonks for help.

    With regards
