Re: Creating hash from data extracted from text file in fasta format

Here’s one way to approach this task:

#! perl
use strict;
use warnings;

my (%seqs, $id, $dna);

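while (my $line = <>)
{
    chomp $line;

    # A header line starts with '>'; everything after it is the sequence ID
    if ($line =~ / ^ > (.+) /x)
    {
        # Store the previous record (if any) before starting a new one
        $seqs{$id} = $dna if defined $id;
        $id        = $1;
        $dna       = '';
    }
    else
    {
        # A sequence may span several lines, so accumulate them
        $dna      .= $line;
    }
}

# Store the final record, which is not followed by another header
$seqs{$id} = $dna if defined $id;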

for my $key (sort { length $seqs{$a} <=>
                    length $seqs{$b} } keys %seqs)
{
    printf "%s:%d\n", $key, length $seqs{$key};
}

Output:

15:55 >perl 1406_SoPW.pl data.fas
SequenceID|9876_Gene2:15
SequenceID|1234_Gene1:16

15:55 >

Notes:

The above code contains no error checking! In particular, it doesn’t check that the fasta file format is valid. You say “I do not want to use BioPerl”, but a dedicated module is usually better and safer than hand-written code.
The special filehandle <> reads from the file(s) specified on the command line (or from standard input if no files are specified). For other approaches, see perlopentut#Opening-Text-Files-for-Reading.
You say you want to sort the data by length, but you don’t specify the sort order. I have assumed increasing order. If you want decreasing order instead, swap $a and $b in the comparison: sort { length $seqs{$b} <=> length $seqs{$a} }
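
To make the first and last notes more concrete, here is a minimal sketch of the same loop with some basic checks added and the output sorted in decreasing order. The subroutine name read_fasta, the particular checks (no sequence data before the first header, no duplicate IDs, only letters, '*' and '-' in sequence lines) and the sample data are my own assumptions for illustration, not a full definition of valid FASTA. It reads from an in-memory string so that it can be run as-is; passing a filename to open instead of \$data would read from a file.

```perl
#! perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): read FASTA records from
# an open filehandle, dying on the most obvious format problems.
sub read_fasta
{
    my ($fh) = @_;
    my (%seqs, $id);

    while (my $line = <$fh>)
    {
        chomp $line;
        next if $line =~ / ^ \s* $ /x;          # skip blank lines

        if ($line =~ / ^ > (.+) /x)             # header line
        {
            $id = $1;
            die "Duplicate ID '$id' at line $.\n" if exists $seqs{$id};
            $seqs{$id} = '';
        }
        elsif (!defined $id)
        {
            die "Sequence data before first header at line $.\n";
        }
        elsif ($line =~ / ^ [A-Za-z*-]+ $ /x)   # assumed sequence alphabet
        {
            $seqs{$id} .= $line;
        }
        else
        {
            die "Unexpected characters at line $.\n";
        }
    }

    return \%seqs;
}

# Made-up sample data, for illustration only
my $data = ">id1\nACGT\nAC\n>id2\nGGG\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $seqs = read_fasta($fh);
close $fh;

# Decreasing order of length: $b comes before $a in the comparison
for my $key (sort { length $seqs->{$b} <=> length $seqs->{$a} } keys %$seqs)
{
    printf "%s:%d\n", $key, length $seqs->{$key};
}
```

With the sample data above this prints id1:6 followed by id2:3. A dedicated module will still cover many more cases than these few checks do.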

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,
