Re: concatenating identical sequences

by Limbic~Region (Chancellor)
on Oct 04, 2011 at 15:51 UTC


in reply to concatenating identical sequences

$new_guy,
How big are the files you need to work with? It should be fairly trivial to do this using a hash of arrays if everything fits in memory. Assume for a second you had a function that could fetch the next record (multi-line or not) as well as the id. It would look something like this:

    my %data;
    while (my $rec = fetch_record($fh)) {
        my $id = $rec->{id};
        push @{$data{$id}}, $rec->{sequence};
    }
    for my $id (keys %data) {
        print "$id ";
        print "$_\n" for @{$data{$id}};
    }

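For illustration only, here is one way fetch_record might look. This is a sketch under assumptions that are not in your post: the input is FASTA-like (a line starting with ">" carries the id, and the lines that follow hold the sequence until the next ">"), and the filehandle is seekable. Adjust the header pattern to whatever your real format is.

```perl
use strict;
use warnings;

# ASSUMPTION: FASTA-like input; '>' starts a header line whose first
# whitespace-delimited token is the id. The handle must be seekable.
sub fetch_record {
    my ($fh) = @_;

    # Skip forward to the next header line.
    my $id;
    while (defined(my $line = <$fh>)) {
        if ($line =~ /^>(\S+)/) {
            $id = $1;
            last;
        }
    }
    return unless defined $id;    # no more records

    # Collect sequence lines until the next header or end of file.
    my $seq = '';
    while (1) {
        my $pos  = tell($fh);
        my $line = <$fh>;
        last unless defined $line;
        if ($line =~ /^>/) {
            # Push the header back so the next call sees it.
            seek($fh, $pos, 0) or die "seek failed: $!";
            last;
        }
        chomp $line;
        $seq .= $line;
    }
    return { id => $id, sequence => $seq };
}
```

It returns a hash reference with id and sequence keys, which is what the loop above expects, and returns nothing once the file is exhausted.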
Alternatively, if you can't afford to fit the entire file in memory, you could still use this technique by storing each record's file offset instead of the actual sequence. This will require more I/O with tell and seek but should allow the same simplicity in the code.
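A rough sketch of the offset idea, again assuming a FASTA-like layout (">" header lines followed by sequence lines), which is my guess and not something your post confirms. The names index_offsets, read_sequence_at and merged_sequence are made up for the example.

```perl
use strict;
use warnings;

# ASSUMPTION: FASTA-like input; '>' starts a header line carrying the id.

# Pass 1: remember where each record's sequence starts, keyed by id.
sub index_offsets {
    my ($fh) = @_;
    my %offsets;
    while (defined(my $line = <$fh>)) {
        next unless $line =~ /^>(\S+)/;
        # tell() now points at the first sequence line of this record.
        push @{ $offsets{$1} }, tell($fh);
    }
    return \%offsets;
}

# Pass 2: jump to a stored offset and read until the next header or EOF.
sub read_sequence_at {
    my ($fh, $offset) = @_;
    seek($fh, $offset, 0) or die "seek failed: $!";
    my $seq = '';
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^>/;
        chomp $line;
        $seq .= $line;
    }
    return $seq;
}

# Concatenate every sequence that shares an id, in file order.
sub merged_sequence {
    my ($fh, $offsets, $id) = @_;
    return join '', map { read_sequence_at($fh, $_) } @{ $offsets->{$id} };
}
```

Only the ids and a list of integers per id stay in memory; each sequence is re-read from disk when it is needed.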

One last alternative would be to rewrite the file, merging all the rows for a record onto one line. Next, sort the file so duplicate IDs are adjacent, and then it should be straightforward to merge them. Since it appears each row is fixed length, recreating the original structure from a single line should be straightforward.
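The merge step after sorting could be sketched like this. It assumes each record has already been flattened to a single "id<TAB>sequence" line and the file sorted by id (with an external sort utility, for instance); that layout and the names merge_sorted and wrap_sequence are my own invention for the example.

```perl
use strict;
use warnings;

# ASSUMPTION: input lines look like "id<TAB>sequence" and are sorted by id.
# $emit is a callback invoked once per id with the merged sequence.
sub merge_sorted {
    my ($in, $emit) = @_;
    my ($cur_id, $cur_seq);
    while (defined(my $line = <$in>)) {
        chomp $line;
        my ($id, $seq) = split /\t/, $line, 2;
        if (defined $cur_id && $id eq $cur_id) {
            $cur_seq .= $seq;    # same id as previous line: append
        }
        else {
            $emit->($cur_id, $cur_seq) if defined $cur_id;
            ($cur_id, $cur_seq) = ($id, $seq);
        }
    }
    $emit->($cur_id, $cur_seq) if defined $cur_id;    # flush the last id
    return;
}

# Split a merged sequence back into fixed-width rows.
sub wrap_sequence {
    my ($seq, $width) = @_;
    return unpack "(A$width)*", $seq;
}
```

Because only the current id and its running sequence are held at any time, memory use stays small no matter how large the file is.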

Cheers - L~R

