Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

Re: concatenating identical sequences

by Limbic~Region (Chancellor)
on Oct 04, 2011 at 15:51 UTC ( #929585=note: print w/replies, xml ) Need Help??

in reply to concatenating identical sequences

How big are the files you need to work with. It should be fairly trivial to do this using a hash of arrays if everything fits in memory. Assume for a second you had a function that could fetch the next record (mutli-line or not) as well as the id. It would look something like this:
my %data; while (my $rec = fetch_record($fh)) { my $id = $rec->{id}; push @{$data{$id}}, $rec->{sequence}; } for my $id (keys %data) { print "$id "; print "$_\n" for @{$data{$id}}; }
Alternatively, if you can't afford to fit the entire file in memory, you could still use this technique by storing the file offset and not the actual sequence. This will require more IO with tell and seek but should allow the same simplicity in the code.

One last alternative would be to re-write the file merging all the rows for a record on one line. Next, sort the file so duplicate IDs are adjacent and then it should be straight forward to merge them. Since it appears each row is fixed length, recreating the original structure from a single line should be straight forward.

Cheers - L~R

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://929585]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2018-05-21 23:27 GMT
Find Nodes?
    Voting Booth?