PerlMonks  

Re: Parsing .2bit DNA files

by bart (Canon)
on Mar 06, 2008 at 21:39 UTC (#672605)


in reply to Parsing .2bit DNA files

    # Parse header
    my ($sig, $ver, $cnt, $reserved) = unpack('l4', $raw);
    # TODO: validate (signature, reverse byte order, version)
    return {SIG => $sig, VER => $ver, CNT => $cnt, RSV => $reserved};

Man, I can't believe you so lightly step over what seems to be the most fun part of the whole spec: the fact that these files can be written in whichever endianness you like:
All fields are 32 bits unless noted. If the signature value is not as given, the reader program should byte-swap the signature and check if the swapped version matches. If so, all multiple-byte entities in the file will have to be byte-swapped. This enables these binary files to be used unchanged on different architectures.
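I can hardly believe you use "l" to unpack 32-bit integers: that's a signed integer in whatever byte order the machine running your script happens to use. I agree with BrowserUK here: you should be using "N" (big-endian) or "V" (little-endian). Actually, try both, and return the one that works.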
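To make the difference concrete, here is a small sketch (not from the original thread) that unpacks the same four bytes with each template. The signature value is the one from the spec; the 0xFFFFFFFF value is just an arbitrary example with the high bit set.

```perl
use strict;
use warnings;

# The signature as a big-endian file would store it: bytes 1A 41 27 43.
my $bytes = pack 'N', 0x1A412743;
printf "N: 0x%08X\n", unpack('N', $bytes);   # N: 0x1A412743
printf "V: 0x%08X\n", unpack('V', $bytes);   # V: 0x4327411A (byte-swapped)

# "l" is signed, so any field with the high bit set comes out negative.
my $big = pack 'N', 0xFFFFFFFF;
print "N: ", unpack('N', $big), "\n";        # N: 4294967295
print "l: ", unpack('l', $big), "\n";        # l: -1
```

The byte-swapped value is exactly what the spec tells the reader to check for when the signature does not match at first.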

    for my $template ("N", "V") {
        # Parse header
        my ($sig, $ver, $cnt, $reserved) = unpack($template.'4', $raw);
        if($sig==0x1A412743) {
            return {unpack => $template, VER => $ver, CNT => $cnt};
        }
    }
    # no match: not a .2bit header
    return undef;

In the rest of your code, always use that $template (or $header->{unpack}) instead of that 'l'.
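As a rough, self-contained sketch of what that looks like in practice: the loop above is wrapped in a sub (the name read_2bit_header and the length check are my additions), and the detected template is then reused for later fields. The fake file contents, a header followed by two 32-bit values, are made up for the demo and are not the real layout of a .2bit index.

```perl
use strict;
use warnings;

# read_2bit_header($raw): hypothetical wrapper around the detection loop.
sub read_2bit_header {
    my ($raw) = @_;
    return undef if length($raw) < 16;    # header is four 32-bit fields
    for my $template ('N', 'V') {
        my ($sig, $ver, $cnt, $reserved) = unpack($template . '4', $raw);
        return { unpack => $template, VER => $ver, CNT => $cnt }
            if $sig == 0x1A412743;
    }
    return undef;                          # not a .2bit header
}

# Build demo data in each byte order and read it back.
for my $order ('N', 'V') {
    my $raw = pack($order . '4', 0x1A412743, 0, 2, 0)
            . pack($order . '2', 100, 200);
    my $header = read_2bit_header($raw) or die "not a .2bit header";
    # Skip the 16 header bytes, then reuse the detected template.
    my @values = unpack('x16' . $header->{unpack} . '2', $raw);
    print "$header->{unpack}: CNT=$header->{CNT} values=@values\n";
}
```

Both iterations should report the same CNT and values, which is the whole point: the rest of the parser never needs to know which byte order the file was written in.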

Now, that wasn't so hard, was it?

For the rest... see blokhead's node, and my reply to it with my remarks. I agree with his approach, as it's probably as fast as you can get in pure Perl. I'd stick to decoding whole bytes in one go. Be careful about memory, though: these strings can be very long. Your own approach is even worse in that respect: it uses strings double that size for decoding, and you copy them around some more, taking up even more temporary space. Ouch.

And using more memory usually means (much) slower, if you don't simply run out of memory or have to close some other programs to be able to run yours. None of those are good things.
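For illustration, decoding whole bytes in one go could look something like the sketch below. This is not blokhead's code, just one way to do it: a 256-entry lookup table, applied in chunks so the temporary list of byte values stays small. The helper name decode_2bit and the chunk size are assumptions of mine; the base encoding (T=00, C=01, A=10, G=11, first base in the high bits) is the one given in the UCSC format description.

```perl
use strict;
use warnings;

# Four bases per byte, first base in the two most significant bits.
my @BASE = qw(T C A G);

# Precompute the four-letter string for every possible byte value.
my @LUT = map {
    my $byte = $_;
    join '', map { $BASE[ ($byte >> $_) & 3 ] } (6, 4, 2, 0);
} 0 .. 255;

# decode_2bit($packed, $n_bases [, $chunk]): hypothetical helper.
# Decodes $chunk bytes at a time to limit temporary memory.
sub decode_2bit {
    my ($packed, $n_bases, $chunk) = @_;
    $chunk ||= 65536;
    my $dna = '';
    for (my $pos = 0; $pos < length($packed); $pos += $chunk) {
        $dna .= join '', @LUT[ unpack('C*', substr($packed, $pos, $chunk)) ];
    }
    # The last byte may carry padding; trim to the real base count.
    substr($dna, $n_bases) = '' if length($dna) > $n_bases;
    return $dna;
}

print decode_2bit(pack('C2', 0b00011011, 0b10000000), 5), "\n";   # TCAGA
```

The decoded string still takes one byte per base, so for whole chromosomes you may prefer to write each chunk out to a filehandle rather than accumulate it in $dna.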

