    # Parse header
    my ($sig, $ver, $cnt, $reserved) = unpack('l4', $raw);
    # TODO: validate (signature, reverse byte order, version)
    return {SIG => $sig, VER => $ver, CNT => $cnt, RSV => $reserved};
Man, I can't believe you lightly stepped over what seems to be the most fun part of the whole spec: the fact that these files can be made in whichever endianness you like:
All fields are 32 bits unless noted. If the signature value is not as given, the reader program should byte-swap the signature and check if the swapped version matches. If so, all multiple-byte entities in the file will have to be byte-swapped. This enables these binary files to be used unchanged on different architectures.
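I can hardly believe you use "l" to unpack 32-bit integers: "l" is a signed 32-bit value in the machine's native byte order, so the result depends on where the script runs. I agree with BrowserUK here: you should be using "N" (big-endian) or "V" (little-endian). Actually, try both, and return the one that gives the right signature.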
    for my $template ("N", "V") {
        # Parse header
        my ($sig, $ver, $cnt, $reserved) = unpack($template.'4', $raw);
        if($sig==0x1A412743) {
            return {unpack => $template, VER => $ver, CNT => $cnt};
        }
    }
    # no match: not a .2bit header
    return undef;
In the rest of your code, always use that $template (or $header->{unpack}) instead of that 'l'.

Now, that wasn't so hard, was it?
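
To see the try-both idea work without a real file at hand, here is a minimal standalone sketch. The helper name detect_template and the two synthetic headers are made up for illustration; only the signature value and the "N"/"V" templates come from the discussion above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the unpack template ("N" or "V") that
# yields the .2bit signature, or nothing if neither does.
sub detect_template {
    my ($raw) = @_;
    return unless length($raw) >= 16;    # header is four 32-bit fields
    for my $template ('N', 'V') {
        my ($sig) = unpack($template, $raw);
        return $template if $sig == 0x1A412743;
    }
    return;
}

# Synthetic headers (not real files):
# signature, version 0, 3 sequences, reserved 0.
my $big    = pack('N4', 0x1A412743, 0, 3, 0);
my $little = pack('V4', 0x1A412743, 0, 3, 0);

for my $raw ($big, $little) {
    my $template = detect_template($raw);
    # Reuse the detected template for every later multi-byte field.
    my (undef, $ver, $cnt) = unpack($template . '4', $raw);
    print "$template: version=$ver, count=$cnt\n";
}
```

Both headers should decode to the same version and count, whichever byte order they were written in. That is the whole point of carrying the template around.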

For the rest... See blokhead's node, and my reply with my remarks. I agree with his approach, as it's probably as fast as you can get in pure Perl. I'd stick to decoding whole bytes in one go. Be careful about memory, though: these strings can be very long. Your own approach is even worse, as it uses strings double that size for decoding, and you're copying them around some more, taking up even more temporary space. Ouch.

And using more memory usually means (much) slower, if you're not simply running out of memory or having to close some other programs to be able to run yours. None of those are good things.
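
For what it's worth, here is a rough sketch of what "decoding whole bytes in one go" could look like while keeping temporary memory bounded. This is not blokhead's code, just one way to do it: a 256-entry lookup table, applied a chunk at a time. The names decode_dna, @LUT and the default chunk size are my own assumptions. The 2-bit codes (T=00, C=01, A=10, G=11, first base in the high bits) are as I read the .2bit format description.

```perl
use strict;
use warnings;

# 2-bit codes: T=00, C=01, A=10, G=11,
# with the first base in the most significant bits of each byte.
my @BASE = qw(T C A G);

# Precompute the four bases for every possible byte value.
my @LUT = map {
    my $byte = $_;
    join '', map { $BASE[ ($byte >> $_) & 3 ] } (6, 4, 2, 0);
} 0 .. 255;

# Hypothetical helper: decode $n_bases bases from the packed string
# behind $data_ref, working on $chunk_bytes bytes at a time so the
# temporary lists stay small. Taking a reference avoids copying the input.
sub decode_dna {
    my ($data_ref, $n_bases, $chunk_bytes) = @_;
    $chunk_bytes ||= 65536;
    my $dna = '';
    for (my $pos = 0; $pos < length($$data_ref); $pos += $chunk_bytes) {
        my $chunk = substr($$data_ref, $pos, $chunk_bytes);
        $dna .= join '', @LUT[ unpack('C*', $chunk) ];
    }
    # The last byte may carry padding beyond the real sequence length.
    substr($dna, $n_bases) = '' if length($dna) > $n_bases;
    return $dna;
}

# 0x1B = 00 01 10 11 (TCAG), 0x80 = 10 00 00 00 (ATTT); keep 5 bases.
print decode_dna(\"\x1B\x80", 5), "\n";    # prints TCAGA
```

The output string is still four times the size of the packed data, so for really long sequences you'd want to read and decode the file itself in chunks instead of slurping it first.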

In reply to Re: Parsing .2bit DNA files by bart
in thread Parsing .2bit DNA files by Limbic~Region
