PerlMonks  

    # Parse header
    my ($sig, $ver, $cnt, $reserved) = unpack('l4', $raw);
    # TODO: validate (signature, reverse byte order, version)
    return {SIG => $sig, VER => $ver, CNT => $cnt, RSV => $reserved};
Man, I can't believe you lightly step over what seems to be the most fun part of the whole spec: the fact that these files can be written in whichever byte order (endianness) you like:
All fields are 32 bits unless noted. If the signature value is not as given, the reader program should byte-swap the signature and check if the swapped version matches. If so, all multiple-byte entities in the file will have to be byte-swapped. This enables these binary files to be used unchanged on different architectures.
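In Perl terms, the spec's byte swap is the same as reading the four bytes with the opposite fixed-order template: "N" is big-endian, "V" is little-endian. A small check, using only the signature value 0x1A412743 from the spec (variable names are illustrative):

```perl
use strict;
use warnings;

# The .2bit signature as given in the spec.
my $SIG = 0x1A412743;

# Four bytes as a little-endian machine would write the signature.
my $le_bytes = pack('V', $SIG);

# Reading them little-endian ('V') recovers the signature...
my $as_le = unpack('V', $le_bytes);

# ...while reading them big-endian ('N') yields the byte-swapped value.
my $as_be = unpack('N', $le_bytes);

# Byte-swapping by hand: reverse the four bytes and reread them.
my $swapped = unpack('N', scalar reverse pack('N', $as_be));

printf "as V: 0x%08X\n", $as_le;    # 0x1A412743
printf "as N: 0x%08X\n", $as_be;    # 0x4327411A
printf "swap: 0x%08X\n", $swapped;  # 0x1A412743
```

So trying "N" and then "V" and keeping whichever one yields the signature does exactly what the spec's swap-and-compare asks for.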
I can hardly believe you use "l" to unpack 32-bit integers: "l" is a signed 32-bit value in your machine's native byte order. I agree with BrowserUK here: you should be using "N" (big-endian) or "V" (little-endian). Actually, try both, and return the one that works.
    for my $template ("N", "V") {
        # Parse header
        my ($sig, $ver, $cnt, $reserved) = unpack($template.'4', $raw);
        if ($sig == 0x1A412743) {
            return {unpack => $template, VER => $ver, CNT => $cnt};
        }
    }
    # no match: not a .2bit header
    return undef;
In the rest of your code, always use that $template (or $header->{unpack}) instead of that 'l'.
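To try the header loop in isolation, here is a minimal self-contained sketch. The sub names parse_header and read_u32s are hypothetical (not from the thread), and the test headers are built with pack rather than read from a real .2bit file:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the header loop. $raw must hold the 16-byte
# header: signature, version, sequence count, reserved.
sub parse_header {
    my ($raw) = @_;
    return if length($raw) < 16;     # too short to be a header
    for my $template ('N', 'V') {    # big-endian first, then little-endian
        my ($sig, $ver, $cnt, $reserved) = unpack($template . '4', $raw);
        if ($sig == 0x1A412743) {
            return { unpack => $template, VER => $ver, CNT => $cnt };
        }
    }
    return;    # no match: not a .2bit header (undef in scalar context)
}

# Hypothetical helper: read $n 32-bit fields in the detected byte order,
# i.e. always via $header->{unpack}, never via 'l'.
sub read_u32s {
    my ($header, $bytes, $n) = @_;
    return unpack($header->{unpack} . $n, $bytes);
}

# Headers written in both byte orders: version 0, 3 sequences, reserved 0.
for my $raw (pack('N4', 0x1A412743, 0, 3, 0), pack('V4', 0x1A412743, 0, 3, 0)) {
    my $h = parse_header($raw) or die "not a .2bit header\n";
    print "$h->{unpack}: CNT=$h->{CNT}\n";    # N: CNT=3, then V: CNT=3
}
```

Keeping the template in the returned hash means every later 32-bit read goes through one place, so a file written on the "other" architecture needs no special handling anywhere else.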

Now, that wasn't so hard, was it?

For the rest, see blokhead's node and my reply to it with my remarks. I agree with his approach, as it's probably as fast as you can get in pure Perl: I'd stick to decoding whole bytes (four bases each) in one go. Be careful about memory, though: these strings can be very long. Your own approach is even worse, as it decodes via intermediate strings twice that size and then copies them around some more, taking up even more temporary space. Ouch.
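A minimal sketch of byte-at-a-time decoding with a 256-entry lookup table, assuming the spec's base encoding (T=00, C=01, A=10, G=11, first base in the most significant two bits). It is not blokhead's code, decode_2bit is a hypothetical name, and it only handles the packed bases: N-blocks and mask blocks are ignored. The loop walks the packed string with substr, so apart from the output it builds no intermediate bit string or byte list:

```perl
use strict;
use warnings;

# Base codes per the .2bit spec: T=00, C=01, A=10, G=11.
my @BASES = qw(T C A G);

# Precompute the 4-base string for each of the 256 byte values. The first
# base sits in the two most significant bits, so shift right by 6, 4, 2, 0.
my @BYTE2BASES = map {
    my $byte = $_;
    join '', map { $BASES[ ($byte >> (6 - 2 * $_)) & 3 ] } 0 .. 3;
} 0 .. 255;

# Hypothetical helper: decode $dna_size bases from the packed string.
sub decode_2bit {
    my ($packed, $dna_size) = @_;
    my $dna = '';
    # One table lookup per byte; a foreach over a range builds no list.
    for my $i (0 .. length($packed) - 1) {
        $dna .= $BYTE2BASES[ ord substr($packed, $i, 1) ];
    }
    # The last byte may hold padding bases; cut them off in place.
    substr($dna, $dna_size) = '' if length($dna) > $dna_size;
    return $dna;
}

print decode_2bit(pack('C', 0x1B), 4), "\n";    # prints TCAG
```

The byte 0x1B is 00 01 10 11 in binary, which is TCAG in the spec's encoding. The table costs 256 short strings once; after that each byte of input is a single array lookup.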

And using more memory usually makes your program (much) slower, if it doesn't simply run out of memory or force you to close other programs before you can run yours. None of those are good things.


In reply to Re: Parsing .2bit DNA files by bart
in thread Parsing .2bit DNA files by Limbic~Region
