Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Yup, "gz.Z" is strange indeed, although I don't think the extra compress would gain anything :)

Compression is even more interesting on these huge machines we have nowadays than it was before, since someone found it's usually faster to compress memory to be "swapped" and keep it in RAM than to write it to disk. Or for doing anything else disk-based for that matter as CPU speed has grown much faster than disk speed. The BNC the OP is dealing with has 100 million word forms and would fit in memory on most machines but meanwhile Google has raised the bar to a trillion word forms. They don't distribute that as text but even their n-gram lists are 24 GB gzipped. If your HD sustains 100 MB/s that's 4 minutes just to read it into memory, or 8 if it's twice the size uncompressed. But on a single core I can zcat at 154 MB/s so it's just faster to keep the stuff gzipped and unzip on the fly. Unzipping to a tempfile and reading that back is much slower on all but the fastest SSDs.


In reply to Re^4: Handling very big gz.Z files by mbethke
in thread Handling very big gz.Z files by albascura

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others chilling in the Monastery: (12)
    As of 2014-08-27 12:21 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      The best computer themed movie is:











      Results (238 votes), past polls