PerlMonks  


Good optimisation. In C you can of course avoid the copy-and-move overhead of the substr buffer you use, and just flip pointers between a pair of buffers to get the sliding window.

Runtime on a 1GHz laptop was 10 minutes for a 3GB test file. So the benefits of doing it in C are real, but perhaps hardly worth the effort unless saving 20 minutes of runtime justifies the X minutes of extra coding time.

$ cat file.c
#include <stdio.h>

#define FILENAME "c:\\test.txt"
#define CHUNK 500

int main() {
    FILE *f;
    char buf1[CHUNK], buf2[CHUNK], pair[3], *fbuf, *bbuf, *swap;
    int r, i;

    f = fopen(FILENAME, "r");
    if (!f) return 1;
    fbuf = buf1;
    bbuf = buf2;
    /* Prime the front buffer; give up if the file is shorter than one chunk. */
    r = (int)fread( fbuf, sizeof(char), CHUNK, f );
    if ( !r || r<CHUNK ) { fclose(f); return 1; }
    pair[2] = 0;
    while ( (r = (int)fread( bbuf, sizeof(char), CHUNK, f )) ) {
        for ( i=0; i<r; i++ ) {
            pair[0] = fbuf[i];
            pair[1] = bbuf[i];
            /* printf("%s\n",pair); */
        }
        /* Move old back buffer pointer to front buffer ptr
         * And vice versa. Net effect is to slide buffer->R
         * As we will refill the back buffer with fresh data.
         * Thus we simply pour data from disk to memory with
         * no wasted copying effort.
         */
        swap = fbuf; fbuf = bbuf; bbuf = swap;
    }
    fclose(f);
    return 0;
}
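If you want to sanity-check the pointer flip on a small input, the same loop can be wrapped in a function that takes an already-open stream. What follows is only a rough sketch, not part of the program above: the name `count_matches_at_distance` and the idea of counting equal bytes (instead of printing each pair) are made up for illustration, and the chunk size is passed in as a parameter instead of using the fixed CHUNK.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (an assumption for illustration, not from the post):
 * counts positions i where byte i of the stream equals byte i+chunk.
 * Two heap buffers are used and only their pointers are swapped after
 * each read, so no data is copied between them.
 * Returns -1 if the buffers cannot be allocated. */
long count_matches_at_distance(FILE *f, size_t chunk)
{
    assert(f != NULL && chunk > 0);

    char *front = malloc(chunk);
    char *back = malloc(chunk);
    long matches = 0;
    size_t got;

    if (!front || !back) {
        free(front);
        free(back);
        return -1;
    }

    /* Prime the front buffer; a stream shorter than one chunk has no pairs. */
    if (fread(front, 1, chunk, f) == chunk) {
        while ((got = fread(back, 1, chunk, f)) > 0) {
            for (size_t i = 0; i < got; i++) {
                if (front[i] == back[i])
                    matches++;
            }
            /* Flip the roles: the freshly read data becomes the front
             * buffer and the old front buffer is refilled next time. */
            char *tmp = front;
            front = back;
            back = tmp;
        }
    }

    /* The two pointers still cover both allocations, in whichever order. */
    free(front);
    free(back);
    return matches;
}
```

On a stream holding the nine bytes `abcabcabx`, a chunk of 3 gives 5 matches and a chunk of 4 gives 0. A final short read is handled the same way as in the program above: only the bytes actually read are compared.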

cheers

tachyon


In reply to Re: Optimising processing for large data files. by tachyon
in thread Optimising processing for large data files. by BrowserUk
