Re: Optimising processing for large data files.

in reply to Optimising processing for large data files.

Good optimisation. In C you can of course avoid the copy and move overhead of the substr buffer you use and just flip pointers between a pair of buffers to get the sliding window.

Runtime on a 1GHz laptop was 10 minutes for a 3GB test file. So the benefit of doing it in C is real, but perhaps hardly worth the effort unless saving 20 minutes of runtime justifies the X minutes of extra coding time.

$ cat file.c
#include <stdio.h>

#define FILENAME "c:\\test.txt"
#define CHUNK 500

int main(void)
{
    FILE *f;
    char buf1[CHUNK],buf2[CHUNK],pair[3],*fbuf,*bbuf,*swap;
    int r, i;

    /* "rb": binary mode, so Windows does no CRLF translation on the raw bytes */
    f=fopen(FILENAME,"rb");
    if (!f)
        return 1;

    fbuf=buf1;
    bbuf=buf2;

    r=(int)fread( fbuf, sizeof(char), CHUNK, f );
    if ( r<CHUNK ) {
        /* file shorter than one chunk: nothing to pair up */
        fclose(f);
        return 1;
    }

    pair[2]=0;
    while ( (r=(int)fread( bbuf, sizeof(char), CHUNK, f )) ) {
        for( i=0;i<r;i++ ) {
            pair[0]=fbuf[i];
            pair[1]=bbuf[i];
            /* printf("%s\n",pair);*/
        }
        /* Swap the front and back buffer pointers. The net effect
         * is to slide the window right: the old back buffer becomes
         * the front, and the old front is refilled with fresh data.
         * Thus we simply pour data from disk to memory with
         * no wasted copying effort.
         */
        swap=fbuf;
        fbuf=bbuf;
        bbuf=swap;
    }

    fclose(f);
    return 0;
}
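
If you want to check the pointer flip on something smaller than a 3GB file, here is a rough sketch of the same loop wrapped in a function. The name count_equal_pairs and the chunk parameter are my own inventions for illustration, not part of the program above, and counting equal pairs is only a stand-in for whatever you really do with each pair.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not from the program above): counts positions n
 * where byte n equals byte n+chunk, reading the stream through two
 * buffers whose pointers are swapped after every read.
 * Returns -1 if the buffers cannot be allocated. */
long count_equal_pairs(FILE *f, size_t chunk)
{
    char *front = malloc(chunk), *back = malloc(chunk), *swap;
    size_t r, i;
    long hits = 0;

    if (!front || !back) {
        free(front);
        free(back);
        return -1;
    }

    /* Prime the front buffer; a short first read means there are no pairs. */
    if (fread(front, 1, chunk, f) == chunk) {
        while ((r = fread(back, 1, chunk, f)) > 0) {
            for (i = 0; i < r; i++)
                if (front[i] == back[i])
                    hits++;
            /* Flip the pointers instead of copying any data. */
            swap = front;
            front = back;
            back = swap;
        }
    }

    free(front);
    free(back);
    return hits;
}
```

Called on a stream holding "abcdabcdab" with a chunk of 4, it compares each byte with the one 4 further along, so every one of the 6 available pairs matches.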

cheers

tachyon
