Re: Multithreading a large file split to multiple files

by BrowserUk (Pope)
on May 14, 2018 at 22:07 UTC

in reply to Multithreading a large file split to multiple files

Can I make it run on multiple cores so it runs faster?

Short answer: no.

The logic of your code dictates that the records in the input file be read in strict first-to-last sequence. Thus, any overhead from switching threads or sharing state is simply added on top of the time required for processing.

Even the code towards the end of the loop depends on state changes made earlier in that loop.

And with 15GB of input, there isn't even any mileage in accumulating output in memory to avoid disk thrash.

It's doubtful that even MCE can help you with this.

Re^2: Multithreading a large file split to multiple files
by Marshall (Abbot) on May 15, 2018 at 08:53 UTC
    I agree that multiple cores will not help, because the sequential read of the input file is a bottleneck.

    I am not so sure about output buffering. I really don't know in this situation, but depending upon the file system and other factors, like the intelligence of the disk controller, increasing the write buffer size could make a difference.
    Just an idea to try. I would benchmark a 64K buffer against the standard size (which I guess is probably 4K) and see if there is any significant difference.
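
A minimal sketch of such a benchmark, under stated assumptions: the original poster's code and data are not shown in this thread, so the record contents, record count, file names, and the helpers `write_all` and `write_records` are all made up for illustration. One way to approximate a larger write buffer in Perl is to accumulate output in a scalar and hand it to `syswrite` once it reaches the chosen size; the comparison run just uses `print` with whatever buffering the local perl provides.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use File::Temp qw(tempdir);

# Hypothetical helper: push the whole of $data to $fh with syswrite,
# looping because syswrite may write fewer bytes than requested.
sub write_all {
    my ($fh, $data) = @_;
    my $off = 0;
    while ($off < length $data) {
        my $n = syswrite($fh, $data, length($data) - $off, $off);
        die "syswrite failed: $!" unless defined $n;
        $off += $n;
    }
}

# Hypothetical helper: write $count copies of $line to $path.
# $chunk == 0 : print each line and rely on perl's own buffering.
# $chunk  > 0 : collect lines in a scalar and syswrite it once it
#               holds at least $chunk bytes.
# Returns the size of the finished file in bytes.
sub write_records {
    my ($path, $line, $count, $chunk) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    binmode $fh;    # same bytes on disk in both modes
    my $buf = '';
    for (1 .. $count) {
        if ($chunk) {
            $buf .= $line;
            if (length($buf) >= $chunk) {
                write_all($fh, $buf);
                $buf = '';
            }
        }
        else {
            print {$fh} $line or die "Write to $path failed: $!";
        }
    }
    write_all($fh, $buf) if length $buf;    # leftover partial chunk
    close $fh or die "Cannot close $path: $!";
    return -s $path;
}

my $dir   = tempdir(CLEANUP => 1);
my $line  = ('x' x 99) . "\n";    # made-up 100-byte record
my $count = 100_000;              # about 10 MB per run; adjust to taste

for my $chunk (0, 64 * 1024) {
    my $start = time();
    my $bytes = write_records("$dir/out_$chunk.txt", $line, $count, $chunk);
    printf "%-8s %9d bytes in %.3f s\n",
        ($chunk ? "${chunk}B" : 'default'), $bytes, time() - $start;
}
```

Timings from a toy run like this depend heavily on the file system and the OS cache, so it is only a starting point; a meaningful comparison would use the real records and the real number of output files.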

      Thank you for the suggestion. I'll try that out.

