The lines in question can be up to 2 700 000 000 000 000 characters.
Given quantities of that magnitude, and the relative simplicity of the task (breaking the stream into a sequence of numerics), I'd say it's worthwhile to write an application in C and compile it.
It would be a short and easy program to write, especially as a stdin-stdout filter: it's just a while loop that reads a good-sized char buffer (say, a few MB at a time) and steps through the buffer one character at a time, accumulating consecutive digit characters and outputting the string of digits every time it encounters a non-digit character. It wouldn't be more than 20 lines of C code, if that, and you'll save a lot of run-time.
I suppose there must be more to your overall process than just splitting into digit strings; you could still do that extra part of your process in perl, but have the perl script read from the output of the C program. (But again, given the quantity of data, if the other stuff can be done in C without too much trouble, I'd do that.)
UPDATE: Okay, I admit I was wrong about how many lines of C it would take. This C program is 30 lines (not counting the 4 blank lines added for legibility):
(2nd update: added four more lines at the end to handle the case where the last char in the stream happens to be a digit.)
That's actually a very good idea; I hadn't thought of this approach.