Blocking execution of called programs to reduce buffer size
by Peterpion (Acolyte) on Feb 23, 2014 at 11:52 UTC
Peterpion has asked for the wisdom of the Perl Monks concerning the following question:
Following on from a question I posted a couple of days ago, there's a smaller question which I thought was better asked separately. I execute an external program to generate input data for my program, and I've been wondering about scalability and whether I am reading the data in the best way (in fact I know I am not, but I wonder what possibilities are out there).
I use nfdump to generate a dump of fragments of the network 'flows', which I read into an array with split when the program starts. A flow is a network connection from start to end, with source and destination IP, byte count and a few other bits of info associated. Currently I just use backticks to execute this command, and there's no memory problem at present, but what if I had much more data? My program grinds the data down to mere traces in comparison (i.e. a few MB).
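For concreteness, the current approach looks roughly like the sketch below. The real nfdump command line is not shown here, so a Perl one-liner stands in as a hypothetical producer, and the "src dst bytes" field layout is assumed purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real nfdump command line (not shown here):
# a Perl one-liner printing five fake flows as "src dst bytes".
my $cmd = qq{$^X -le 'print join q( ), "10.0.0.\$_", "192.168.0.1", \$_ * 100 for 1..5'};

# Backticks wait for the command to finish and return ALL of its output,
# so the whole dump sits in memory before any processing starts.
my $output = `$cmd`;
my @flows  = split /\n/, $output;

# Grind the data down to a small summary.
my $total_bytes = 0;
for my $flow (@flows) {
    my ($src, $dst, $bytes) = split ' ', $flow;
    $total_bytes += $bytes;
}
print scalar(@flows), " flows, $total_bytes bytes\n";
```

With this shape, memory use grows with the size of the dump rather than with the size of the summary, which is the scalability worry.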
What's the best way to read in a very large amount of data from an external program? Is there anything which would allow a text output of, say, 100GB to not choke a system (I'm thinking of piping the data in, system, etc.)? I think not, and AFAIK the only way to do this would be to modify the nfdump command to pause execution when blocked by my program (or to use files, but I'd prefer not to). I wonder if it's possible to block execution of the external program without modifying it.
In case I have not been clear: I mean a Perl program I write which calls an external program that generates (say) 100GB of data, which my program then reads line by line. I believe it will choke as the OS buffers fill with that 100GB before I start reading it in line by line (and processing it). Is there a way to make the external program pause?
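The line-by-line reading I have in mind would be a piped open, roughly as sketched below. As far as I understand it, a pipe on a POSIX system has a bounded kernel buffer and a writer blocks in write() once that buffer is full, which would mean the producer gets paused by the OS whenever my reader falls behind. The producer here is again a hypothetical stand-in one-liner, not the real nfdump invocation, and the field layout is assumed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for nfdump: emits 100_000 fake flow lines
# ("src dst bytes"). The list form of open runs it without a shell.
my @cmd = ($^X, '-e',
    'print "10.0.0.1 192.168.0.1 $_\n" for 1..100_000');

open(my $fh, '-|', @cmd) or die "Cannot start producer: $!";

# Only one line is held in memory at a time; the producer can run ahead
# of the reader only by as much as fits in the pipe buffer.
my ($count, $total_bytes) = (0, 0);
while (my $line = <$fh>) {
    chomp $line;
    my ($src, $dst, $bytes) = split ' ', $line;
    $count++;
    $total_bytes += $bytes;
}

# close on a piped handle waits for the child and reports its exit status.
close($fh) or die "Producer failed: status $?";

print "$count flows, $total_bytes bytes\n";
```

If that understanding of pipes is right, no change to the external program would be needed, but I would welcome confirmation.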
In real life I would process smaller chunks of data at a time, but there could be a case where it's desirable to read in a huge amount of data. It could be read from disk by the program generating input for my program, and it could easily be written to disk before being slurped into my program, but is there a way to block execution of an external program? Since the system can pause a process I would imagine there is at least one way, but using signals to pause a process which is generating input seems potentially fraught with deadlock complexity etc. It's perhaps a slightly theoretical question, but one which I find quite interesting, so the musings of the wise ones would be highly appreciated :-)
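The signal-based idea would look something like the sketch below on a Unix-like system, using SIGSTOP and SIGCONT via Perl's kill. The producer is a hypothetical stand-in one-liner, and the sketch does nothing to address the deadlock worries mentioned above; it only shows the mechanics.

```perl
use strict;
use warnings;

# Hypothetical stand-in producer. List-form open runs it without a shell,
# so the PID returned by open is the producer process itself.
my $pid = open(my $fh, '-|', $^X, '-e', 'print "flow $_\n" for 1..3')
    or die "Cannot start producer: $!";

# Pause the producer; kill returns the number of processes signalled.
my $stopped = kill('STOP', $pid);

# ... the consumer could catch up on other work here ...

# Resume the producer, then read whatever it writes.
my $resumed = kill('CONT', $pid);

my @lines = <$fh>;
close($fh);

print "stopped=$stopped resumed=$resumed lines=", scalar(@lines), "\n";
```

The hard part, deciding when to stop and resume without stalling both sides, is not shown here.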