PerlMonks  

Hi monks,

In relation to a question I posted a couple of days ago, there's a fairly small question which I thought was better asked as a new one. I execute an external program to generate input data for my program, and I've been wondering about scalability and whether I am reading the data into the program in the best way (in fact I know I am not, but I wonder what possibilities are out there).

I use nfdump to generate a dump of fragments of the network 'flows', which I read into an array with split when the program starts. A flow is a network connection from start to end, with source and destination IP, byte count and a few other bits of info associated. Currently I just use backticks to execute this command, and there's no memory problem with this at present, but what if I had much more data? My program grinds the data down to mere traces in comparison (i.e. a few megabytes).
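
A minimal sketch of the backtick-and-split approach described above. The real nfdump command line is not shown here, so `$cmd` is a hypothetical stand-in (a POSIX shell `printf` emitting three made-up "src dst bytes" records), and the names `$cmd`, `$output` and `@flows` are illustrative only:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real nfdump command line.
# It prints three made-up "src dst bytes" records via the shell's printf.
my $cmd = q{printf '10.0.0.1 10.0.0.9 1500\n10.0.0.2 10.0.0.9 40\n10.0.0.3 10.0.0.9 512\n'};

# Backticks return only after the child has exited, so the complete
# output is held in memory as one string at this point.
my $output = `$cmd`;

# One array element per line of output.
my @flows = split /\n/, $output;

printf "%d flows captured\n", scalar @flows;
```

With this pattern the peak memory use grows with the size of the child's output, which is why it stops being comfortable once the output gets large.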

What's the best way to read in a very large amount of data from an external program? Is there anything which would allow a text output of, say, 100GB to not choke a system (I'm thinking of piping the data in, system, etc.)? I think not, and AFAIK the only way to do this would be to modify the nfdump command to pause execution when blocked by my program (or to use files, but I'd prefer not to). I wonder if it's possible to block execution of the external program without modifying it.

In case I have not been clear: what I mean is a Perl program I write which calls an external program which generates (say) 100GB of data, which is then read line by line into my program. I believe it will choke as it fills the OS buffers with that 100GB before I start reading it in line by line (and processing it). Is there a way to make the external program pause?
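
A minimal sketch of the line-by-line reading described above, using Perl's piped `open` instead of backticks. The command in `@cmd` is a hypothetical stand-in for nfdump (a child perl printing 100,000 lines), and `$fh` and `$count` are illustrative names. On Unix-like systems a pipe's kernel buffer is bounded (commonly tens of kilobytes), and a writer normally blocks once it is full, so this may already throttle the child without modifying it. Treat that as an assumption to verify against the real command, which could do its own internal buffering:

```perl
use strict;
use warnings;

# Hypothetical stand-in for nfdump: a child perl that prints 100_000 lines.
my @cmd = ($^X, '-e', 'print "flow $_\n" for 1 .. 100_000');

# List-form piped open: the child's STDOUT is connected to $fh
# and no shell is involved.
open(my $fh, '-|', @cmd) or die "Cannot start @cmd: $!";

my $count = 0;
while (my $line = <$fh>) {
    chomp $line;
    $count++;    # real per-flow processing would go here
}

# close() on a pipe waits for the child and reports a non-zero exit.
close($fh) or die "Child failed, wait status $?";

print "$count lines read\n";
```

Only one line is held in `$line` at a time, so memory use on the Perl side should not depend on the total size of the output.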

In real life I would process smaller chunks of data at a time, but there could be a case where it's desirable to read in a huge amount of data. It could be read from disk by the program generating input for my program, and it could easily be written to disk before slurping it into my program, but is there a way to block execution of an external program? Since the system can pause a process I would imagine there is at least one way, but using signals to pause a process which is generating input seems potentially fraught with deadlock, complexity, etc. It's perhaps a slightly theoretical question, but one which I find quite interesting, so the musings of the wise ones would be highly appreciated :-)

In reply to Blocking execution of called programs to reduce buffer size by Peterpion
