PerlMonks  

Pipes vs. temporary files

by Anonymous Monk
on Jul 25, 2007 at 17:40 UTC ( [id://628739]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Wise Monks,

I need to read in a large amount of data produced by another program. The data are output to STDOUT.

In my code, I can either read the data through a pipe from the process, or run the process with its output redirected to a temporary file and then open that file in Perl and proceed from there.
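For concreteness, the two approaches might look roughly like the sketch below. The producer here is a stand-in (perl itself printing five lines) and the sub names are made up for illustration; substitute the real command. It assumes a Unix-like system where `fork` and list-form piped `open` are available.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical producer: perl itself printing five lines, so the sketch
# is self-contained. Substitute the real command and its arguments.
my @producer = ($^X, '-e', 'print "line $_\n" for 1 .. 5');

# Approach 1: read the producer's STDOUT through a pipe, line by line.
sub count_lines_via_pipe {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot start @cmd: $!";
    my $count = 0;
    $count++ while <$fh>;
    close($fh) or die "Producer failed (wait status $?)";
    return $count;
}

# Approach 2: run the producer to completion with STDOUT redirected to a
# temporary file, then open that file and read it.
sub count_lines_via_tempfile {
    my @cmd = @_;
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: point STDOUT at the temp file, then become the producer.
        open(STDOUT, '>&', $tmp_fh) or POSIX::_exit(126);
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }
    waitpid($pid, 0);
    die "Producer failed (wait status $?)" if $? != 0;

    open(my $in, '<', $tmp_name) or die "Cannot open $tmp_name: $!";
    my $count = 0;
    $count++ while <$in>;
    close($in);
    return $count;
}

printf "pipe: %d lines, temp file: %d lines\n",
    count_lines_via_pipe(@producer),
    count_lines_via_tempfile(@producer);
```

Both subs do the same work on the data; the difference is only in when the consumer can start reading and where the bytes sit in the meantime.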

My questions are:

How fast is a pipe compared to a temporary file? (assume the file is on a local filesystem)

Does using a pipe use any large amount of temporary storage "behind the scenes?" (Memory or disk)

Does using a pipe affect the buffering that Perl does? (Again, this goes to the speed question.) Are pipes purely synchronous?

Any help or anecdotes would be greatly appreciated.

Replies are listed 'Best First'.
Re: Pipes vs. temporary files
by ikegami (Patriarch) on Jul 25, 2007 at 17:56 UTC

    Does using a pipe use any large amount of temporary storage "behind the scenes?"

    Pipes have a buffer, but the memory used by it shouldn't be a concern.

    Are pipes purely synchronous?

    When the pipe's buffer is full, the writer will block until the reader makes room in the buffer.

    Similarly, if the reader is faster than the writer, it will block when trying to read from an empty pipe.

    So access is normally asynchronous, but it can degenerate into being synchronous.
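To get a feel for how much a writer can put into a pipe before it would block, one can fill a pipe that nobody is reading. This is only a sketch: `pipe_capacity` is a made-up helper name, the writer is switched to non-blocking mode so that a full buffer shows up as `EAGAIN` instead of a hang, and the number reported depends on the operating system.

```perl
use strict;
use warnings;
use IO::Handle;

# Fill an unread pipe and return how many bytes it accepted before a
# blocking writer would have had to wait.
sub pipe_capacity {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    $writer->blocking(0);    # report a full buffer instead of blocking

    my $chunk = 'x' x 1024;
    my $total = 0;
    while (1) {
        my $n = syswrite($writer, $chunk);
        if (defined $n) {
            $total += $n;
            next;
        }
        # Buffer is full: this is where a blocking writer would stall.
        last if $!{EAGAIN} || $!{EWOULDBLOCK};
        die "syswrite failed: $!";
    }
    close($writer);
    close($reader);
    return $total;
}

printf "pipe accepted %d bytes before the writer would block\n",
    pipe_capacity();
```

Whatever figure comes out is the upper bound on data held "behind the scenes" by the pipe itself; anything beyond that stays with the producer until the reader catches up.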

    How fast is a pipe compared to a temporary file?

    Should be similar. I would guess that using a pipe would be faster since it's simpler and doesn't use the disk.

    Does using a pipe affect the buffering that Perl does?

    No. Perl does the same IO buffering for pipes as it would do for files.

    Your other program should also do the same IO buffering for pipes as it would do for files, no matter what language is used.

Re: Pipes vs. temporary files
by bluto (Curate) on Jul 25, 2007 at 20:33 UTC
    One issue with a pipe is that if the producer process dies, the consumer may wait forever on the pipe after processing part of the data. The benefit of a temporary file in this case is that you can ensure you have all of the data before processing it (i.e. by having the producer rename the file into the final name that the consumer expects). This may or may not matter to you.
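With a piped open, one way to notice that the producer did not finish cleanly is to check the result of `close` and the wait status in `$?`. The sketch below is an illustration, not a complete answer to the partial-data problem: `read_checked` and both producers are hypothetical, and it only helps when the producer actually exits with a non-zero status or is killed.

```perl
use strict;
use warnings;

# Read all of a producer's output and report whether it exited cleanly.
# Returns a hash ref: ok (boolean), exit_code, lines (array ref).
sub read_checked {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot start @cmd: $!";
    my @lines = <$fh>;    # returns at EOF, i.e. when the write end closes

    # On a piped handle, close() waits for the child and returns false if
    # the child exited non-zero or was killed; $? holds the wait status.
    my $ok = close($fh) ? 1 : 0;
    return { ok => $ok, exit_code => $? >> 8, lines => \@lines };
}

# Hypothetical producers: one finishes, one fails partway through.
my @good = ($^X, '-e', 'print "a\n"; print "b\n"');
my @bad  = ($^X, '-e', 'print "a\n"; exit 3');

for my $cmd (\@good, \@bad) {
    my $r = read_checked(@$cmd);
    printf "%s: %d line(s), exit code %d\n",
        ($r->{ok} ? 'complete' : 'INCOMPLETE'),
        scalar @{ $r->{lines} }, $r->{exit_code};
}
```

The catch is that the check only comes at the end. If the consumer acts on each line as it arrives, it has already processed part of the data by the time it learns the producer failed, which is where the rename-when-done temporary file has the advantage.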

    Another problem with pipes is that if you use them on a single processor machine you may see a lot of context switching as the OS switches between the producer and consumer (i.e. each time the pipe is filled -- usually after about 4KB).

Re: Pipes vs. temporary files
by Codon (Friar) on Jul 25, 2007 at 18:33 UTC
    This is somewhat off-the-cuff, but I would expect the pipe methodology to be faster in terms of wall clock time. Running the first script and redirecting to a temp file requires that process to run to completion, which I would imagine could take some time since you say it produces a large amount of data. Using a pipe allows you to process the data in your second script as it is produced by your first script.
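A small experiment can show the overlap described here. The sketch assumes a deliberately slow, hypothetical producer (one line, a one-second pause, another line) and a made-up helper, `arrival_times`, that records when each line reaches the consumer.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical slow producer: prints one line, pauses, prints another.
# $| = 1 makes it flush each line; without it, the producer's own output
# buffering would hold both lines back until it exits.
my @slow_producer = ($^X, '-e',
    '$| = 1; print "first\n"; sleep 1; print "second\n"');

# Return the seconds elapsed (since start) when each line arrived.
sub arrival_times {
    my @cmd = @_;
    my $start = time();
    open(my $fh, '-|', @cmd) or die "Cannot start @cmd: $!";
    my @times;
    while (my $line = <$fh>) {
        push @times, time() - $start;
    }
    close($fh) or die "Producer failed (wait status $?)";
    return @times;
}

my @t = arrival_times(@slow_producer);
printf "line %d arrived after %.2fs\n", $_ + 1, $t[$_] for 0 .. $#t;
```

The first line is available to the consumer well before the producer exits, which is the wall-clock advantage of the pipe. With the temporary-file approach, nothing could be read until the full run had finished.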

    Ivan Heffner
    Sr. Software Engineer
    WhitePages.com, Inc.
