...is it better to open the output file once at the beginning and keep it open ..., or to open and close it each time I have something to print into it?
Opening is a system call, and closing results in system calls; both take some amount of time greater than zero. How significant that time is can only be quantified by knowing how many times the open/close cycle happens in your script, and how much of your script's runtime is spent inside those calls.
You can measure exactly how much the open/close calls contribute to your runtime with Devel::NYTProf (e.g. run `perl -d:NYTProf yourscript.pl`, then `nytprofhtml` to generate a report). But if your script takes 15 minutes to run with your full data set, first create a sample data set that's about 10-20% of its size while remaining representative of your real data. Then profile, and see where your problems are.
| [reply] |
Generally speaking: when optimizing, measure, don't guess. Try both variants and check which actually runs faster, and by how much.
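One way to try both variants is the core Benchmark module. A sketch of what that comparison might look like — the file names, payload, and iteration count here are all invented for illustration:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );      # scratch directory for the demo
my $line = "a line of output\n";         # hypothetical payload

cmpthese( 5000, {
    # Variant 1: open, write, close on every call.
    reopen => sub {
        open my $fh, '>>', "$dir/reopen.log" or die "open: $!";
        print {$fh} $line;
        close $fh or die "close: $!";
    },
    # Variant 2: open once, reuse the handle (held in a closure).
    keep_open => do {
        open my $fh, '>>', "$dir/keep_open.log" or die "open: $!";
        sub { print {$fh} $line };
    },
} );
```

cmpthese prints a rate table comparing the two subs, so you get a concrete answer for your system rather than a guess.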
That said, opening and closing a file each time you need to write to it seems wasteful to me. Why are you doing that? If you need to ensure that your data hits the disk instead of getting buffered, explicitly flushing the file and/or enabling output autoflush may be a better option (but again: measure, don't guess!). perlfaq5 has some information on how to do this.
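For reference, autoflush and explicit flushing on a lexical filehandle look something like this (the filename and messages are just examples):

```perl
use strict;
use warnings;
use IO::Handle;   # supplies the autoflush()/flush() methods on filehandles

open my $fh, '>', 'progress.txt' or die "Can't open progress.txt: $!";

$fh->autoflush(1);               # flush after every print
print {$fh} "step 1 done\n";     # reaches the file immediately

$fh->autoflush(0);               # back to normal buffering
print {$fh} "step 2 done\n";
$fh->flush;                      # or flush explicitly only when it matters

close $fh or die "close failed: $!";
```

Either approach gets your data out of perl's buffers without paying for a fresh open/close on every write.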
If you're only looking to make your script run faster in general, then (again!) measure where it's slow, don't guess; use a profiler to find the hotspots, and work on optimizing those.
| [reply] |
We don't really know enough about your code to give a definitive answer. I think it's pretty obvious that opening and closing a file each time you write to it will take more CPU cycles than keeping it open the whole time. However, there are cases where you might have to close and re-open a file, for example if other processes are accessing the same file during the run of the script. Please see How do I post a question effectively?
It also helps to profile your code to check where the bottlenecks really are.
| [reply] |
There are a handful of reasons to close and reopen a filehandle repeatedly:
- you have another process reading the file while you're writing it
- you want to be extra sure that partial output reaches the disk safely in case the program doesn't complete
- you need to redirect the filehandle to or from several different sources or destinations over the course of the program
If none of these apply, you can save some cycles by leaving it open. Closing it may save a small amount of memory while it's not open. Whether either effect is worth worrying about, I can't say without measuring; I doubt it's a matter of any concern unless you're closing and reopening in a tight loop.
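For concreteness, the two shapes side by side — reopening usually means appending, so each cycle looks like the first sub below (filename and messages are invented):

```perl
use strict;
use warnings;

my $log = 'partial.log';   # hypothetical output file
unlink $log;               # start fresh for this demo

# Variant A: reopen per message. Each close pushes the data out of
# perl's buffers, so partial output survives an abrupt exit.
sub log_reopen {
    my ($msg) = @_;
    open my $fh, '>>', $log or die "Can't append to $log: $!";
    print {$fh} "$msg\n";
    close $fh or die "close failed: $!";
}

# Variant B: open once and keep the handle for the life of the program.
open my $out, '>>', $log or die "Can't append to $log: $!";
sub log_keep { print {$out} "$_[0]\n" }

log_reopen('safe after every call');
log_keep('cheaper per call, but may linger in the buffer');
close $out or die "close failed: $!";
```

Note that both handles are opened in append mode, so the two variants can even coexist on the same file.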
| [reply] |
Consider STDOUT/STDERR: they're almost always open for you, but you rarely open/close STDOUT/STDERR repeatedly from a program -- how does your usage compare?
Consider flock/File::Lockfile: could some other program read or update the file you're printing to? Maybe you want to open a tempfile() which you rename to the final filename when you're done with the file?
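The tempfile-then-rename idea, sketched with core File::Temp (the final filename is made up for the example):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $final = 'report.txt';   # hypothetical destination

# Write into a tempfile in the destination directory, then rename().
# On the same filesystem the rename is atomic, so other readers never
# see a half-written report -- only the old file or the complete new one.
my ($fh, $tmp) = tempfile( 'report-XXXXXX', DIR => '.', UNLINK => 0 );
print {$fh} "finished output\n";
close $fh or die "close failed: $!";
rename $tmp, $final or die "rename $tmp -> $final failed: $!";
```

The tempfile is created in the same directory as the destination on purpose: rename() is only atomic within one filesystem.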
| [reply] |
Yep, the common consensus would suggest that repeatedly opening/closing file handles is wasteful.
But yes, go ahead and time it.
| [reply] |