|laziness, impatience, and hubris|
Re^4: Searching pattern in 400 files and getting count out of each fileby Athanasius (Canon)
|on Nov 09, 2012 at 12:52 UTC||Need Help??|
Within the outer loop “foreach $file (@files)”, each file is opened once for reading, and then closed after the inner loop has completed. (The extra block enclosing this inner loop is redundant, BTW.) But within the inner loop, the filehandle $fh is read-from each time through the loop. The result is that after the first call to <$fh> in list context, the entire file has been read and the filehandle now points to the end of the file. On each subsequent iteration of the inner loop, <$fh> returns an empty list, so $count will then always be zero.
There are two ways to fix this:
(1) Add the following line before the call to grep:
This will ensure that the filehandle begins again at the beginning of the file on each iteration. See seek.
(2) Read the entire file into memory before the inner loop (store it as an array of lines), and apply the grep to this in-memory array. This strategy may take up a lot of memory (i.e., if the files are large), but it will save a lot of processing time. Reading from a file is an inherently time-consuming operation, which your script is currently repeating each time through the inner loop (or, at least, it would be doing so if the seek were in there!).
Now some general advice: As a matter of good Perl style, you should declare a variable only at the latest possible place in the code. In the script as given, a number of variables are declared but not used at all, and others are declared way ahead of time. Perl is not C! Get in the habit of declaring variables at the point of first use, and your code will become clearer and easier to debug and maintain.
Update: Here is my (untested!) re-write of the script:
Hope that helps,
Athanasius <°(((>< contra mundum