Re^4: Searching pattern in 400 files and getting count out of each file

by Athanasius (Chancellor)
on Nov 09, 2012 at 12:52 UTC

in reply to Re^3: Searching pattern in 400 files and getting count out of each file
in thread Searching pattern in 400 files and getting count out of each file

Within the outer loop “foreach $file (@files)”, each file is opened once for reading and closed after the inner loop has completed. (The extra block enclosing this inner loop is redundant, BTW.) But the inner loop reads from the filehandle $fh on every iteration. The first call to <$fh> in list context reads the entire file and leaves the filehandle positioned at end-of-file. On each subsequent iteration of the inner loop, <$fh> returns an empty list, so $count is zero from then on.
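The effect is easy to reproduce in isolation. The following is only a minimal sketch, not the original script: the sample text and the pattern are made up, and an in-memory filehandle stands in for a real file so that nothing on disk is needed.

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory filehandle.
my $text = "alpha\nbeta\nalpha beta\n";
open(my $fh, '<', \$text) or die "couldn't open in-memory file: $!";

# grep in scalar context returns the number of matching lines.
my $first  = grep { /alpha/ } <$fh>;    # reads every line; handle is now at EOF
my $second = grep { /alpha/ } <$fh>;    # nothing left to read

print "first pass: $first, second pass: $second\n";
close($fh) or die "couldn't close in-memory file: $!";
```

The second pass finds nothing because the handle was never rewound, which is the same situation the inner loop is in from its second iteration onward.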

There are two ways to fix this:

(1) Add the following line before the call to grep:

seek($fh, 0, 0);

This rewinds the filehandle to the beginning of the file on each iteration (the third argument, 0, means “measure the offset from the start of the file”). See perldoc -f seek.

(2) Read the entire file into memory before the inner loop (store it as an array of lines), and apply the grep to this in-memory array. This strategy may take up a lot of memory if the files are large, but it will save a lot of processing time. Reading from a file is an inherently time-consuming operation, which your script would otherwise repeat each time through the inner loop (or, at least, it would once the seek is in there!).
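Both fixes can be compared side by side. This is a sketch under assumptions: the file contents and search terms are invented, an in-memory filehandle replaces the real files, and \Q...\E is added so that each term is matched literally instead of as a regex.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);    # SEEK_SET is the named form of the 0 in seek($fh, 0, 0)

my $text     = "FOO uses BAR\nBAR only\nnothing here\n";    # hypothetical file contents
my @patterns = ('FOO', 'BAR');                               # hypothetical search terms

open(my $fh, '<', \$text) or die "couldn't open in-memory file: $!";

# Fix (1): rewind before every pass; the data is re-read once per pattern.
my %count_seek;
for my $pattern (@patterns) {
    seek($fh, 0, SEEK_SET) or die "couldn't rewind: $!";
    $count_seek{$pattern} = grep { /\Q$pattern\E/ } <$fh>;
}

# Fix (2): read the lines once, then search the array for every pattern.
seek($fh, 0, SEEK_SET) or die "couldn't rewind: $!";
my @lines = <$fh>;
close($fh) or die "couldn't close in-memory file: $!";

my %count_slurp;
for my $pattern (@patterns) {
    $count_slurp{$pattern} = grep { /\Q$pattern\E/ } @lines;
}

print "$_: seek=$count_seek{$_} slurp=$count_slurp{$_}\n" for @patterns;
```

Both approaches give the same counts. The difference is that fix (1) goes back to the file once per pattern, while fix (2) touches it once per file.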

Now some general advice: As a matter of good Perl style, you should declare a variable only at the latest possible place in the code. In the script as given, a number of variables are declared but not used at all, and others are declared way ahead of time. Perl is not C! Get in the habit of declaring variables at the point of first use, and your code will become clearer and easier to debug and maintain.

Update: Here is my (untested!) re-write of the script:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('DBI:Oracle:R12COE', 'apps', 'app5vis')
    or die "couldn't connect to database: " . DBI->errstr;

my $sth = $dbh->prepare("SELECT DISTINCT UPPER(OBJECT_NAME) FROM CG_COMPARATIVE_MATRIX_TAB WHERE OBJECT_NAME IS NOT NULL ORDER BY 1 ASC")
    or die "couldn't prepare statement: " . $dbh->errstr;

$dbh->{AutoCommit}    = 0;
$dbh->{RaiseError}    = 1;
$dbh->{ora_check_sql} = 0;
$dbh->{RowCacheSize}  = 16;

$sth->execute;

# Collect the object names to search for.
my @obj_name;
while (my @data = $sth->fetchrow_array()) {
    push @obj_name, $data[0];
}

my $dir = '/u05/oracle/R12COE/spotlighter/Search_Files/Forms';
opendir(my $dh, $dir) or die "couldn't open directory $dir: $!";
my @files = grep { -f "$dir/$_" } readdir $dh;
closedir($dh) or die "couldn't close directory $dir: $!";

# Prepare the INSERT once; placeholders avoid quoting problems in the values.
my $sth1 = $dbh->prepare("INSERT INTO CUSTOM_FILES_SUMMARY(FILE_NAME, FILE_TYPE, DEP_OBJECT_NAME, OCCURANCE) VALUES(?, ?, ?, ?)")
    or die "couldn't prepare insert statement: " . $dbh->errstr;

foreach my $file (@files) {
    my ($ext) = $file =~ /(\.[^.]+)$/;

    # readdir returns bare names, so prepend the directory when opening.
    open(my $fh, '<', "$dir/$file") or die "couldn't open $dir/$file: $!";
    my @lines = <$fh>;    # read the whole file once
    close($fh) or die "couldn't close $dir/$file: $!";

    for my $obj_name (@obj_name) {
        # \Q...\E matches the name literally (object names may contain $ or #).
        my $count = grep /\Q$obj_name\E/, @lines;
        $sth1->execute($file, $ext, $obj_name, $count);
    }
}

$dbh->commit;    # AutoCommit is off, so commit explicitly before disconnecting
$dbh->disconnect;

Hope that helps,

Athanasius <°(((><contra mundum
