PerlMonks  

efficient way of searching through a large number of text files in a given directory

by Angharad (Pilgrim)
on Dec 01, 2010 at 16:10 UTC ( #874685=perlquestion )
Angharad has asked for the wisdom of the Perl Monks concerning the following question:

I am about to write a script that takes a particular 'identifier' (just a piece of text, really) from the command line. I then want to go through a number of files within a particular directory, opening them one at a time.
Each of these files would then be searched to find the one that contains this 'identifier', and the name of that file printed to the screen.
I'm aware, however, that opening a large number of files one at a time and then searching through them a line at a time just to hunt out a piece of text might be a slow and memory-hungry method.
Can anyone think of a sensible approach to writing this script? Any suggestions much appreciated
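For illustration, here is a minimal sketch of the straightforward approach described above (the name files_containing is made up for this example). Reading line by line keeps memory use constant regardless of file size, so the "memory hungry" worry mostly goes away; only speed remains a concern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the names of files directly under $dir whose contents contain
# the literal string $identifier. Reading line by line keeps memory use
# constant, and 'last' stops scanning a file at its first hit.
sub files_containing {
    my ($dir, $identifier) = @_;
    my @hits;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    for my $file (sort readdir $dh) {
        my $path = "$dir/$file";
        next unless -f $path;                     # skip subdirectories, '.' and '..'
        open my $fh, '<', $path or do { warn "Cannot read $path: $!"; next };
        while (my $line = <$fh>) {
            if (index($line, $identifier) >= 0) { # literal match; use a regex for patterns
                push @hits, $path;
                last;
            }
        }
        close $fh;
    }
    closedir $dh;
    return @hits;
}

# Usage: perl find_id.pl <directory> <identifier>
if (@ARGV == 2) {
    print "$_\n" for files_containing(@ARGV);
}
```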

Re: efficient way of searching through a large number of text files in a given directory
by moritz (Cardinal) on Dec 01, 2010 at 16:13 UTC
      Agreed. Note that egrep is faster than grep.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: efficient way of searching through a large number of text files in a given directory
by Anonymous Monk on Dec 01, 2010 at 16:32 UTC

    You can also shell out to grep, e.g. via the qx// feature, and read its output to identify the files you might need to explore further.

    You can do a lot of useful things in a Unix-shell environment by “piping” together these very useful commands:

    • grep
    • xargs
    • awk
    • perl (of course)

    I put perl in that list, not for humor’s sake, but to point out that Perl doesn’t have to be front-and-center in whatever approach you take.   You can get a lot of very useful work done very fast by “stitching together” existing tools, sometimes eliminating the need to write a single program to do it.
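As a sketch of this "stitching together" idea from inside Perl (the name grep_files is invented for this example), grep's -l option prints only the names of matching files, which is exactly what the original poster wants:

```perl
use strict;
use warnings;

# Hypothetical helper: ask the system grep which files under $dir
# contain the literal string $identifier.
sub grep_files {
    my ($dir, $identifier) = @_;
    # -l prints only the names of matching files; -F treats the
    # identifier as a fixed string rather than a regex.
    # \Q...\E backslash-escapes shell metacharacters in the
    # interpolated values before the shell sees them.
    my @hits = qx(grep -lF -- \Q$identifier\E \Q$dir\E/* 2>/dev/null);
    chomp @hits;
    return @hits;
}
```

For untrusted input, the list form of open ('-|', 'grep', ...) avoids the shell entirely and is safer than qx//.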

Re: efficient way of searching through a large number of text files in a given directory
by Anonymous Monk on Dec 01, 2010 at 16:39 UTC
Re: efficient way of searching through a large number of text files in a given directory
by eff_i_g (Curate) on Dec 01, 2010 at 17:28 UTC
    Use fgrep (fixed-string grep) if you only need to search for a literal string (not a pattern).
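The same fixed-string-versus-pattern distinction applies inside Perl itself: index() does a literal substring search and never touches the regex engine, so characters like '.' or '(' in the identifier need no escaping. A small illustration (line_matches is a made-up name):

```perl
use strict;
use warnings;

# Decide whether a line matches, either literally (fgrep territory)
# or as a regex pattern (grep/egrep territory).
sub line_matches {
    my ($line, $identifier, $is_pattern) = @_;
    return $is_pattern
        ? $line =~ /$identifier/            # regex match
        : index($line, $identifier) >= 0;   # literal substring match
}
```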
Re: efficient way of searching though large number of text file in a given directory
by pemungkah (Priest) on Dec 02, 2010 at 01:02 UTC
    And adding a combination of caching and Linux::Inotify2 would let you get a nice additional speed gain (plus let you know when to invalidate the cache), assuming repeated searches for the same string are going to be happening. You might even be able to quickly re-prime the cache if the underlying files change infrequently by hanging onto the N most-commonly-repeated searches, redoing these, and then reloading them into the cache in the background.
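    A rough sketch of the caching side of this idea (Linux::Inotify2 itself is Linux-only, so this stand-in uses the newest file mtime in the directory as the invalidation signal; the names cached_search and dir_stamp are made up for the example):

```perl
use strict;
use warnings;

# Results are keyed by the search string. A directory "fingerprint"
# (here, the newest mtime) stands in for Linux::Inotify2 events to
# decide when a cache entry has gone stale.
my %cache;   # identifier => { stamp => ..., hits => [...] }

sub dir_stamp {
    my ($dir) = @_;
    my $newest = 0;
    for my $path (glob "$dir/*") {
        my $mtime = (stat $path)[9] // 0;
        $newest = $mtime if $mtime > $newest;
    }
    return $newest;
}

sub cached_search {
    my ($dir, $identifier, $search) = @_;   # $search: coderef doing the real scan
    my $stamp = dir_stamp($dir);
    my $entry = $cache{$identifier};
    if (!$entry || $entry->{stamp} != $stamp) {   # cache miss, or files changed
        $entry = $cache{$identifier} = {
            stamp => $stamp,
            hits  => [ $search->($dir, $identifier) ],
        };
    }
    return @{ $entry->{hits} };
}
```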

Node Type: perlquestion [id://874685]
Approved by herveus