PerlMonks  

Is there a definitive module for efficiently searching a collection of text files?

by nysus (Parson)
on Mar 18, 2025 at 10:24 UTC ( [id://11164313]=perlquestion )

nysus has asked for the wisdom of the Perl Monks concerning the following question:

I’m writing a utility to go through my personal Perl lib to ensure consistency across files. For example, I will use it to check that each file has a comment containing the file’s path. So the utility will be going through and repeatedly searching a lot of text files.
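
For illustration, the path-comment check described above could be sketched with core modules only. This is a rough sketch under assumptions that are not from the original post: the function name `files_missing_path_comment` is hypothetical, library files are assumed to end in `.pm` or `.pl`, the expected comment is assumed to be any `#` line containing the file's path relative to the lib root, and paths are assumed to be Unix-style.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec ();

# Hypothetical helper: return the relative paths of files under $root
# that have no '#' comment line mentioning their own relative path.
sub files_missing_path_comment {
    my ($root) = @_;
    my @missing;
    File::Find::find(
        {
            no_chdir => 1,    # keep $File::Find::name usable as a real path
            wanted   => sub {
                my $path = $File::Find::name;
                # Assumption: only .pm and .pl files are of interest.
                return unless -f $path && $path =~ /\.(?:pm|pl)\z/;
                my $rel = File::Spec->abs2rel($path, $root);
                open my $fh, '<', $path or die "Cannot open $path: $!";
                my $found = 0;
                while (my $line = <$fh>) {
                    # A comment line that contains the relative path literally.
                    if ($line =~ /^\s*#.*\Q$rel\E/) { $found = 1; last }
                }
                close $fh;
                push @missing, $rel unless $found;
            },
        },
        $root,
    );
    my @sorted = sort @missing;
    return @sorted;
}

# Usage: perl check_paths.pl /path/to/lib
if (@ARGV) {
    print "$_\n" for files_missing_path_comment($ARGV[0]);
}
```

Each file is read line by line and reading stops at the first match, so nothing has to be held in memory between checks.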

There will be hundreds of documents searched repeatedly for certain strings of text. Rather than reinvent the wheel, I hunted around for a module that would help ensure the search is done efficiently, both in terms of time and resources used, so I don’t have to worry about the many small details. For example, the easiest thing for me to do would be to load all files into memory and simply search, write them back out to disk, load them all back into memory, search, save, repeat. Though this is easy, it’s obviously not efficient, and I was hoping to find a module that would smartly determine how to handle this problem.

But I have searched around a bit and have come up empty. Does anyone have any recommendations?

$PM = "Perl Monk's";
$MC = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar Parson";
$nysus = $PM . ' ' . $MC;

Replies are listed 'Best First'.
Re: Is there a definitive module for efficiently searching a collection of text files?
by Discipulus (Canon) on Mar 18, 2025 at 10:36 UTC
    Hello nysus,

    > ..efficiently ..

    Did you try MCE::Grep? See also the other examples on GitHub.

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Is there a definitive module for efficiently searching a collection of text files?
by sn1987a (Curate) on Mar 18, 2025 at 12:20 UTC
    You may want to check out the ack utility and the module behind it.

      Ah, this looks very promising. Thanks!


Re: Is there a definitive module for efficiently searching a collection of text files?
by Anonymous Monk on Mar 18, 2025 at 10:32 UTC
    Do you have any requirement that grep doesn't satisfy?

      Two things: getting something off the shelf that just works, and avoiding agonizing over the minutiae of grep commands. I'd also like it to take care of details like file locking, fast searching, and making multiple changes to files in memory to make things more efficient.

      I've started to patch a solution together myself with Path::Iterator::Rule for selecting files of interest, and then creating a custom file object that will handle in-memory search-and-replace and file locking. It's probably overkill for what I need, but I'd rather make sure I try to do it right. I won't be able to beat grep when it comes to raw searching speed, but I'm not too concerned about that. I'm only dealing with a few hundred files at most.
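
As a rough sketch of the "custom file object" idea above (not the poster's actual code), a single function can take an exclusive lock, slurp the file, apply any number of in-memory edits, and write back only if something changed. The name `edit_file_locked` and the callback interface are hypothetical. `flock` is advisory, so this only protects against other programs that also use `flock` on the same file. File selection (for example with Path::Iterator::Rule) is left out so the example relies on core modules only.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Hypothetical helper: run $edit on the file's content while holding an
# exclusive advisory lock. $edit receives a reference to the content and
# modifies it in place. Returns 1 if the file was rewritten, 0 otherwise.
sub edit_file_locked {
    my ($path, $edit) = @_;
    open my $fh, '+<', $path or die "Cannot open $path: $!";
    flock($fh, LOCK_EX) or die "Cannot lock $path: $!";

    # Slurp the whole file; an empty file yields '' or undef.
    my $content = do { local $/; <$fh> };
    $content = '' unless defined $content;
    my $original = $content;

    # All searches and replacements happen in memory here.
    $edit->(\$content);

    my $changed = $content ne $original ? 1 : 0;
    if ($changed) {
        seek($fh, 0, SEEK_SET) or die "Cannot seek in $path: $!";
        truncate($fh, 0)       or die "Cannot truncate $path: $!";
        print {$fh} $content   or die "Cannot write $path: $!";
    }
    close $fh or die "Cannot close $path: $!";    # closing releases the lock
    return $changed;
}
```

A call might look like `edit_file_locked($file, sub { my $ref = shift; $$ref =~ s/old_name/new_name/g });`. Several substitutions can go into one callback, so each file is read and written at most once per pass.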


        You said: "I'm only dealing with a few hundred files at most."

        I would forget about optimizing. Do whatever is easiest for you to implement. I think you are looking at well under a minute, maybe even just some tens of seconds. Also, since these are only relatively small text files (software source), a file will probably still be in the disk cache if you need to read it again.

Re: Is there a definitive module for efficiently searching a collection of text files?
by Fletch (Bishop) on Mar 18, 2025 at 14:59 UTC

    Not Perl but ripgrep is really fast.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.

Approved by Discipulus
Front-paged by Corion