
pattern match -vs- *ix grep

by ministry (Scribe)
on Apr 07, 2005 at 13:21 UTC ( #445645=perlquestion )

ministry has asked for the wisdom of the Perl Monks concerning the following question:

Monks, I would think that a small snippet of code like the one below could easily match the speed of grep for a simple pattern match (without all of grep's extra little features, of course). Can anyone tell me of a better way to do this?

(At the very least, to prove to the guys at the office how much cooler Perl is than the regular shell commands.)

#!/usr/bin/perl
#
$start=shift;
$pattern=shift;
open(FILE,"$start");
while (<FILE>){
    if (/$pattern/) {
        print;
    }
}

Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.

Replies are listed 'Best First'.
Re: pattern match -vs- *ix grep
by Limbic~Region (Chancellor) on Apr 07, 2005 at 13:40 UTC
    You might want to take a look at PPT::Util's tcgrep. In general, a compiled command line tool for a specific task is going to be faster than perl which is designed to be flexible. Depending on your specific needs, it is possible to make perl win. For instance, if you only care if the search string is present in the file, you can abort as soon as it is found. You can also use a sliding window buffer so that less disk I/O is involved.
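
The two ideas above (stop at the first hit, and read through a bounded sliding window) can be sketched roughly as follows. This is only an illustration under stated assumptions, not L~R's production code: file_has_string, its arguments, and the 64 KB default block size are hypothetical, and it searches for a fixed string rather than a regex so that the overlap between blocks is easy to size.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 1 as soon as $needle is seen in the file at
# $path, 0 if the whole file was read without finding it.
sub file_has_string {
    my ($path, $needle, $block_size) = @_;
    die "Search string must not be empty\n" unless length $needle;
    $block_size ||= 64 * 1024;    # assumed default, tune to taste

    # A match can straddle two blocks, so keep the last length-1 bytes
    # of the window around for the next round.
    my $overlap = length($needle) - 1;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    my $window = '';
    while (read($fh, my $block, $block_size)) {
        $window .= $block;
        return 1 if index($window, $needle) >= 0;    # abort on first hit

        if ($overlap == 0) {
            $window = '';
        }
        elsif (length($window) > $overlap) {
            $window = substr($window, -$overlap);
        }
    }
    return 0;
}

# Command-line use: script FILE STRING (exit status 0 if found, 1 if not)
if (@ARGV == 2) {
    exit(file_has_string(@ARGV) ? 0 : 1);
}
```

Whether this beats grep depends on where the string sits in the file; the early return only pays off when the match comes well before the end.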

    I did this very thing shortly after joining the Monastery nearly 3 years ago. My cow orkers were impressed and my code replaced the shell scripts in production.

    Cheers - L~R

Re: pattern match -vs- *ix grep
by Taulmarill (Deacon) on Apr 07, 2005 at 13:31 UTC
    Use it directly from the command line:
    perl -pe'$_="" unless /pattern/' file
    but I don't think it will be faster than grep.
Re: pattern match -vs- *ix grep
by tlm (Prior) on Apr 07, 2005 at 14:01 UTC

    #!/usr/bin/env perl
    use strict;
    use warnings;
    die "Usage: blah blah\n" unless @ARGV;
    my $regex = qr/@{[shift]}/;
    /$regex/ && print while <>;

    It certainly won't beat /bin/grep in speed, but you can give it cooler regexps. Note that this version takes any number of files as input. And it differs from /bin/grep in one important point: it returns a 0 status even when it finds no matches.
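
If the exit status matters (for example in shell conditionals), the difference noted above can be closed by counting matches and exiting non-zero when there are none. A minimal sketch, assuming grep's convention of 0 for "found" and 1 for "not found"; grep_files is a hypothetical name, and grep's separate status for errors is not reproduced here.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: prints every matching line from the given files
# and returns the number of lines that matched.
sub grep_files {
    my ($regex, @files) = @_;
    my $found = 0;
    for my $file (@files) {
        my $fh;
        unless (open $fh, '<', $file) {
            warn "Cannot open $file: $!\n";
            next;
        }
        while (my $line = <$fh>) {
            next unless $line =~ $regex;
            print $line;
            $found++;
        }
        close $fh;
    }
    return $found;
}

# Command-line use: script PATTERN FILE [FILE ...]
if (@ARGV >= 2) {
    my $regex = qr/@{[shift]}/;
    exit(grep_files($regex, @ARGV) ? 0 : 1);
}
```

With that in place, something like `script PATTERN file && echo found` behaves the way it would with grep.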

    Also watch out for regexp characters that have special meaning to the shell.

    Update: fixed stray -w in first line.

    the lowliest monk

Re: pattern match -vs- *ix grep
by VSarkiss (Monsignor) on Apr 07, 2005 at 14:07 UTC
      Silliness. Anyone typing in
      $ '| rm -rf *' 'ouch'
      could as well have typed
      $ rm -rf *
      No point in checking the arguments.
Re: pattern match -vs- *ix grep
by Anonymous Monk on Apr 07, 2005 at 14:25 UTC
    Actually, I would expect grep to be faster than Perl all the time. Grep is a special-purpose tool, accepting simpler regexes than Perl is able to handle. You can do more with Perl than you can with grep, but, IMO, a match between grep and Perl isn't going to be the best way to "win people over". The best you can hope for is that Perl isn't much slower.

    Having said that, I would just write it as:

    perl -ne 'BEGIN {$p = shift} print if /$p/' PATTERN files ...
    But that's still significantly longer than:
    grep PATTERN files ...
      Very impressive!
      All these one-liners and code snippets do a very good job. When I use 'time' to check for speed, they all seem to be relatively close as I perform my searches through large files. However (just as everyone has been saying), you just can't beat a special-purpose tool like /bin/grep for a simple search: there has been a distinct difference in search times (in particular, as the files being searched get larger). I guess now I'm going to resort to *dazzling* everyone here at work with my fancy Perl regex search strings, multi-file searches, etc... :)
      cheers, ev

      Good judgement comes with experience. Unfortunately, the experience usually comes from bad judgement.
Re: pattern match -vs- *ix grep
by dave_the_m (Monsignor) on Apr 07, 2005 at 16:10 UTC
    Adding an 'o' to the end of the regexp avoids the pattern being recompiled each time round the loop, ie
    if (/$pattern/o) {


      That only matters if either the pattern changes (which it doesn't), or you have a really, really ancient perl.

      For the past several years, Perl has been able to tell that the pattern hasn't changed, and will not recompile the regex. (Use -Dr if you're not convinced.)

        Ah yes, silly me. It bypasses the calls to the gvsv and regcomp ops though, so there's still a marginal saving.
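
A way to make the compile-once intent explicit without relying on /o is to build the pattern with qr// outside the loop. The sketch below is only an illustration: make_matcher and the sample pattern are hypothetical, and it makes no claim about how much time this saves in practice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compiles $pattern once with qr// and returns a
# closure that tests a string against the compiled regex.
sub make_matcher {
    my ($pattern) = @_;
    my $re = qr/$pattern/;    # compiled here, once
    return sub { return $_[0] =~ $re ? 1 : 0 };
}

my $is_error = make_matcher('^ERROR\b');
print $is_error->("ERROR: disk full") ? "match\n" : "no match\n";    # prints "match"
```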


Re: pattern match -vs- *ix grep
by cazz (Pilgrim) on Apr 07, 2005 at 14:21 UTC
    If you are dead set on using perl regular expressions, you also might want to take a look at pcregrep. Same syntax, supports most of the features you probably want out of a regex, but with a LOT less overhead.
Re: pattern match -vs- *ix grep
by QM (Parson) on Apr 07, 2005 at 16:01 UTC
    The last time I checked (which was a long time ago), egrep was faster than grep, to the point I did:
    alias grep 'egrep \!*'
    (in .cshrc)

    Quantum Mechanics: The dreams stuff is made of

Re: pattern match -vs- *ix grep
by tlm (Prior) on Apr 07, 2005 at 14:24 UTC

    Here's another approach with perl:

    perl -wsne '/$r/ && print' -- -r='your regexp here' *.txt

    the lowliest monk
