PerlMonks  

grep for lines containing two variables

by smoss74 (Acolyte)
on Dec 07, 2005 at 16:26 UTC (#514884=perlquestion)
smoss74 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'm trying without success to grep for two variables in a string.
I know the syntax for "or"

@interesting_lines = grep (/$string1|$string2/, @log);

but not "and", i.e. match a line containing both $string1 and $string2.
Could someone please help?

Thanks.

Re: grep for lines containing two variables
by ikegami (Pope) on Dec 07, 2005 at 16:29 UTC
    @interesting_lines = grep /$string1/, grep /$string2/, @log;

    or

    @interesting_lines = grep /$string1/ && /$string2/, @log;

    or

    @interesting_lines = grep /^(?=.*$string1)(?=.*$string2)/, @log;

    OT Note: If your strings contain plain text rather than regex syntax, be sure to escape them using quotemeta or /\Q$string\E/. Better yet, use index instead of regexes in that case, since it's much faster.
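
A minimal sketch of the note above, assuming the search strings are literal text. The sample @log lines and the values of $string1 and $string2 are made up for illustration; only quotemeta-style escaping (\Q...\E) and the built-in index come from the note itself.

```perl
use strict;
use warnings;

# Hypothetical sample data; the variable names mirror the original question.
my @log = (
    "GET /a.b?x=1 200",
    "GET /aXb?x=1 200",
    "POST /a.b 500",
);
my ($string1, $string2) = ('a.b', '?x=1');

# Unescaped, "." would match any character and "?x=1" is not a valid pattern,
# so both strings are escaped with \Q...\E before being used as regexes.
my @by_regex = grep { /\Q$string1\E/ && /\Q$string2\E/ } @log;

# index() does a plain substring search and returns -1 when nothing is found.
my @by_index = grep { index($_, $string1) >= 0 && index($_, $string2) >= 0 } @log;

print "$_\n" for @by_index;
```

Both greps should select the same lines; the index version simply never involves the regex engine.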

      seems that lookahead grep is the slightly faster solution ... here is the test script I used:
      #!/usr/bin/perl -w
      # usage : ./this_script.pl < input_file > captured_benchmarks
      use strict;
      use Benchmark;

      my @data=<>;
      my (@res1,@res2,@res3);

      timethese (100000000, {
          grep_and => q{
              @res1 = grep /GGGGGACACCTTCTCTCTCT/ && /RH_MEa0001bG06/, @data;
          },
          double_grep => q{
              @res2 = grep /GGGGGACACCTTCTCTCTCT/,grep /RH_MEa0001bG06/,@data;
          },
          lookahead_grep => q{
              @res3 = grep /^(?=.*GGGGGACACCTTCTCTCTCT)(?=.*RH_MEa0001bG06)/,@data;
          }
      } );
      ... and the results
      Benchmark: timing 100000000 iterations of double_grep, grep_and, lookahead_grep...
      double_grep    : 27 wallclock secs (26.98 usr + 0.00 sys = 26.98 CPU) @ 3705899.79/s (n=100000000)
      grep_and       : 24 wallclock secs (23.05 usr + 0.00 sys = 23.05 CPU) @ 4338959.52/s (n=100000000)
      lookahead_grep : 24 wallclock secs (22.83 usr + 0.00 sys = 22.83 CPU) @ 4380585.25/s (n=100000000)

        I wish I had a computer that executed 4380585.25 greps per second...

        Your test is useless. The @data seen by the benchmarked code is always empty (and any my variables such as $string1 and $string2 would be undefined there). I fixed it up below. Note the use of sub { ... } instead of q{ ... }. Subs close over my variables, while the string is evaled in a different scope where the my variables don't exist.

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark qw( cmpthese );

        my $string1 = qr/tr/;
        my $string2 = qr/e/;
        my @data = do { open(my $fh, '<', $0) or die; <$fh> };

        cmpthese (-3, {
            grep_and => sub {
                my @r = grep /$string1/ && /$string2/, @data;
                return @r;
            },
            double_grep => sub {
                my @r = grep /$string1/, grep /$string2/, @data;
                return @r;
            },
            lookahead => sub {
                my @r = grep /^(?=.*$string1)(?=.*$string2)/, @data;
                return @r;
            }
        });
        outputs
                       Rate lookahead double_grep grep_and
        lookahead    8114/s        --        -52%     -62%
        double_grep 16986/s      109%          --     -21%
        grep_and    21483/s      165%         26%       --

        The contents of @data are probably not all that good, so the figures aren't perfect, but they give a pretty good idea.
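
The scoping point in this reply (sub { ... } versus q{ ... }) can be checked directly. The sketch below is an illustration rather than part of the thread: it uses Benchmark's timeit with a single iteration purely as a way to run each form once, and the variables $seen_by_string and $seen_by_sub are hypothetical names introduced here.

```perl
use strict;
use warnings;
use Benchmark qw( timeit );

my @data = ('a', 'b', 'c');    # lexical, as in the benchmark scripts above
our ($seen_by_string, $seen_by_sub) = (-1, -1);

# String form: Benchmark compiles the text itself, outside this file's
# lexical scope, so @data there is an unrelated (empty) package variable.
timeit(1, q{ $main::seen_by_string = @data; });

# Sub form: the closure keeps access to the lexical @data declared above.
timeit(1, sub { $seen_by_sub = @data; });

print "string form sees $seen_by_string element(s), sub form sees $seen_by_sub\n";
```

If the string form reports zero elements, the original benchmark was timing greps over an empty list, which would explain the implausibly high rates.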

      Just a couple of queries...

      What if one string is contained in the other but should not be counted?
      Ok so add white space checks/breaks to the search strings.

      Do you need to lookahead on both strings? This seems to work.
      @interesting_lines = grep /(?=.*$string1).*$string2/,@log;
        What if one string is contained in the other but should not be counted?

        That's tricky. Do you really want that? I don't have the time right now to spend the effort on a what-if you won't use.

        Do you need to lookahead on both strings?

        No, it's optional on the last one. If you had 4 strings, you'd need the lookahead on the first three, but it would be optional on the fourth.
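
As a sketch of the "4 strings" case, one could build the pattern from a list. The helper name all_of_pattern and the sample lines are hypothetical, and the strings are assumed to be literal text (hence quotemeta). For simplicity this version puts a lookahead on every string, including the last, where it is optional.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): one anchored pattern with a
# lookahead per required string; quotemeta() treats each string literally.
sub all_of_pattern {
    my @strings = @_;
    my $lookaheads = '';
    $lookaheads .= '(?=.*' . quotemeta($_) . ')' for @strings;
    return qr/^$lookaheads/;
}

my @log = (
    "alpha beta gamma delta",
    "delta gamma beta",
    "gamma alpha delta beta",
);

my $re = all_of_pattern(qw( alpha beta gamma delta ));
my @interesting_lines = grep { /$re/ } @log;

print "$_\n" for @interesting_lines;
```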

        What if one string is contained in the other but should not be counted?

        How about

        grep /$string1.*$string2|$string2.*$string1/,
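
A small sketch of how the ordered-alternation idea differs from the lookahead version when one string contains the other. The strings 'foo' and 'foobar' and the @log lines are made-up test data, and the strings are assumed to be literal text, so they are escaped with \Q...\E.

```perl
use strict;
use warnings;

# Hypothetical data: $string1 is a substring of $string2.
my ($string1, $string2) = ('foo', 'foobar');
my @log = (
    "only foobar here",       # "foo" appears only inside "foobar"
    "foo and then foobar",    # two separate occurrences
    "foobar before foo",      # separate occurrences, other order
);

# Lookaheads may match overlapping text, so every line above qualifies.
my @overlap_ok = grep { /^(?=.*\Q$string1\E)(?=.*\Q$string2\E)/ } @log;

# Ordered alternation consumes one string before looking for the other,
# so the two occurrences cannot share characters.
my @separate = grep { /\Q$string1\E.*\Q$string2\E|\Q$string2\E.*\Q$string1\E/ } @log;

printf "%d line(s) with overlap allowed, %d without\n",
    scalar @overlap_ok, scalar @separate;
```
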
Re: grep for lines containing two variables
by gjb (Vicar) on Dec 07, 2005 at 16:41 UTC

    @interesting_lines = grep {/$string1/ xor /$string2/} @log;
    will do nicely.

    Hope this helps, -gjb-

    Update: apparently I got the question wrong. See the previous answer or, for an alternative syntax:

    @interesting_lines = grep {/$string1/ and /$string2/} @log;
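
To make the difference between the three readings concrete, here is a short sketch with hypothetical search terms and log lines. "or" keeps lines with at least one match, "xor" keeps lines with exactly one, and "and" keeps lines with both.

```perl
use strict;
use warnings;

my ($string1, $string2) = ('error', 'disk');    # hypothetical search terms
my @log = (
    "error: disk full",
    "error: network down",
    "disk check passed",
    "all quiet",
);

my @either   = grep { /$string1/ or  /$string2/ } @log;    # at least one
my @just_one = grep { /$string1/ xor /$string2/ } @log;    # exactly one
my @both     = grep { /$string1/ and /$string2/ } @log;    # both

printf "or: %d, xor: %d, and: %d\n",
    scalar @either, scalar @just_one, scalar @both;
```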

      I'm glad someone else made this mistake. It must be something about how the question is worded, because this is also what I thought the OP was after. It's completely wrong though.

      ikegami got it right.

      ++ for effort anyway :)

      ---
      my name's not Keith, and I'm not reasonable.

      When the OP said
      I know the syntax for "or" (code example) but not "and"
      I think he meant
      I know the syntax for "or" (code example) but I don't know the syntax for "and"
      rather than
      I don't know the syntax for "or but not and"

        Anyone else reminded of the Friends episode?

        They don't know that we know they know we know! (Joey just shakes his head. ...

        ---
        my name's not Keith, and I'm not reasonable.
Re: grep for lines containing two variables
by pileofrogs (Priest) on Dec 07, 2005 at 18:56 UTC

    Does one string always precede the other string? Or can they be in any order?

    If I've got my head screwed on correctly (always a doubtful proposition), some of the solutions above will work in both cases and other solutions will only work if $string1 appears before $string2.

    From ikegami's post:

    Works in any order

    @interesting_lines = grep /$string1/, grep /$string2/, @log;

    Works in any order

    @interesting_lines = grep /$string1/ && /$string2/, @log;

    Only works if $string1 precedes $string2

    @interesting_lines = grep /^(?=.*$string1)(?=.*$string2)/, @log;

    Or am I a total moron?

      Because it's done with lookaheads, the last example works with strings in either order. That is, in fact, why it is done with lookaheads. In my mind, lookaheads like this are the "and" that complements alternation's "or" in regexen.
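
A quick sketch to illustrate the point, using hypothetical search terms and log lines. The second grep is included only for contrast: it shows what an order-dependent pattern looks like.

```perl
use strict;
use warnings;

my ($string1, $string2) = ('open', 'close');    # hypothetical search terms
my @log = (
    "open then close",
    "close then open",
    "open only",
);

# Both lookaheads start from the same position (the beginning of the line)
# and consume nothing, so neither restricts where the other may match.
my @any_order = grep { /^(?=.*$string1)(?=.*$string2)/ } @log;

# For contrast: without lookaheads the text is consumed, so order matters.
my @in_order = grep { /$string1.*$string2/ } @log;

print "any order: $_\n" for @any_order;
print "in order:  $_\n" for @in_order;
```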

      Caution: Contents may have been coded under pressure.
        Woah! Thanks!

Node Type: perlquestion [id://514884]
Approved by ikegami
Front-paged by Roy Johnson