Re: grep for lines containing two variables
by ikegami (Patriarch) on Dec 07, 2005 at 16:29 UTC
@interesting_lines = grep /$string1/,
grep /$string2/,
@log;
or
@interesting_lines = grep /$string1/ && /$string2/, @log;
or
@interesting_lines = grep /^(?=.*$string1)(?=.*$string2)/, @log;
OT Note: If your strings contain literal text rather than regex patterns, be sure to escape them using quotemeta or /\Q$string\E/. Better yet, use index instead of regexes in that case, since it's much faster.
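A minimal sketch of that note, assuming the search strings are literal text. The contents of @log and both strings are made up for illustration; the unescaped version is included only to show how metacharacters change the result.

```perl
use strict;
use warnings;

# Hypothetical sample data; the search strings contain regex metacharacters.
my @log = (
    "call foo(1) then bar[2]\n",
    "call foo(1) only\n",
    "call foo1 then bar2\n",
);
my $string1 = 'foo(1)';
my $string2 = 'bar[2]';

# \Q...\E escapes metacharacters, so both strings are matched literally.
my @by_regex = grep { /\Q$string1\E/ && /\Q$string2\E/ } @log;

# index returns -1 when the substring is absent; no regex engine is involved.
my @by_index = grep { index($_, $string1) >= 0 && index($_, $string2) >= 0 } @log;

# Unescaped, "foo(1)" is a capture group matching "foo1" and "bar[2]" is a
# character class matching "bar2", so the wrong line is selected.
my @unescaped = grep { /$string1/ && /$string2/ } @log;

print "escaped:   @by_regex";
print "index:     @by_index";
print "unescaped: @unescaped";
```

Both the escaped regex and the index version select only the first line, while the unescaped version selects only the third.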
It seems that the lookahead grep is the slightly faster solution ...
here is the test script I used:
#!/usr/bin/perl -w
# usage : ./this_script.pl < input_file > captured_benchmarks
use strict;
use Benchmark;
my @data=<>;
my (@res1,@res2,@res3);
timethese (100000000,
{ grep_and => q{
@res1 = grep /GGGGGACACCTTCTCTCTCT/ && /RH_MEa0001bG06/, @data;
},
double_grep => q{
@res2 = grep /GGGGGACACCTTCTCTCTCT/,grep /RH_MEa0001bG06/,@data;
},
lookahead_grep => q{
@res3 = grep /^(?=.*GGGGGACACCTTCTCTCTCT)(?=.*RH_MEa0001bG06)/,@data;
}
}
);
... and the results
Benchmark: timing 100000000 iterations of double_grep, grep_and, lookahead_grep...
double_grep : 27 wallclock secs (26.98 usr + 0.00 sys = 26.98 CPU) @ 3705899.79/s (n=100000000)
grep_and : 24 wallclock secs (23.05 usr + 0.00 sys = 23.05 CPU) @ 4338959.52/s (n=100000000)
lookahead_grep : 24 wallclock secs (22.83 usr + 0.00 sys = 22.83 CPU) @ 4380585.25/s (n=100000000)
I wish I had a computer that executed 4380585.25 greps per second...
Your test is useless. The @data, $string1 and $string2 seen by the benchmarked code are always empty or undef. I fixed it up below. Note the use of sub { ... } instead of q{ ... }. Subs close over my variables, while a string is eval'ed in a different scope where the my variables don't exist.
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );
my $string1 = qr/tr/;
my $string2 = qr/e/;
my @data = do {
open(my $fh, '<', $0) or die;
<$fh>
};
cmpthese (-3, {
grep_and => sub {
my @r = grep /$string1/ && /$string2/, @data;
return @r;
},
double_grep => sub {
my @r = grep /$string1/, grep /$string2/, @data;
return @r;
},
lookahead => sub {
my @r = grep /^(?=.*$string1)(?=.*$string2)/, @data;
return @r;
}
});
outputs
               Rate lookahead double_grep grep_and
lookahead    8114/s        --        -52%     -62%
double_grep 16986/s      109%          --     -21%
grep_and    21483/s      165%         26%       --
The contents of @data are probably not all that good, so the figures aren't perfect, but they give a pretty good idea.
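The scoping point about sub { ... } versus q{ ... } can be shown without Benchmark at all. This is only a sketch: eval_elsewhere is a hypothetical helper standing in for any code that evals a string outside the scope of the caller's my variables, and it is not how Benchmark itself is written.

```perl
use strict;
use warnings;

# Hypothetical helper: evals a code string in its own scope. It is defined
# before @data is declared, so the lexical @data is not visible in here.
sub eval_elsewhere {
    my ($code) = @_;
    no strict 'vars';    # let @data fall back to the package variable
    return eval $code;
}

my @data = qw( alpha beta gamma );

# The string is compiled inside eval_elsewhere, so @data refers to the
# (empty) package variable @main::data, not to the lexical above.
my $count_from_string = eval_elsewhere('scalar @data');

# The closure captures the lexical @data.
my $count_from_sub = sub { scalar @data }->();

print "string eval saw $count_from_string elements\n";
print "closure saw $count_from_sub elements\n";
```

The string version sees no data while the closure sees all three elements, which is why a benchmark written with q{ ... } against my variables ends up timing grep over an empty list.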
Just a couple of queries...
What if one string is contained in the other but should not be counted?
Ok so add white space checks/breaks to the search strings.
Do you need to lookahead on both strings? This seems to work.
@interesting_lines = grep /(?=.*$string1).*$string2/,@log;
What if one string is contained in the other but should not be counted?
That's tricky. Do you really want that? I don't have the time right now to spend the effort on a what-if you won't use.
Do you need to lookahead on both strings?
No, it's optional on the last one. If you had 4 strings, you'd need the lookahead on the first three, but it would be optional on the fourth.
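For more than two strings, the regex can be built from a list. A rough sketch, assuming the strings are literal text (hence quotemeta); all_of_regex is a made-up helper name and the sample lines are invented. It puts a lookahead on every string, including the last one where it is optional.

```perl
use strict;
use warnings;

# Hypothetical helper: build one anchored regex that requires every string
# to appear somewhere in the line, in any order.
sub all_of_regex {
    my @strings = @_;
    my $lookaheads = join '', map { '(?=.*' . quotemeta($_) . ')' } @strings;
    return qr/^$lookaheads/;
}

# Invented sample data.
my @log = (
    "cyan blue green red\n",
    "red green blue\n",
    "red green blue cyan\n",
);

my $re = all_of_regex(qw( red green blue cyan ));
my @interesting_lines = grep { $_ =~ $re } @log;

print @interesting_lines;
```

The second line is rejected because it lacks one of the four strings; the other two match regardless of the order in which the strings appear.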
What if one string is contained in the other but should not be counted?
How about
grep /$string1.*$string2|$string2.*$string1/,
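A small check of how that alternation behaves when one string is contained in the other. The strings and log lines here are invented for illustration; the && form is included for comparison.

```perl
use strict;
use warnings;

# Invented example where $string2 is a substring of $string1.
my $string1 = 'abc';
my $string2 = 'bc';

my @log = (
    "xabcx\n",          # only one occurrence, shared by both strings
    "abc and bc\n",     # separate occurrences, string1 first
    "bc then abc\n",    # separate occurrences, string2 first
    "only bc\n",        # string1 is missing
);

# Requires two non-overlapping occurrences, in either order.
my @either_order = grep { /$string1.*$string2|$string2.*$string1/ } @log;

# Accepts a single occurrence that satisfies both patterns.
my @and_form = grep { /$string1/ && /$string2/ } @log;

print "alternation: ", scalar(@either_order), " lines\n";
print "and form:    ", scalar(@and_form), " lines\n";
```

The alternation skips "xabcx" because the same characters cannot serve as both matches, while the && form accepts it.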
Re: grep for lines containing two variables
by gjb (Vicar) on Dec 07, 2005 at 16:41 UTC
@interesting_lines = grep {/$string1/ xor /$string2/} @log;
will do nicely.
Hope this helps, -gjb-
Update: apparently I got the question wrong. See the previous answer or, for an alternative syntax:
@interesting_lines = grep {/$string1/ and /$string2/} @log;
Re: grep for lines containing two variables
by pileofrogs (Priest) on Dec 07, 2005 at 18:56 UTC
Does one string always precede the other string? Or can they be in any order?
If I've got my head screwed on correctly (always a doubtful proposition), some of the solutions above will work in both cases and other solutions will only work if $string1 appears before $string2.
From ikegami's post:
Works in any order
@interesting_lines = grep /$string1/, grep /$string2/, @log;
Works in any order
@interesting_lines = grep /$string1/ && /$string2/, @log;
Only works if $string1 precedes $string2
@interesting_lines = grep /^(?=.*$string1)(?=.*$string2)/, @log;
Or am I a total moron?
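One way to settle the ordering question is to test it. A quick sketch with invented strings and lines: both lookaheads start from the ^ anchor and scan the whole line independently, so the lookahead form accepts either order. A plain /$string1.*$string2/ is shown for contrast, since that one really is order-dependent.

```perl
use strict;
use warnings;

# Invented sample data.
my $string1 = 'first';
my $string2 = 'second';

my @log = (
    "first then second\n",
    "second then first\n",
    "first only\n",
);

# Each lookahead is tried from the start of the line, so order is irrelevant.
my @lookahead = grep { /^(?=.*$string1)(?=.*$string2)/ } @log;

# This pattern does require $string1 to come before $string2.
my @ordered = grep { /$string1.*$string2/ } @log;

print "lookahead: ", scalar(@lookahead), " lines\n";
print "ordered:   ", scalar(@ordered), " lines\n";
```

The lookahead form matches the first two lines, while the ordered pattern matches only the first.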