but I'm concerned about speed. If it's doing this for every file on a terabyte server, I'm worried about the time consumption. What do you think?
Just the fact that you hide a loop as regexp alternatives doesn't mean it's suddenly orders of magnitude faster.
In fact, it may well be that splitting the regexp into
smaller chunks is faster, because the optimizer kicks in.
Here's a benchmark:
#!/usr/bin/perl

use strict;
use warnings;

use Benchmark qw /cmpthese/;

our @regexes = (
    '.*\.jpg$',
    '.*\.png$',
    'Perl',
    '\.mozilla/abigail',
);
our @words = `find /home/abigail`;   # 38517 files.
our ($c1, $c2);

cmpthese -60 => {
    single => 'my $regex = join "|" => @regexes;
               $c1 = 0;
               for my $w (@words) {
                   $c1 ++ if $w =~ /$regex/
               }',
    many   => '$c2 = 0;
               WORD:
               for my $w (@words) {
                   for my $r (@regexes) {
                       $c2 ++, next WORD if $w =~ /$r/
                   }
               }',
};

die "Unequal\n" unless $c1 == $c2;

__END__
        s/iter single   many
single    4.86     --   -74%
many      1.28   281%     --
Now, for your particular data set, results might be different. But don't assume alternatives are necessarily slower.
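Before timing anything on your own data, it may be worth confirming that both approaches count the same entries. The following is only a minimal sketch: the sub names (count_single, count_many) and the sample paths are made up for illustration, and unlike the benchmark it precompiles the patterns with qr//, so the timings it would give are not directly comparable to the numbers above.

```perl
use strict;
use warnings;

# Same patterns as in the benchmark, precompiled once with qr//
# (an assumption of this sketch; the benchmark interpolates strings).
my @patterns = map { qr/$_/ }
    ('.*\.jpg$', '.*\.png$', 'Perl', '\.mozilla/abigail');

# One combined alternation built from the precompiled patterns.
my $joined   = join '|', @patterns;
my $combined = qr/$joined/;

# Count paths that match the single combined regex.
sub count_single {
    my ($paths) = @_;
    return scalar grep { $_ =~ $combined } @$paths;
}

# Count paths that match at least one of the separate regexes,
# moving on to the next path at the first hit.
sub count_many {
    my ($paths) = @_;
    my $count = 0;
    PATH:
    for my $path (@$paths) {
        for my $re (@patterns) {
            if ($path =~ $re) {
                $count++;
                next PATH;
            }
        }
    }
    return $count;
}

# Hypothetical sample paths, for illustration only.
my @sample = (
    '/home/user/pics/cat.jpg',
    '/home/user/pics/dog.png',
    '/home/user/src/Perl/notes.txt',
    '/home/user/.mozilla/abigail/prefs.js',
    '/home/user/readme.txt',
);

printf "single: %d, many: %d\n",
    count_single(\@sample), count_many(\@sample);
```

Once the two counts agree, the same two subs could be handed to cmpthese as code references with your real file list in place of @sample.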
Abigail