I also tried a few different things and it seems that using or not /x, /m or /s depends entirely on whether you want them, the speed difference is negligible.
I was amazed how much speed difference there was in /i! The amount of slowdown from using it instead of turning the entire string lowercase was amazing.
What made the most difference was not repeating myself, reading the entire header at once and also matching the regex against the header just once. Grabbing only the one match out of the headers (no /g) also sped things up quite a bit and although it may miss something, in this case not very likely, Don't do more than you have to. I will use this in the future and just that made this entire thread worth it to me.
As far as setting /xms by default, it makes perlcritic happy (and critic complains if a regex doesn't have them, nothing except reading the code tells me that they do) and I have had as much strangeness when I expected whitespace not to match but forgot to set /s, so errors can happen either way.
One thing I was unsure of, when you talk of "anchoring" your examples seem like that means using ^ or $ when I would think any static string (like "andrew\+") would count as an anchor in a regex?
I also included just reading the headers in both ways to show how much of the time was taken by file access and it was significant.
Rate Case_Ins No_X Old No_M No_S No_XMS No_Strip 1_Match 1_Read 1_Read_1_Match Just_Read Just_1_Read
Case_Ins 23.2/s -- -44% -45% -45% -45% -45% -46% -51% -55% -58% -65% -69%
No_X 41.7/s 80% -- -1% -1% -1% -1% -2% -11% -19% -25% -37% -44%
Old 42.0/s 81% 1% -- -0% -0% -0% -2% -11% -18% -24% -37% -43%
No_M 42.0/s 81% 1% 0% -- -0% -0% -2% -11% -18% -24% -37% -43%
No_S 42.2/s 82% 1% 0% 0% -- -0% -1% -10% -18% -24% -37% -43%
No_XMS 42.2/s 82% 1% 0% 0% 0% -- -1% -10% -18% -24% -37% -43%
No_Strip 42.7/s 84% 3% 2% 2% 1% 1% -- -9% -17% -23% -36% -42%
1_Match 46.9/s 102% 13% 12% 12% 11% 11% 10% -- -8% -15% -30% -37%
1_Read 51.3/s 121% 23% 22% 22% 22% 22% 20% 9% -- -7% -23% -31%
1_Read_1_Match 55.2/s 138% 33% 31% 31% 31% 31% 29% 18% 8% -- -17% -25%
Just_Read 66.7/s 187% 60% 59% 59% 58% 58% 56% 42% 30% 21% -- -10%
Just_1_Read 74.1/s 219% 78% 76% 76% 76% 76% 73% 58% 44% 34% 11% --
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Benchmark qw/ cmpthese /;
my %ad;
my @dirs = glob( $ENV{HOME} . '/Maildir/.misc.d*' );
cmpthese(
100,
{ Old => sub { %ad = (); find( \&old, @dirs )
+; },
Case_Ins => sub { %ad = (); find( \&case_insensitive, @dirs )
+ },
'1_Read' => sub { %ad = (); find( \&one_read, @dirs )
+ },
'1_Match' => sub { %ad = (); find( \&one_match, @dirs )
+ },
'1_Read_1_Match' =>
sub { %ad = (); find( \&one_read_one_match, @dirs ) },
No_XMS => sub { %ad = (); find( \&no_xms, @dirs )
+},
No_X => sub { %ad = (); find( \&no_x, @dirs )
+},
No_M => sub { %ad = (); find( \&no_m, @dirs )
+},
No_S => sub { %ad = (); find( \&no_s, @dirs )
+},
No_Strip => sub { %ad = (); find( \&no_strip, @dirs )
+},
Just_Read => sub { %ad = (); find( \&just_read, @dirs )
+},
Just_1_Read => sub { %ad = (); find( \&just_one_read, @dirs )
+},
}
);
sub old {
return unless -f $_;
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
$_ = lc $_;
if (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/xms) {
my $addr = $1;
if ( $addr =~ s/[=\#\%](?:3d)?/@/xms ) {
$addr =~ s/\@[^@]+$//xms;
}
$ad{$addr}++;
}
}
close $fh;
}
sub no_xms {
return unless -f $_;
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
$_ = lc $_;
if (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/) {
my $addr = $1;
if ( $addr =~ s/[=\#\%](?:3d)?/@/ ) {
$addr =~ s/\@[^@]+$//;
}
$ad{$addr}++;
}
}
close $fh;
}
sub no_s {
return unless -f $_;
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
$_ = lc $_;
if (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/xm) {
my $addr = $1;
if ( $addr =~ s/[=\#\%](?:3d)?/@/xm ) {
$addr =~ s/\@[^@]+$//xm;
}
$ad{$addr}++;
}
}
close $fh;
}
sub no_m {
return unless -f $_;
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
$_ = lc $_;
if (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/xs) {
my $addr = $1;
if ( $addr =~ s/[=\#\%](?:3d)?/@/xs ) {
$addr =~ s/\@[^@]+$//xs;
}
$ad{$addr}++;
}
}
close $fh;
}
sub no_x {
return unless -f $_;
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
$_ = lc $_;
if (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/ms) {
my $addr = $1;
if ( $addr =~ s/[=\#\%](?:3d)?/@/ms ) {
$addr =~ s/\@[^@]+$//ms;
}
$ad{$addr}++;
}
}
close $fh;
}
sub no_strip {
return unless -f $_;
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
$_ = lc $_;
if (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/xms) {
$ad{$1}++;
}
}
close $fh;
}
sub case_insensitive {
return unless -f $_;
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
if (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/ixms) {
my $addr = $1;
if ( $addr =~ s/[=\#\%](?:3[Dd])?/@/xms ) {
$addr =~ s/\@[^@]+$//xms;
}
$ad{$addr}++;
}
}
close $fh;
}
sub one_read {
return unless -f $_;
open my $fh, '<', $_ or die;
local $/ = '';
$_ = lc <$fh>;
close $fh;
foreach my $addr (/\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/gxms) {
if ( $addr =~ s/[=\#\%](?:3d)?/@/xms ) {
$addr =~ s/\@[^@]+$//xms;
}
$ad{$addr}++;
}
}
sub one_match {
return unless -f $_;
my $headers = '';
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
$headers .= lc $_;
}
close $fh;
if ( $headers =~ /\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/xms ) {
my $addr = $1;
if ( $addr =~ s/[=\#\%](?:3d)?/@/xms ) {
$addr =~ s/\@[^@]+$//xms;
}
$ad{$addr}++;
}
}
sub one_read_one_match {
return unless -f $_;
open my $fh, '<', $_ or die;
local $/ = '';
$_ = lc <$fh>;
close $fh;
if ( my ($addr) = /\b(andrew\+[^\@\s]+\@[-a-z0-9\.]+)\b/xms ) {
if ( $addr =~ s/[=\#\%](?:3d)?/@/xms ) {
$addr =~ s/\@[^@]+$//xms;
}
$ad{$addr}++;
}
}
sub just_read {
return unless -f $_;
open my $fh, '<', $_ or die;
while (<$fh>) {
last if $_ eq "\n";
}
close $fh;
}
sub just_one_read {
return unless -f $_;
open my $fh, '<', $_ or die;
local $/ = '';
$_ = <$fh>;
close $fh;
}
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.