PerlMonks  

Re: Returning regexp pattern that was used to match

by Abigail-II (Bishop)
on May 03, 2004 at 14:44 UTC ( #350024=note )


in reply to Returning regexp pattern that was used to match

You only have one pattern:

join "|" => keys %$args;
That's the only pattern you use, so the answer is trivial. (Now, you probably want to know which of the clauses matched. But that's not an appropriate question. It takes a bit too much time to explain why the question isn't appropriate.)

Abigail


Re: Re: Returning regexp pattern that was used to match
by crabbdean (Pilgrim) on May 03, 2004 at 14:52 UTC
    Well I need to know what clause matched, unless there is some other way to do this?

    UPDATE: My other thought was to do this (which also works and gives me what I want):
    for ( keys %{$self->{actions}} ) {
        process($self, $file, $dir, $_) if ($fullfile =~ /$_/);
        next;
    }
    .. but I'm concerned about speed. If it's doing this for every file on a terabyte server, I'm worried about the time consumption. What do you think?

    Dean
    The Funkster of Mirth
    Programming these days takes more than a lone avenger with a compiler. - sam
    RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers
      Well, you could of course always not construct a big regexp, but just loop over the keys and apply each regexp, doing something when it matches.
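A minimal sketch of that loop, under assumptions: the %actions table, its handlers, and the dispatch() helper are hypothetical names made up for illustration, not code from this thread. Each pattern is compiled once, and the source of the pattern that matched is returned together with the handler's result, so the caller knows which one fired.

```perl
use strict;
use warnings;

# Hypothetical pattern => handler table; the names are illustrative only.
my %actions = (
    '\.jpg$' => sub { "image: $_[0]" },
    '\.pl$'  => sub { "script: $_[0]" },
);

# Compile each pattern once, up front. Sorting the keys only makes the
# order of the tests predictable.
my @compiled = map { [ $_, qr/$_/, $actions{$_} ] } sort keys %actions;

# Return the source text of the first matching pattern and the result
# of its handler, or an empty list when nothing matches.
sub dispatch {
    my ($path) = @_;
    for my $entry (@compiled) {
        my ($source, $rx, $handler) = @$entry;
        return ($source, $handler->($path)) if $path =~ $rx;
    }
    return;
}

my ($which, $result) = dispatch('/tmp/photo.jpg');
print "$which => $result\n";
```

If several patterns can match the same name, only the first one in the sorted order is reported; whether that is acceptable depends on how the patterns overlap.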

      Abigail

      If you are concerned about speed, then contrive to precompile those patterns before testing them. I had you directly testing against the keys in the pattern. Another idea might be to do this:

      use vars qw' %CACHED_RX ';

      sub do_this {
          my $self = shift;
          my %rx   = %{ shift() };

          for ( keys %rx ) {
              # Compile each pattern once and reuse the compiled form.
              my $rx = $CACHED_RX{$_} ||= qr/$_/;
              if ( $self->{'find'} =~ $rx ) {
                  # Act on the match here.
              }
          }
      }
      but I'm concerned about speed. If it's doing this for every file on a terabyte server, I'm worried about the time consumption. What do you think?
      Just the fact that you hide a loop as regexp alternatives doesn't mean it's suddenly orders of magnitude faster. In fact, it may well be that splitting the regexp into smaller chunks is faster, because the optimizer kicks in.

      Here's a benchmark:

      #!/usr/bin/perl

      use strict;
      use warnings;
      use Benchmark qw /cmpthese/;

      our @regexes = (
          '.*\.jpg$',
          '.*\.png$',
          'Perl',
          '\.mozilla/abigail',
      );
      our @words = `find /home/abigail`;   # 38517 files.
      our ($c1, $c2);

      cmpthese -60 => {
          single => 'my $regex = join "|" => @regexes;
                     $c1 = 0;
                     for my $w (@words) {
                         $c1 ++ if $w =~ /$regex/
                     }',
          many   => '$c2 = 0;
                     WORD:
                     for my $w (@words) {
                         for my $r (@regexes) {
                             $c2 ++, next WORD if $w =~ /$r/
                         }
                     }',
      };

      die "Unequal\n" unless $c1 == $c2;

      __END__
              s/iter single   many
      single    4.86     --   -74%
      many      1.28   281%     --
      Now, for your particular data set the results might be different. But don't assume that the alternative (looping over separate regexes) is necessarily slower.
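To try the comparison on your own data, something along these lines might serve as a starting point. It is only a sketch: the patterns and the generated file names are made-up sample data, and unlike the benchmark above it precompiles the patterns with qr// and passes code references to cmpthese. No timings are shown because they depend entirely on the data and the perl in use.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data; substitute your own patterns and file names.
my @patterns = ('\.jpg$', '\.png$', 'Perl');
my @names    = map { ("dir$_/photo.jpg", "dir$_/notes.txt", "dir$_/Perl.pm") } 1 .. 1000;

# One big alternation, compiled once.
my $single = do { my $alt = join '|', map { "(?:$_)" } @patterns; qr/$alt/ };

# The same patterns compiled separately.
my @many = map { qr/$_/ } @patterns;

# Count names matched by the combined regex.
sub count_single {
    return scalar( grep { $_ =~ $single } @names );
}

# Count names matched by any of the separate regexes,
# stopping at the first hit for each name.
sub count_many {
    my $count = 0;
    NAME: for my $name (@names) {
        for my $rx (@many) {
            $count++, next NAME if $name =~ $rx;
        }
    }
    return $count;
}

# Both approaches must agree before the timings mean anything.
die "Unequal\n" unless count_single() == count_many();

# Run each variant for at least one CPU second and print the comparison.
cmpthese( -1, { single => \&count_single, many => \&count_many } );
```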

      Abigail
