PerlMonks  

Returning regexp pattern that was used to match

by crabbdean (Pilgrim)
on May 03, 2004 at 14:18 UTC (#350016, perlquestion)
crabbdean has asked for the wisdom of the Perl Monks concerning the following question:

Personally I get sick of all the regexp questions on here, but this is one I don't remember ever seeing before or hearing of being done.

I call a method with a hash of regexps:

    $f->do_this({
        '.*\.doc$' => [qw/delete_me 1/],
        '.*\.xls$' => [qw/copy_to C:\test/],
        'dean'     => [qw/return_this rel_file|stat_8/],
        'F:/dean/' => ['mirror_to', 'C:\test'],
    });

... which is compiled into a search string in the object using this method:

    sub do_this {
        my ($self, $args) = @_;
        my %hash = %{$args};
        while (my ($d, $r) = each (%hash)) {
            $self->{actions}{$d} = $r;
            #print "D: $d\nR: @$r\n";
        }
        my $string = undef;
        map {$string .= $_ . "|"} keys(%{$args});
        $string =~ s/\|$//;         ## remove the last pipe symbol
        $string = qr/$string/i;
        $self->{find} = $string;    ## this is the string to match files against
        #print $self->{find};
    }

Now when I do a pattern match against it (which, by the way, works perfectly!) I have a problem. I know $& returns the matched string, but I need to know what PATTERN WAS USED to match against the string. How?

This is the code where I do the matching and want to return the PATTERN USED.
    if (/$self->{find}/) {
        my $this = $&;
        my $matched = undef;    ## find pattern used here
        process($self, $_, $dir, $matched);
    }

The reason is that I then need to process the original hash (shown at the top of this post), using the PATTERN USED as the key to look up the entry that was found. How can I do this?

Dean
The Funkster of Mirth
Programming these days takes more than a lone avenger with a compiler. - sam
RFC1149: A Standard for the Transmission of IP Datagrams on Avian Carriers

Re: Returning regexp pattern that was used to match
by Abigail-II (Bishop) on May 03, 2004 at 14:44 UTC
    You only have one pattern:
    join "|" => keys %$args;
    That's the only pattern you use, so the answer is trivial. (Now, you probably want to know which of the clauses matched. But that's not an appropriate question. It takes a bit too much time to explain why the question isn't appropriate.)

    Abigail

      Well I need to know what clause matched, unless there is some other way to do this?

      UPDATE: My other thought was to do this (which also works and gives me what I want):
          for ( keys %{$self->{actions}} ) {
              process($self, $file, $dir, $_) if ($fullfile =~ /$_/);
              next;
          }

      ... but I'm concerned about speed. If it's doing this for every file on a terabyte server, I'm worried about the time consumption. What do you think?

      Dean
        Well, you could of course always not construct a big regexp, but just loop over the keys and apply each regexp, doing something when it matches.

        Abigail

        If you are concerned about speed, then contrive to have those patterns precompiled before testing them. My earlier example had you testing directly against the keys as patterns. Another idea might be to do this:

            use vars qw' %CACHED_RX ';

            sub do_this {
                my $self = shift;
                my %rx = %{ shift() };
                for ( keys %rx ) {
                    # compile each pattern only once and reuse it afterwards
                    my $rx = $CACHED_RX{$_} ||= qr/$_/;
                    if ( $self->{'find'} =~ $rx ) {
                    }
                }
            }
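A minimal sketch of the precompile-then-loop idea, assuming the patterns live in a hash shaped like the one passed to do_this. The helper names compile_patterns and first_matching_key are hypothetical and not part of the OP's module. The keys are sorted only so that the result is repeatable from run to run, and the /i flag mirrors the OP's qr/$string/i.

```perl
use strict;
use warnings;

# Hypothetical helper: compile every key once, up front.
# Returns a list of [ original_key, compiled_regex ] pairs.
sub compile_patterns {
    my ($actions) = @_;
    return map { [ $_, qr/$_/i ] } sort keys %$actions;
}

# Hypothetical helper: return the first key whose regex matches $file,
# or nothing (undef in scalar context) when no pattern matches.
sub first_matching_key {
    my ($file, @compiled) = @_;
    for my $pair (@compiled) {
        my ($key, $rx) = @$pair;
        return $key if $file =~ $rx;
    }
    return;
}

my %actions = (
    '.*\.doc$' => [qw/delete_me 1/],
    '.*\.xls$' => [qw/copy_to C:\test/],
);
my @compiled = compile_patterns(\%actions);

my $key = first_matching_key('report.XLS', @compiled);
print "matched by: $key\n" if defined $key;
```

The returned key can be used directly as the index into the actions hash, which is what the original question asked for.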
        "... but I'm concerned about speed. If it's doing this for every file on a terabyte server I'm worried about the time consumption. What do you think?"
        Just the fact that you hide a loop as regexp alternatives doesn't mean it's suddenly orders of magnitude faster. In fact, it may well be that splitting the regexp into smaller chunks is faster, because the optimizer kicks in.

        Here's a benchmark:

            #!/usr/bin/perl

            use strict;
            use warnings;
            use Benchmark qw /cmpthese/;

            our @regexes = (
                '.*\.jpg$',
                '.*\.png$',
                'Perl',
                '\.mozilla/abigail',
            );
            our @words = `find /home/abigail`;   # 38517 files.
            our ($c1, $c2);

            cmpthese -60 => {
                single => 'my $regex = join "|" => @regexes;
                           $c1 = 0;
                           for my $w (@words) {
                               $c1 ++ if $w =~ /$regex/
                           }',
                many   => '$c2 = 0;
                           WORD: for my $w (@words) {
                               for my $r (@regexes) {
                                   $c2 ++, next WORD if $w =~ /$r/
                               }
                           }',
            };

            die "Unequal\n" unless $c1 == $c2;

            __END__
                   s/iter single   many
            single   4.86     --   -74%
            many     1.28   281%     --

        Now, for your particular data set results might be different. But don't assume alternatives are necessarily slower.

        Abigail

Re: Returning regexp pattern that was used to match
by Ven'Tatsu (Deacon) on May 03, 2004 at 14:58 UTC
    First off a few comments not related to your question:
    If you're using any other regexes in your program, avoid $& and wrap your test in () instead. $& will cause all regexes to run slower throughout your whole program.
    Don't use map in a void context, it throws away the output. Try for (keys(%{$args})) {$string .= $_ . "|"} instead. I would usually consider join even better, but it would not work for my idea below.
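As a rough illustration of both comments, the sketch below builds the alternation with join and captures the matched text with parentheses instead of reading $&. The sample patterns and file name are made up for the example. Perl's own documentation (perlvar) says the $& penalty was removed in 5.20, so the advice matters most on older perls, but the capturing form works everywhere.

```perl
use strict;
use warnings;

my @patterns = ('.*\.doc$', '.*\.xls$', 'dean');

# join builds the alternation directly, so there is no trailing pipe to strip.
my $find = join '|', @patterns;
my $rx   = qr/($find)/i;    # the outer parentheses capture what $& would hold

my $file = 'F:/dean/budget.xls';
my $matched_text;
if ($file =~ $rx) {
    $matched_text = $1;     # same text as $&, taken from the capture group
}
print "matched text: $matched_text\n" if defined $matched_text;
```

Note that this only recovers the matched text, not which alternative produced it.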

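    As for your question, you could try (?{ CODE }) to record which alternative matched, maybe something like this:

        use re 'eval';      # needed because the pattern is built at run time
        our $matched_idx;   # global to hold the index of the key that matched; off the top of my head I don't think a 'my' variable would work.
        my @keys = keys %$args;
        for my $i (0 .. $#keys) {
            # the backslash keeps $matched_idx from being interpolated into the string
            $string .= "(?:$keys[$i])(?{ \$matched_idx = $i })|";
        }
        # after a successful match, the key that matched is $keys[$matched_idx]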
      Don't use map in a void context, it throws away the output.
      What output does it throw away? What about using assignment or print in void context? Not using print in void context because some "output" is thrown away?

      Abigail

        Each $string .= ... returns the new contents of $string. map builds a list with the value of the last statement executed in its block. This costs both time and memory.
        foreach does not keep the values its block returns, so it will run faster than map.
        The OP's code did not use any of the results of the concatenation assignment (other than the actual concatenation side effect), so keeping the results around is a waste.
        In the case of assignment or print you're usually not interested in the direct result, only the side effect. So storing the result is a waste of time and memory.

        See What's wrong with using grep or map in a void context?

        This was changed in 5.8.1 (map only, grep should still be avoided in void context), but as there are plenty of installs running perls prior to that it's still a bad idea.
        (?{ CODE }) will execute any perl code when the regex engine tries to match it. So for example:
            my $string = "Bar";
            $string =~ /Foo(?{ print "Found a Foo\n" })|Bar(?{ print "Found a Bar\n" })|Baz(?{ print "Found a Baz\n" })/;
        Should print out 'Found a Bar'.

        It's documented in the perlre man page under 'Extended Patterns'.
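Applied to the original question, the same construct can report which alternative matched. This is only a sketch under stated assumptions: the key list is a made-up sample, which_key is a hypothetical helper, and the index is stored in the package variable $main::matched_idx (written as $::matched_idx). The `use re 'eval'` pragma is needed because the pattern containing (?{ }) is interpolated at run time.

```perl
use strict;
use warnings;
use re 'eval';    # required: the pattern with (?{ }) is built at run time

my @keys = ('.*\.doc$', '.*\.xls$', 'dean');

# Each alternative stores its own index in $::matched_idx when it completes.
# The backslash stops Perl from interpolating the variable into the string.
my $string = join '|',
    map { "(?:$keys[$_])(?{ \$::matched_idx = $_ })" } 0 .. $#keys;
my $find = qr/$string/i;

# Hypothetical helper: return the key whose alternative matched, or undef.
sub which_key {
    my ($file) = @_;
    local $::matched_idx;
    return $file =~ $find ? $keys[$::matched_idx] : undef;
}

print which_key('notes.DOC'), "\n";
```

Because each code block sits at the very end of its alternative, it only runs when that alternative has matched, so the last index written belongs to the alternative that made the whole pattern succeed.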
Re: Returning regexp pattern that was used to match
by diotalevi (Canon) on May 03, 2004 at 15:05 UTC

    This is easy enough - just test each portion individually. When it matches then you know that the current pattern is the one that matched.

        sub do_this {
            my $self = shift;
            my %actions = %{shift()};
            for my $rx ( keys %actions ) {
                if ( $self->{'find'} =~ $rx ) {
                    print "Found it with '$rx'\n";
                    last;
                }
            }
        }
Re: Returning regexp pattern that was used to match
by Abigail-II (Bishop) on May 03, 2004 at 15:17 UTC
    You have a different problem in your code. Files can match multiple patterns; in fact, any file that matches 'F:/dean/' will also match 'dean'. But since you let the order of the clauses be determined by keys(), it might be that in one run of your program a file that matches 'F:/dean/' will be mirrored to C:\test, and in another run 'rel_file|stat_8' will be returned.
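One way to make the outcome repeatable, sketched here as an assumption rather than as the OP's design, is to keep the rules in an ordered list instead of a hash, so that the first rule listed always wins. The names @rules and action_for are hypothetical.

```perl
use strict;
use warnings;

# Ordered list of [ pattern, action ] pairs: earlier entries take priority.
my @rules = (
    [ qr{F:/dean/}i, [ 'mirror_to', 'C:\test' ] ],
    [ qr{dean}i,     [ qw/return_this rel_file|stat_8/ ] ],
);

# Hypothetical helper: return the action of the first rule that matches,
# or nothing when no rule applies.
sub action_for {
    my ($file) = @_;
    for my $rule (@rules) {
        my ($rx, $action) = @$rule;
        return $action if $file =~ $rx;
    }
    return;
}

my $action = action_for('F:/dean/plan.doc');
print "action: $action->[0]\n";
```

With this layout a file under F:/dean/ is always mirrored, and the more general 'dean' rule only applies to files the first rule did not claim.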

    Abigail

      Yup, I'm aware of that. That will be in the documentation and come down to user discretion. <grinning> The features will be built in, however. It would be recommended that the user create different objects or even separate programs to do different tasks, and not mix mirroring with deletions, sync'ing with copying, etc. But it will all be bundled in the same module and interface.

      Dean

Node Type: perlquestion [id://350016]
Approved by broquaint