Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Generate a single "or regex" from given strings

by Anonymous Monk
on Sep 27, 2013 at 14:51 UTC ( #1055990=CUFP: print w/ replies, xml ) Need Help??

I got tired of remaking regex (of email addresses) for use by procmail and mutt by hand. Then I wrote the following. The worst case is every string is simply joined, which is still better than manual operation. Ideally one of three modules -- Regexp::(Assemble|Optimizer|Trie) -- will be used to generate the regex when available.

Mind that ...

  • /i flag is used; feel free to make it optional yourself;
  • duplicates in input are not considered;
  • end-spaces are removed.
#!perl use strict; use warnings; our $VERSION = '0.07'; # or-re.pl - Given a list of strings, prints a OR'd regex; prints tes +t results # for the input. # # End-spaces & duplicate strings are removed before generating the re +gex. scalar @ARGV or die qq[Give strings to make one "or" regex.\n]; my @list = prepare( @ARGV ); my $re = build_re( @list ); printf "Regex...\n %s\n\n" , $re; re_test( $re , @list ); exit; sub re_test { my ( $re , @list ) = @_; ref $re or $re = qr/$re/; print "Test ...\n"; for my $it ( @list ) { printf " %s %14s\n" , $it , ( $it =~ $re ? 'matches' : 'does NOT match' ) ; } return; } sub prepare { scalar @_ or return; my %seen; return map { s/^\s+//; s/\s+$//; !$seen{ $_ }++ ? $_ : () } @_; } sub build_re { scalar @_ or return ''; my @arg = @_; my %mod_map = ( 'Regexp::Assemble' => \&via_assemble , 'Regexp::Optimizer' => \&via_optimizer , 'Regexp::Trie' => \&via_trie , 'mine' => sub { return decorate( simpleton( @_ ) ) +} ); my @order = ( 'Regexp::Assemble' , 'Regexp::Optimizer' , 'Regexp::Trie' ); my $maker = 'mine'; for my $mod ( @order ) { _load_module( $mod ) or next; $maker = $mod; last; } return $mod_map{ $maker }->( @arg ) } sub decorate { return qq/\\b(?i:$_[0])\\b/ ; } sub simpleton { return join '|' , map { quotemeta( $_ ) } sort { length $b <=> length $a || lc $a cmp lc $b } @_ ; } sub via_assemble { my $maker = Regexp::Assemble->new( 'chomp' => 1 , 'reduce' => 1 , 'modifiers' => 'i' , 'anchor_word' => 1 ); $maker->add( map quotemeta( $_ ) , @_ ); # R:A::as_string() method eschews flags given. This preserves it at +the cost # of extraneous syntax elsewhere. return $maker->re() . ''; } sub via_optimizer { my $re = simpleton( @_ ); my $maker = Regexp::Optimizer->new(); return $maker->as_string( qr/\b(?:$re)\b/i ); } sub via_trie { my $maker = Regexp::Trie->new(); # Cannot add as list, unlike R::Assemble. $maker->add( $_ ) for @_; return $maker->regexp() . ''; } sub _load_module { my ( $mod ) = @_; local $@; eval qq/ require $mod; 1; /; $@ and do { warn "Could not load $mod: $@\n"; return; }; warn "# Using $mod ...\n"; return 1; }

Comment on Generate a single "or regex" from given strings
Select or Download Code
Replies are listed 'Best First'.
Re: Generate a single "or regex" from given strings
by LanX (Canon) on Sep 27, 2013 at 14:57 UTC
    (maybe TL;DR but as a side remark)

    Trie optimization is a builtin since Perl 5.10, maybe you should test that too.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: Generate a single "or regex" from given strings
by toolic (Bishop) on Sep 27, 2013 at 15:37 UTC
    To reduce the excessive verbosity when the modules can't be loaded, change:
    $@ and do { warn "Could not load $mod: $@\n"; return; };

    to:

    $@ and do { warn "Could not load $mod\n"; return; };

      Yes, that and other things may be optionally turned off|on if anyone else cares to implement.

Re: Generate a single "or regex" from given strings
by Anonymous Monk on Sep 27, 2013 at 22:13 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://1055990]
Front-paged by Arunbear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (7)
As of 2015-07-08 00:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (93 votes), past polls