Re: Massive regexp search and replace

by holli (Monsignor)
on Feb 10, 2005 at 14:57 UTC (#429751)

in reply to Massive regexp search and replace

With the following technique, you can wrap each of your regexes in an anonymous subroutine that takes the string to change as its first argument and returns the changed string. Like this:
    #list of regex-strings
    my @regex = (
        's/(a+)/\U$1/g',
        's/([bz]+)/XX/g',
    );
    #is now a list of subroutines
    @regex = map { eval "\$sub = sub { \$_=\$_[0]; $_; \$_ }" } @regex;
This list can easily be used like this:
    my @text = (
        "aaaabbzz",
        "bbbyyy",
    );
    for my $t ( @text ) {
        print "org $t\n";
        for my $re ( @regex ) {
            $t = &$re($t);
        }
        print "new $t\n";
    }
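The same idea can be sketched so that it runs under strict and reports substitutions that fail to compile. This is only one possible variant, not the code from the post: the helper names compile_subs and apply_all are made up for illustration, and the use of local to protect the caller's $_ is an addition.

```perl
use strict;
use warnings;

# Substitutions kept as strings, as in the post above.
my @regex_strings = (
    's/(a+)/\U$1/g',
    's/([bz]+)/XX/g',
);

# compile_subs is a hypothetical helper name, not part of the original post.
# Each string is interpolated into the source of an anonymous sub and
# compiled exactly once by the string eval.
sub compile_subs {
    my @subs;
    for my $code (@_) {
        # "local $_" keeps the caller's $_ untouched.
        my $sub = eval "sub { local \$_ = \$_[0]; $code; \$_ }";
        die "cannot compile '$code': $@" unless $sub;
        push @subs, $sub;
    }
    return @subs;
}

my @subs = compile_subs(@regex_strings);

# apply_all (also a hypothetical name) runs every compiled sub in order.
sub apply_all {
    my ($text) = @_;
    for my $sub (@subs) {
        $text = $sub->($text);
    }
    return $text;
}

print apply_all("aaaabbzz"), "\n";   # prints AAAAXX
print apply_all("bbbyyy"), "\n";     # prints XXyyy
```

Checking the return value of eval matters here because a typo in one of the strings would otherwise leave a silent undef in the list of subs.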
Encapsulating the regexes in subroutines should be faster than recompiling the same regex again and again, because each substitution is compiled only once, when the eval runs. Note that I have not benchmarked this.
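For anyone who wants to measure it, a comparison could be sketched with the core Benchmark module. The baseline here is an assumption: it guesses that the alternative is a string eval of each substitution on every call, which may not match what the original poster was doing. The function names and the input string are made up, and no timings are claimed.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @regex_strings = (
    's/(a+)/\U$1/g',
    's/([bz]+)/XX/g',
);

# Variant 1: compile each substitution once, as in the post.
my @subs = map {
    my $sub = eval "sub { local \$_ = \$_[0]; $_; \$_ }";
    die "cannot compile '$_': $@" unless $sub;
    $sub;
} @regex_strings;

sub with_subs {
    my ($text) = @_;
    for my $sub (@subs) {
        $text = $sub->($text);
    }
    return $text;
}

# Variant 2 (assumed baseline): string-eval the substitution on every call,
# so Perl has to parse and compile it again each time.
sub with_string_eval {
    my ($text) = @_;
    for my $code (@regex_strings) {
        local $_ = $text;
        eval $code;
        die $@ if $@;
        $text = $_;
    }
    return $text;
}

# A negative count tells Benchmark to run each variant for at least
# that many CPU seconds.
cmpthese(-1, {
    precompiled => sub { with_subs("aaaabbzz") },
    string_eval => sub { with_string_eval("aaaabbzz") },
});
```

cmpthese prints a table of rates and relative percentages; results will depend on the Perl version and on how many substitutions and lines are involved.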

holli, /regexed monk/

Replies are listed 'Best First'.
Re^2: Massive regexp search and replace
by Tanktalus (Canon) on Feb 10, 2005 at 15:17 UTC

    No doubt, compiling the regular expressions only once is an advantage. But I'd take it just a tiny bit further and avoid all the copying around of the line:

        #list of regex-strings
        my @regex = (
            's/(a+)/\U$1/g',
            's/([bz]+)/XX/g',
        );
        #is now a list of subroutines
        @regex = map { eval "sub { $_ }" } @regex;
    Notes: I got rid of the copying of the line in and out, so we just work on the global $_. I also got rid of the extraneous assignment to the global $sub variable. Now you use it like:
        my @text = (
            "aaaabbzz",
            "bbbyyy",
        );
        for ( @text ) {
            print "org $_\n";
            for my $re ( @regex ) {
                &$re(); # or even just &$re
            }
            print "new $_\n";
        }
    The advantage shows when you have many regexes (which the OP said they would): less copying of data around. It's just a tiny bit more dangerous, though, since so many functions modify $_.
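    A minimal sketch of both points, written to run under strict: the in-place variant changes the array elements themselves because the loop aliases $_ to each element, and a function that assigns to $_ without localizing it will overwrite the current element. The helpers careless and careful are hypothetical, made up only to show the hazard.

```perl
use strict;
use warnings;

# Compile each substitution string once; the subs act on the global $_.
my @subs = map {
    my $sub = eval "sub { $_ }";
    die "cannot compile '$_': $@" unless $sub;
    $sub;
} 's/(a+)/\U$1/g', 's/([bz]+)/XX/g';

my @text = ("aaaabbzz", "bbbyyy");
for (@text) {                  # $_ is an alias for each element of @text
    for my $sub (@subs) {
        $sub->();              # modifies $_, and therefore the element
    }
}
print "@text\n";               # prints AAAAXX XXyyy

# Hypothetical helpers illustrating the danger mentioned above.
sub careless { $_ = "clobbered"; return }        # assigns to the global $_
sub careful  { local $_ = "scratch"; return }    # restores $_ on exit

my @data = ("keep me");
for (@data) { careful() }
my $after_careful = $data[0];  # still "keep me"
for (@data) { careless() }
print "$after_careful / $data[0]\n";   # prints keep me / clobbered
```

    If any code called inside the loop might touch $_, localizing it there, or going back to passing the string as an argument, avoids the surprise.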
