Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Executing CGI/web form directives in regex substitution without pages of code

by Polyglot (Pilgrim)
on Feb 19, 2020 at 20:03 UTC ( #11113174=perlquestion: print w/replies, xml ) Need Help??

Polyglot has asked for the wisdom of the Perl Monks concerning the following question:

I have a database being edited via a web form which needs to allow user-specified substitutions. By providing checkboxes in the form for things like "Match at start of line", "Match at end of line", "Case insensitive", etc., along with the values for the actual text to search for and to replace with, I can provide PERL with a regular expression (regex/regexp) substitution to run against the text in the database. However, I have been unable to use it, and was forced to create pages of code to accommodate all of the possible options, as is shown further below.

What I would like is something more like this:

$substitute = qq|s~$sv$ch$ww$searchQuery$ww$ch$ev~$replacement~g$in;|; $line =~ $substitute;

. . . Where $sv would be a "^" if the user had requested the match to start at the beginning of the line, or be blank ("") otherwise, $ev would optionally be the "$" for end of line matching, etc. When I tried this approach, however, it failed. So I ended up doing the following instead:

sub processReplacements { my $regexM = shift @_; #TERM(S) TO MATCH my $regexR = shift @_; #REPLACEMENT TERM my $regexI = shift @_; #FLAG FOR CASE-INSENSITVE SUBSTITUTION my $sv = shift @_; # $sv => START, E.G. /^(.*)/; my $ev = shift @_; # $ev => END, E.G. /(.*)$/; my $ww = shift @_; # $ww => WHOLE-WORD, E.G. /\b(.*)\b/; my $ch = shift @_; # $ch => DELIMIT CHARS, # E.G. /[.,:;!?'"](.*)[.,:;!?'"]/; my @data = @_; # INCOMING ARRAY my @changed = (); # OUTGOING ARRAY my $line = ''; my $linehead = ''; my $sourceline = ''; $regexM = decode("utf8", $regexM); $regexR = decode("utf8", $regexR); foreach $line (@data) { chomp $line; $line =~ s/\s+$//; $line =~ s/((?:\d+\t)+)//; $linehead = $1; #KEEP A COPY OF ORIGINAL FOR LATER COMPARISON $sourceline = $line; if ($regexI) { #CASE INSENSITIVE if ($sv) { #START VERSE if ($ev) { #END VERSE if ($ww) { #WHOLE WORD if ($ch) { #CHAR DELIMITERS $line =~ s~^[$ch]$regexM[$ch]\b$~$regexR~gie; } else { #NO CHAR DELIMITS $line =~ s~^$regexM\b$~$regexR~gie; } } else { #NOT WHOLE if ($ch) { #CHAR DELIMITERS $line =~ s~^[$ch]$regexM[$ch]$~$regexR~gie; } else { #NO CHAR DELIMITS $line =~ s~^$regexM$~$regexR~gie; } } } else { #NOT END VERSE if ($ww) { #WHOLE WORD if ($ch) { #CHAR DELIMITERS $line =~ s~^\b[$ch]$regexM[$ch]\b~$regexR~gie; } else { #NO CHAR DELIMITS $line =~ s~^$regexM\b~$regexR~gie; } } else { #NOT WHOLE if ($ch) { #CHAR DELIMITERS $line =~ s~^[$ch]$regexM[$ch]~$regexR~gie; } else { #NO CHAR DELIMITS $line =~ s~^$regexM~$regexR~gie; } } } } else { #NOT START if ($ev) { #END VERSE if ($ww) { #WHOLE WORD if ($ch) { #CHAR DELIMITERS $line =~ s~\b[$ch]$regexM[$ch]\b$~$regexR~gie; } else { #NO CHAR DELIMITS $line =~ s~\b$regexM\b$~$regexR~gie; } } else { #NOT WHOLE if ($ch) { #CHAR DELIMITERS $line =~ s~[$ch]$regexM[$ch]$~$regexR~gie; } else { #NO CHAR DELIMITS $line =~ s~$regexM$~$regexR~gie; } } } else { #NOT END VERSE if ($ww) { #WHOLE WORD if ($ch) { #CHAR DELIMITERS $line =~ s~\b[$ch]$regexM[$ch]\b~$regexR~gie; } else { #NO CHAR DELIMITS $line =~ s~\b$regexM\b~$regexR~gie; } } else { #NOT WHOLE if ($ch) { #CHAR DELIMITERS $line =~ s~[$ch]$regexM[$ch]~$regexR~gie; } else { #NO CHAR DELIMITS $line =~ s~$regexM~$regexR~gie; } } } } } else { #CASE SENSITIVE if ($sv) { #START VERSE if ($ev) { #END VERSE if ($ww) { #WHOLE WORD if ($ch) { #CHAR DELIMITERS $line =~ s~^\b[$ch]$regexM[$ch]\b$~$regexR~ge; } else { #NO CHAR DELIMITS $line =~ s~^\b$regexM\b$~$regexR~ge; } } else { #NOT WHOLE if ($ch) { #CHAR DELIMITERS $line =~ s~^[$ch]$regexM[$ch]$~$regexR~ge; } else { #NO CHAR DELIMITS $line =~ s~^$regexM$~$regexR~ge; } } } else { #NOT END if ($ww) { #WHOLE WORD if ($ch) { #CHAR DELIMITERS $line =~ s~^\b[$ch]$regexM[$ch]\b~$regexR~ge; } else { #NO CHAR DELIMITS $line =~ s~^\b$regexM\b~$regexR~ge; } } else { #NOT WHOLE if ($ch) { #CHAR DELIMITERS $line =~ s~^[$ch]$regexM[$ch]~$regexR~ge; } else { #NO CHAR DELIMITS $line =~ s~^$regexM~$regexR~ge; } } } } else { #NOT START if ($ev) { #END VERSE if ($ww) { #WHOLE WORD if ($ch) { #CHAR DELIMITERS $line =~ s~\b[$ch]$regexM[$ch]\b$~$regexR~ge; } else { #NO CHAR DELIMITS $line =~ s~\b$regexM\b$~$regexR~ge; } } else { #NOT WHOLE if ($ch) { #CHAR DELIMITERS $line =~ s~[$ch]$regexM[$ch]$~$regexR~ge; } else { #NO CHAR DELIMITS $line =~ s~$regexM$~$regexR~ge; } } } else { #NOT END if ($ww) { #WHOLE WORD if ($ch) { #CHAR DELIMITERS $line =~ s~\b[$ch]$regexM[$ch]\b~$regexR~ge; } else { #NO CHAR DELIMITS $line =~ s~\b$regexM\b~$regexR~ge; } } else { #NOT WHOLE if ($ch) { #CHAR DELIMITERS $line =~ s~[$ch]$regexM[$ch]~$regexR~ge; } else { #NO CHAR DELIMITS $line =~ s~$regexM~$regexR~ge; } } } } } if ($line ne $sourceline) { push @changed, "$linehead$sourceline\t$line\n"; }; $line = $linehead.$line; } return @changed; } # END SUB processReplacements

To me, that doesn't seem very Perlish. It is certainly far more unwieldy to work with. Can this whole subroutine be replaced with just a few lines as per my first attempt?

Blessings,

~Polyglot~

Replies are listed 'Best First'.
Re: Executing CGI/web form directives in regex substitution without pages of code
by jwkrahn (Monsignor) on Feb 19, 2020 at 22:40 UTC

    Perhaps this would work better:

    sub processReplacements { my ( $regexM, # TERM(S) TO MATCH $regexR, # REPLACEMENT TERM $regexI, # FLAG FOR CASE-INSENSITVE SUBSTITUTION $sv, # $sv => START, E.G. /^(.*)/; $ev, # $ev => END, E.G. /(.*)$/; $ww, # $ww => WHOLE-WORD, E.G. /\b(.*)\b/; $ch, # $ch => DELIMIT CHARS, # E.G. /[.,:;!?'"](.*)[.,:;!?'"]/; @data ) = @_; # INCOMING ARRAY my @changed; # OUTGOING ARRAY my $linehead = ''; my $sourceline = ''; $regexM = decode( 'utf8', $regexM ); $regexR = decode( 'utf8', $regexR ); my $regex = '(?' . ( $regexI ? 'i' : '' ) . ':' . ( $sv ? '^' : $ww ? '\b' : '' ) . ( $ch ? "[$ch]" : '' ) . $regexM . ( $ch ? "[$ch]" : '' ) . ( $ev ? '$' : $ww ? '\b' : '' ) . ')'; foreach my $line ( @data ) { chomp $line; $line =~ s/\s+$//; $linehead = $1 if $line =~ s/((?:\d+\t)+)//; # KEEP A COPY OF ORIGINAL FOR LATER COMPARISON $sourceline = $line; $line =~ s/$regex/$regexR/ge; if ( $line ne $sourceline ) { push @changed, "$linehead$sourceline\t$line\n"; } $line = $linehead . $line; } return @changed; } # END SUB processReplacements

      Thank you! With one minor correction (a missed semicolon), this code works. I'd be happy to upvote it again if I could. You took the time to show me some better ways of doing things, such as the incoming variable assignments, that I had not seen before. (Sometimes I think I'm just an old plugger.)

      Thank you for taking your time to help me with this. Code like this will be much easier to adjust and maintain than what I had before.

      Blessings,

      ~Polyglot~

Re: Executing CGI/web form directives in regex substitution without pages of code
by 1nickt (Abbot) on Feb 19, 2020 at 20:12 UTC

    Hi, have you tried compiling your pattern with qr//?


    The way forward always starts with a minimal test.
      Hi, have you tried compiling your pattern with qr//?

      Or properly indenting? :P

        It is indented differently in my code. I was actually surprised after pasting and posting it here that it displayed as it does, and I assumed something had removed some of the tabs. In retrospect, it probably has more to do with the fact that my text editor uses a different (user-adjustable, and font-related) tab setting.

        Blessings,

        ~Polyglot~

      qr// does not appear to be designed to encapsulate the entire regex substitution expression, only the portion of it that would ordinarily be contained in the match section. I need something that will allow the substitution to be flagged for case-insensitive matching via the //i operator, in addition to the other portions (match / substitute) of the operation coming from user input (variable).

      For example, trying this did not work for me:

      $substitution = qr/s~$sv$ch$ww$regexM$ww$ch$ev~$regexR~g$regexI/; $line =~ $substitution;

      Blessings,

      ~Polyglot~

        I guess here you miss the possibility to have modifiers within the regular expression? A postponed /i modifier can also be written as (?i) within the regex - and in this fashion used in qr//.
Re: Executing CGI/web form directives in regex substitution without pages of code
by jo37 (Pilgrim) on Feb 19, 2020 at 21:46 UTC

    Something like this does work:

    my $m = $regexI ? qr{(?i:$regexM)} : qr($regexM); my $s = $sv ? qr(^) : qr(); my $e = $ev ? qr($) : qr(); my $b = $ww ? qr(\b) : qr(); my $c = $ch ? qr([.,:;!?'"]) : qr(); foreach my $line (@data) { push @changed, $line =~ s{$s$c$b\K$m(?=$b$c$e)}{$regexR}r; }

    HTH

    -jo

      I appreciate your effort to be helpful. However, that code is giving me a syntax error, and I'm not sure of the reason. The error message in the log says:

      syntax error at LBE_FindReplace.pl line 567, near "s{$s$c$b\\K$m(?=$b$ +c$e)}{$regexR}r"

      Blessings,

      ~Polyglot~

        Where does the double backslash in \\K come from? Here is a complete working example:

        #!/usr/bin/perl use strict; use warnings; sub processReplacements { my $regexM = shift @_; #TERM(S) TO MATCH my $regexR = shift @_; #REPLACEMENT TERM my $regexI = shift @_; #FLAG FOR CASE-INSENSITVE SUBSTITUTION my $sv = shift @_; # $sv => START, E.G. /^(.*)/; my $ev = shift @_; # $ev => END, E.G. /(.*)$/; my $ww = shift @_; # $ww => WHOLE-WORD, E.G. /\b(.*)\b/; my $ch = shift @_; # $ch => DELIMIT CHARS, # E.G. /[.,:;!?'"](.*)[.,:;!?'"]/; my @data = @_; # INCOMING ARRAY my @changed = (); # OUTGOING ARRAY my $line = ''; my $linehead = ''; my $sourceline = ''; my $m = $regexI ? qr{(?i:$regexM)} : qr($regexM); my $s = $sv ? qr(^) : qr(); my $e = $ev ? qr($) : qr(); my $b = $ww ? qr(\b) : qr(); my $c = $ch ? qr([.,:;!?'"]) : qr(); foreach my $line (@data) { push @changed, $line =~ s{$s$c$b\K$m(?=$b$c$e)}{$regexR}r; } return @changed; } foreach my $parm (qw(0:0:0:0:0 1:0:0:0:0 0:1:0:0:0 0:0:1:0:0 0:0:0:1:0 + 0:0:0:0:1)) { my @opts = split /:/, $parm; print "parm: $parm\n"; my @r = processReplacements('xxx', 'yyy', @opts, 'aaaxxxbbb', 'aaa xxx bbb', 'aaa:xxx:bbb', 'xxxbbb', 'aaa:XXX', 'a +aaxxx'); print "$_\n" foreach @r; print "\n"; }

        which gives:

        parm: 0:0:0:0:0 aaayyybbb aaa yyy bbb aaa:yyy:bbb yyybbb aaa:XXX aaayyy parm: 1:0:0:0:0 aaayyybbb aaa yyy bbb aaa:yyy:bbb yyybbb aaa:yyy aaayyy parm: 0:1:0:0:0 aaaxxxbbb aaa xxx bbb aaa:xxx:bbb yyybbb aaa:XXX aaaxxx parm: 0:0:1:0:0 aaaxxxbbb aaa xxx bbb aaa:xxx:bbb xxxbbb aaa:XXX aaayyy parm: 0:0:0:1:0 aaaxxxbbb aaa yyy bbb aaa:yyy:bbb xxxbbb aaa:XXX aaaxxx parm: 0:0:0:0:1 aaaxxxbbb aaa xxx bbb aaa:yyy:bbb xxxbbb aaa:XXX aaaxxx

        -jo

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11113174]
Approved by GrandFather
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (3)
As of 2020-09-21 03:54 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    If at first I donít succeed, I Ö










    Results (124 votes). Check out past polls.

    Notices?