PerlMonks  

Re: Regexp substitution using variables

by Fletch (Chancellor)
on Nov 25, 2020 at 20:38 UTC (#11124226)


in reply to Regexp substitution using variables

I'm trying to think of some application where you'd reasonably need to accommodate random substitutions with possible /g modifiers but I'm coming up blank (but probably need more caffeine to boot . . .). I started to post something mentioning string eval (which as has been pointed out isn't the answer there either) but something about the original question has a not-too-faint whiff of "XY problem" about it.

Could you step back a hair more and explain why you think you need to run substitutions with arbitrary modifier flags? It may be that you don't actually and you could really get by with one of the prior suggestions (like moving compatible flags onto the front of the pattern). Or maybe you could work with some sort of (handwaving vigorously here) plugin / module system where you write substitution classes which implement a specific role that . . . /shrug
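To illustrate the "moving compatible flags onto the front of the pattern" idea, here is a minimal sketch. The helper name compile_rule and the sample strings are made up for illustration; the only assumption about the config is that the flags arrive as a plain string. Note that only i, m, s and x can be embedded this way, so /g still has to be written out in the code.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a pattern string plus a flag string into a compiled regex.
sub compile_rule {
    my ($from, $flags) = @_;
    # Keep only the flags that may legally appear inside the pattern as (?imsx).
    (my $inline = $flags // '') =~ tr/imsx//cd;
    return length $inline ? qr/(?$inline)$from/ : qr/$from/;
}

my $re = compile_rule('foo', 'gi');          # 'g' is dropped here, 'i' is kept
(my $text = 'Foo and FOO') =~ s/$re/bar/g;   # /g is spelled out literally
print "$text\n";
```

A caller that needs both global and non-global substitutions would branch on whether the flag string contains "g" and write the two s/// forms separately.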

The cake is a lie.

Re^2: Regexp substitution using variables
by MikeTaylor (Acolyte) on Nov 25, 2020 at 22:29 UTC
    I understand your scepticism; this does indeed feel like one of those "How do I do X?" questions where the answer is "Don't do X, do Y instead". (Is that what you meant by an "XY problem"?) My situation is basically that I need to run a config file that specifies regular-expression substitutions. Specifically, my program is generating USMARC-format bibliographic records, and a config file says things like "in the 245$a field, replace /foo/ with 'bar' globally". In fact, the config looks like this:
    "245$a": [ { "op": "regsub", "from": "foo", "to": "bar", "flags": "g" } ]
    If you can think of a better way to do this, I am all ears, but bear in mind I do need the full power of regexp substitutions, e.g. the ability to include parenthesized sub-expressions in the "from" part and $1 back-references in the "to" part.

      This is interesting. Can you provide some additional examples, including more esoteric ones, and possibly a little sample text? I was just wanting to look at the challenges you're facing more pragmatically. Test cases would be fantastic.


      Dave

        I'm afraid I'm not yet far enough into the project to have solid examples, let alone test cases. I am waiting on the customer to let me know what specific transformations they need. But it would not be unlikely that we'd find, for example, a field containing call-numbers like PR.123.ABC that we needed to change to PR-ABC:123, which of course we could do with s/(.*)\.(.*)\.(.*)/$1-$3:$2/.
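As a quick check of the call-number substitution mentioned above, here is a small runnable sketch. The variable names are illustrative only; the input and the pattern are the ones given in the post.

```perl
use strict;
use warnings;

my $call_number = 'PR.123.ABC';

# Copy first, then substitute on the copy, so the original stays intact.
(my $changed = $call_number) =~ s/(.*)\.(.*)\.(.*)/$1-$3:$2/;

print "$changed\n";   # PR-ABC:123
```

Because (.*) is greedy, a value with more than two dots would be split at the last two; a pattern built from [^.]* instead of .* would be stricter, if that matters for the real data.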
      "245$a": [ { "op": "regsub", "from": "foo", "to": "bar", "flags": "g" } ]

      This seems like a good starting point. See neilwatson's article How to ask better questions using Test::More and sample data for the way forward. Once you have a few working test cases defined, the only thing left is to define about a million more, including generous edge and corner cases and exception cases! No problem. :)


      Give a man a fish:  <%-{-{-{-<

      > ... e.g. the ability to include parenthesized sub-expressions in the "from" part and $1 back-references in the "to" part.

      Honestly .... store the full real regexp in your config and eval it (or eval it into a sub to optimize execution time)

      "245$a": [ { "regexp": 's/(foo|bar)/He said "$1"/' } ]

      There is no way to "safely" abstract the capture variable away: it has to be compiled into the regex, and that needs an eval or /ee, with all the attendant security issues.
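For reference, a minimal sketch of what the /ee route can look like when the "from" and "to" parts are kept as separate strings. The variable names and the qq{} wrapping are assumptions made for this example, not something taken from the config format above, and the approach is only reasonable when the config comes from a trusted source.

```perl
use strict;
use warnings;

my $from = '(foo|bar)';
my $to   = 'He said "$1"';   # replacement template containing a back-reference

# Wrap the template so that the second evaluation sees a double-quoted string.
# A '}' in $to would end the qq{} early, which is exactly the kind of
# injection hole that makes this unsafe for untrusted input.
my $code = "qq{$to}";

# First /e evaluates the expression $code (yielding the qq{...} source text),
# second /e evals that text with $1 set by the current match.
(my $text = 'foo') =~ s/$from/$code/ee;

print "$text\n";
```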

      > but bear in mind I do need the full power of regexp substitutions,

      I have the impression your JSON format is an attempt to make it language agnostic. But the "full power" means you will be stuck with Perl.

      And full power means that security becomes an illusion.

      DB<111> $_="abc"
      DB<112> s/(.)/@{[print "what? --> $1\n"]}/g
      what? --> a
      what? --> b
      what? --> c
      DB<113>

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

        Though ... there is one "lighter" version to build your replacement dynamically.

        eval the replacement-string into a sub, and apply just one /e at the s///

        DB<137> $rep = '<$1>'
        DB<138> eval qq( sub rep { "$rep" } )
        DB<139> p "abc" =~ s/(.)/rep()/rge
        <a><b><c>
        DB<140>

        This will give you more control over what is happening, since you can use B::Deparse to check the replacement string before executing it.

        This way you at least have a chance to reject dubious code.

        DB<140> p B::Deparse->new('-q')->coderef2text(\&rep)
        {
            use feature 'current_sub', 'evalbytes', 'fc', 'postderef_qq', 'say', 'state', 'switch', 'unicode_strings', 'unicode_eval';
            '<' . $1 . '>';
        }
        DB<141>
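The same idea as a standalone script rather than a debugger session, as a rough sketch. It uses an anonymous sub instead of the named rep() above, and the variable names are made up for this example. The exact text B::Deparse produces varies with the Perl version and the pragmas in effect, so it is printed for review here rather than compared against a fixed string.

```perl
use strict;
use warnings;
use B::Deparse;

my $template = '<$1>';   # replacement string, e.g. read from the config

# Compile the template into a sub once. Nothing from the template runs yet.
my $rep = eval qq{ sub { "$template" } }
    or die "cannot compile replacement: $@";

# Inspect the compiled code before using it.
my $source = B::Deparse->new('-q')->coderef2text($rep);
print "replacement compiles to:\n$source\n";

# Apply it with a single /e; $1 inside the sub refers to the current match.
(my $text = 'abc') =~ s/(.)/$rep->()/ge;
print "$text\n";
```

What counts as "dubious" in the deparsed source is a policy decision for the application; a review step or an allow-list of permitted constructs are two possible options.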

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
