
Re: RegExp: pos management in global substitution

by AnomalousMonk (Chancellor)
on Sep 03, 2011 at 15:03 UTC (#923998)

in reply to RegExp: pos management in global substitution

I agree that a two-step approach (i.e., match for name_a, then delete the unwanted params) is best. However, here's an 'advanced' (read: unnecessarily complex) way that doesn't use the /e modifier:

>perl -wMstrict -le
"my @txt = (
   '{name_b param_v=\"wh\"}',
   '{name_a param_x=\"abc\" param_a=\"fsd\" param_y=\"def\"}',
   '{name_z param_sd=\"zka\" param_s=\"df\"}',
   '{name_a param_y=\"wtf\" param_z=\"kro\" param_c=\"ptz\" param_ch=\"www\"}',
   '{name_a param_sd=\"zka\" param_y=\"wtf\"}',
   );
 ;;
 my $not_p_xy = qr{ (?! param_ [xy] \b) }xms;
 my $not_sp_param = qr{ (?! \s+ $not_p_xy param_ \w+) . }xms;
 ;;
 for my $s (@txt) {
   print qq{'$s'};
   $s =~ s( (?: \G (?<! \A) | \A {name_a) $not_sp_param* \K
            \s+ $not_p_xy \w+ = \" [^^\"]* \" )
          ()xmsg;
   print qq{'$s' \n};
   }
"
'{name_b param_v="wh"}'
'{name_b param_v="wh"}'

'{name_a param_x="abc" param_a="fsd" param_y="def"}'
'{name_a param_x="abc" param_y="def"}'

'{name_z param_sd="zka" param_s="df"}'
'{name_z param_sd="zka" param_s="df"}'

'{name_a param_y="wtf" param_z="kro" param_c="ptz" param_ch="www"}'
'{name_a param_y="wtf"}'

'{name_a param_sd="zka" param_y="wtf"}'
'{name_a param_y="wtf"}'
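For comparison, the two-step approach mentioned at the top of this note might look roughly like the sketch below. It is only an illustration, not code from the thread: the helper name strip_params is hypothetical, and it assumes each string holds exactly one tag with double-quoted values.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): assumes one tag per string
# and double-quoted parameter values.
sub strip_params {
    my ($tag) = @_;

    # Step 1: leave anything that is not a name_a tag untouched.
    return $tag unless $tag =~ m/\A \{ name_a \b/x;

    # Step 2: delete every parameter other than param_x and param_y,
    # together with the whitespace in front of it.
    $tag =~ s/ \s+ (?! param_[xy] = ) \w+ = " [^"]* " //xg;

    return $tag;
}

print strip_params($_), "\n" for (
    '{name_b param_v="wh"}',
    '{name_a param_x="abc" param_a="fsd" param_y="def"}',
    '{name_a param_sd="zka" param_y="wtf"}',
);
```

Splitting the work this way keeps each regex short, at the cost of a second pass over every name_a tag.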


  1. Removed extraneous (?: ) grouping around the \K expression above. Example output unchanged.
  2. I agree with Re^2: RegExp: pos management in global substitution regarding efficiency, and have further simplified (IMO) to:
        sub xform {
            my ($string, ) = @_;

            my $xy        = qr{ [xy] }xmso;
            my $val       = qr{ = ' [^']* ' }xmso;
            my $param_    = qr{ \s+ param_ }xmso;
            my $param_xy  = qr{ $param_ $xy $val }xmso;
            my $param_any = qr{ $param_ \w+ $val }xmso;

            $string =~ s{ (?: \G (?<! ^) | ^ \{ name_a) $param_xy*+ \K $param_any }
                        ''xmsg;

            return $string;
            }
    Update: Note that I'm using ' (single-quote) instead of " (double-quote) as the parameter value delimiter in my code and testing. This uses regex features introduced in Perl 5.10 (\K and the possessive quantifier *+). This has withstood everything I have thrown at it (including the multi-line example of the OP!), and I think it's my final answer. (This even qualifies as perhaps not unnecessarily complex!)
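As a usage sketch, the xform sub above can be exercised as follows. The sub is reproduced so the snippet runs standalone; the input strings are assumptions adapted from the first example, switched to single-quoted values as the update describes, with one tag per string.

```perl
use strict;
use warnings;

# xform() as posted above, reproduced here so the sketch is self-contained.
sub xform {
    my ($string) = @_;

    my $xy        = qr{ [xy] }xmso;                # the parameter suffixes to keep
    my $val       = qr{ = ' [^']* ' }xmso;         # a single-quoted value
    my $param_    = qr{ \s+ param_ }xmso;          # whitespace plus the common prefix
    my $param_xy  = qr{ $param_ $xy $val }xmso;    # a parameter to keep
    my $param_any = qr{ $param_ \w+ $val }xmso;    # any parameter at all

    # Start at '{name_a' on a line start, or resume where the last match
    # ended (\G); skip kept params possessively, then \K so that only the
    # next parameter is replaced.
    $string =~ s{ (?: \G (?<! ^) | ^ \{ name_a) $param_xy*+ \K $param_any }
                ''xmsg;

    return $string;
}

# Assumed test data, not taken verbatim from the thread.
my @tags = (
    q({name_b param_v='wh'}),
    q({name_a param_x='abc' param_a='fsd' param_y='def'}),
    q({name_a param_y='wtf' param_z='kro' param_c='ptz' param_ch='www'}),
);

print xform($_), "\n" for @tags;
```

Because \K discards everything matched before it from the replaced text, the kept parameters are only skipped over, never rewritten.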

Replies are listed 'Best First'.
Re^2: RegExp: pos management in global substitution
by OlegG (Monk) on Sep 03, 2011 at 17:55 UTC
    Interesting, but it depends on newlines. In my example each tag is separated by a newline, but that is a special case.
    And it seems your regexp could be more efficient:
    s( (?: (?: \G (?<! \A) | \A {name_a) $not_sp_param*+ \K) \s+ \w+ = \" [^^\"]* \" ) ()xmsg;
        Your variant still depends on newlines. I mean it doesn't work properly with
        {name_b param_v="wh"}{name_a param_x="abc" param_a="fsd" param_y="def"}
        {name_z param_sd="zka" param_s="df"}{name_a param_y="wtf" param_z="kro" param_ch="www"}
        But thanks anyway. While parsing your regexp I discovered \K, which was new to me, and finally understood how \G works. My solution, based on yours, is below. It seems to work properly.
        s/ (?:\{name_a | \G) [^{}]+? \K \w++(?<!param_[xy]) ="[^"]+" //xg;
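A minimal sketch of how that last substitution could be tried on the two-tags-per-line input quoted above. The wrapper name strip_inline is hypothetical; the regex is copied unchanged from the reply. As written, the pattern appears to leave in place the whitespace that preceded each removed parameter, so the check below only looks at which parameters survive.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the reply above.
sub strip_inline {
    my ($text) = @_;

    # Enter at '{name_a' or resume at the end of the previous match (\G);
    # [^{}]+? keeps the scan inside the current tag, and \K limits the
    # replacement to the parameter that follows.
    $text =~ s/ (?:\{name_a | \G) [^{}]+? \K \w++(?<!param_[xy]) ="[^"]+" //xg;

    return $text;
}

my $line = '{name_b param_v="wh"}'
         . '{name_a param_x="abc" param_a="fsd" param_y="def"}';

my $out = strip_inline($line);
print "$out\n";
print "param_a removed\n"  if $out !~ /param_a=/;
print "name_b untouched\n" if $out =~ /\{name_b param_v="wh"\}/;
```

Since the alternation does not require a line start, a name_a tag that follows another tag on the same line is still picked up.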
