PerlMonks  

RegExp: pos management in global substitution

by OlegG (Monk)
on Sep 03, 2011 at 11:02 UTC ( #923977=perlquestion )
OlegG has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,
I have text like this:
{name_b param_v="wh"}
{name_a param_x="abc" param_a="fsd" param_y="def"}
{name_z param_sd="zka" param_s="df"}
{name_a param_y="wtf" param_z="kro" param_c="ptz" param_ch="www"}
And I want to get this:
{name_b param_v="wh"}
{name_a param_x="abc" param_y="def"}
{name_z param_sd="zka" param_s="df"}
{name_a param_y="wtf"}
In other words, I want to remove every parameter whose name is not "param_x" or "param_y" from all tags named "name_a".
Can this be done with the s/// operator, without the e modifier, in one go?
Here is what I have so far:
s/(\{name_a[^}]+?)\w+(?<!param_x|param_y)="[^"]+"/$1/g;
This removes only the first unwanted parameter.
But suppose we could tell the regexp engine:
1. After a successful substitution, return to the position where the substitution started and try to substitute again.
2. If that succeeds, go to step 1. If nothing matches at this position, move along the string as usual and, once something has matched and been substituted, go to step 1.

That way the job would be done correctly.
Please tell me, is this possible with the Perl regexp engine? Or can you provide another solution?
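The two steps above can be approximated, without /e, by repeating a single non-/g substitution until it stops matching. This is only a sketch, and it is not the single-pass behaviour the question asks for: each round rescans the string from the start instead of resuming at the previous match position. The pattern is a simplified variant of the attempt above (a negative lookahead instead of the lookbehind), and it assumes parameter values contain no whitespace, double quotes, or closing braces.

```perl
use strict;
use warnings;

my $text = join "\n",
    '{name_b param_v="wh"}',
    '{name_a param_x="abc" param_a="fsd" param_y="def"}',
    '{name_z param_sd="zka" param_s="df"}',
    '{name_a param_y="wtf" param_z="kro" param_c="ptz" param_ch="www"}';

# Each pass removes one unwanted parameter from a name_a tag.
# The loop ends when no name_a tag contains an unwanted parameter.
1 while $text =~ s/(\{name_a\b[^}]*?)\s+(?!param_[xy]=)\w+="[^"]*"/$1/;

print "$text\n";
```

On the sample text this prints the four tags from the desired output, one per line.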

Re: RegExp: pos management in global substitution
by Perlbotics (Abbot) on Sep 03, 2011 at 11:52 UTC

    I would use two regexps: the first to check whether the line starts with {name_a, and the second to remove all unwanted param_* entries:

    m/wanted name_a-line/  and  s/unwanted parameters//g;
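A minimal sketch of that two-step idea, assuming one tag per string and parameter values without embedded whitespace or double quotes. The helper name clean_tag and both concrete patterns are illustrative, not taken from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the tag with every parameter other than
# param_x and param_y removed, but only if the tag is a name_a tag.
sub clean_tag {
    my ($tag) = @_;
    if ($tag =~ /\A\{name_a\b/) {                      # step 1: wanted name_a line?
        $tag =~ s/\s+(?!param_[xy]=)\w+="[^"]*"//g;    # step 2: drop unwanted parameters
    }
    return $tag;
}

my @tags = (
    '{name_b param_v="wh"}',
    '{name_a param_x="abc" param_a="fsd" param_y="def"}',
    '{name_z param_sd="zka" param_s="df"}',
    '{name_a param_y="wtf" param_z="kro" param_c="ptz" param_ch="www"}',
);

print clean_tag($_), "\n" for @tags;
```

Because the name check and the removal are separate, the second pattern can use a plain /g without worrying about where the previous match ended.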

      I know. My interest in doing it with s/// alone is purely scientific.
Re: RegExp: pos management in global substitution
by Anonymous Monk on Sep 03, 2011 at 12:59 UTC

    Can this be done with s/// operator without e modifier in one approach?

    Probably, but it's an advanced way to go about it :)

    1. After success substitution return to the position where substitution started and try to substitute again...

    Not sure, but I don't think you can without /e

    Here is my non-working advanced attempt

    s{ (?<= \{name_a ) (?: (?: (?<= \s ) | (?<= param_[xy]=" ) (?: (?<= [^"] ) )+ (?<= " ) ) | param_[^xy]\w*="[^"]+" )+ (?<= \} ) }< eaten >gx;
    I'm probably not understanding (?<=pattern) correctly

    The following works

    s/ \{name_a ( (?: [^=\s]+ = "[^"]*" | \s+ )+ ) \}/ FunAt($1) /gex;

    sub FunAt {
        join '', '{name_a ', AtAt(@_) ,'}';
    }

    sub AtAt {
        my %fun = $_[0] =~ m/([^=\s]+) = ("[^"]*")/gx;
        return join ' ', map { "$_=$fun{$_}" } grep /param_[xy]/, keys %fun;
    }
      This solution is not interesting.

        This solution is not interesting.

        I wouldn't have bothered if you had been clearer when you asked for other solutions

Re: RegExp: pos management in global substitution
by AnomalousMonk (Monsignor) on Sep 03, 2011 at 15:03 UTC

    I agree that a two-step approach (i.e., match for name_a, then delete unwanted params) is best. However, here's an 'advanced' (read: unnecessarily complex) way that doesn't use the  /e modifier:

    >perl -wMstrict -le
    "my @txt = (
       '{name_b param_v=\"wh\"}',
       '{name_a param_x=\"abc\" param_a=\"fsd\" param_y=\"def\"}',
       '{name_z param_sd=\"zka\" param_s=\"df\"}',
       '{name_a param_y=\"wtf\" param_z=\"kro\" param_c=\"ptz\" param_ch=\"www\"}',
       '{name_a param_sd=\"zka\" param_y=\"wtf\"}',
       );
     ;;
     my $not_p_xy = qr{ (?! param_ [xy] \b) }xms;
     my $not_sp_param = qr{ (?! \s+ $not_p_xy param_ \w+) . }xms;
     ;;
     for my $s (@txt) {
       print qq{'$s'};
       $s =~ s( (?: \G (?<! \A) | \A {name_a) $not_sp_param* \K
                \s+ $not_p_xy \w+ = \" [^^\"]* \" )
              ()xmsg;
       print qq{'$s' \n};
       }
    "
    '{name_b param_v="wh"}'
    '{name_b param_v="wh"}'

    '{name_a param_x="abc" param_a="fsd" param_y="def"}'
    '{name_a param_x="abc" param_y="def"}'

    '{name_z param_sd="zka" param_s="df"}'
    '{name_z param_sd="zka" param_s="df"}'

    '{name_a param_y="wtf" param_z="kro" param_c="ptz" param_ch="www"}'
    '{name_a param_y="wtf"}'

    '{name_a param_sd="zka" param_y="wtf"}'
    '{name_a param_y="wtf"}'

    Updates:

    1. Removed extraneous  (?: ) grouping around  \K expression above. Example output unchanged.
    2. I agree with Re^2: RegExp: pos management in global substitution regarding efficiency, and have further simplified (IMO) to:
      sub xform {
        my ($string, ) = @_;

        my $xy        = qr{ [xy] }xmso;
        my $val       = qr{ = ' [^']* ' }xmso;
        my $param_    = qr{ \s+ param_ }xmso;
        my $param_xy  = qr{ $param_ $xy $val }xmso;
        my $param_any = qr{ $param_ \w+ $val }xmso;

        $string =~ s{ (?: \G (?<! ^) | ^ \{ name_a) $param_xy*+ \K $param_any }
                    ''xmsg;
        return $string;
        }
      Update: Note that I'm using ' (single-quote) instead of " (double-quote) as the parameter value delimiter in my code and testing. This uses 5.10+ regex features. This has withstood everything I have thrown at it (including the multi-line example of the OP!), and I think it's my final answer. (This even qualifies as perhaps not unnecessarily complex!)

      Interesting, but it depends on newlines. In my example each tag is separated by a newline, but that is a special case.
      And it seems your regexp could be more efficient:
      s( (?: (?: \G (?<! \A) | \A {name_a) $not_sp_param*+ \K) \s+ \w+ = \" [^^\"]* \" ) ()xmsg;
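For readers who want to try the \G and \K technique from this sub-thread on a whole string, here is a self-contained sketch. It is a variation, not any poster's exact code: it anchors on \{name_a anywhere in the text rather than at a line start, uses double-quoted values, and the names $val, $keep and $any are made up for the example. It needs Perl 5.10 or later for \K and the possessive *+ quantifier, and assumes values contain no double quotes.

```perl
use strict;
use warnings;

my $text = '{name_b param_v="wh"} '
         . '{name_a param_x="abc" param_a="fsd" param_y="def"} '
         . '{name_z param_sd="zka" param_s="df"} '
         . '{name_a param_y="wtf" param_z="kro" param_c="ptz" param_ch="www"}';

my $val  = qr{ = " [^"]* " }x;            # ="value"
my $keep = qr{ \s+ param_[xy] $val }x;    # a parameter to keep
my $any  = qr{ \s+ \w+ $val }x;           # any parameter at all

(my $clean = $text) =~ s{
    (?: \G (?!\A)       # resume exactly where the previous removal ended
      | \{ name_a \b    # or begin at the opening of a name_a tag
    )
    $keep*+             # possessively skip over wanted parameters
    \K                  # keep everything matched so far out of the replacement
    $any                # whatever parameter comes next is unwanted
}{}xg;

print "$clean\n";
```

The \G branch is what lets one /g pass remove several parameters from the same tag: after each removal the next attempt resumes inside that tag, and the possessive $keep*+ ensures a wanted parameter is never handed back to $any.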
Re: RegExp: pos management in global substitution
by johngg (Abbot) on Sep 03, 2011 at 22:09 UTC

    This thread may be pertinent to your problem.

    Cheers,

    JohnGG

      Yes, it was something similar. \G really rocks
