
Re^2: matching first paragraph satisfying condition

by haukex (Chancellor)
on Aug 06, 2019 at 09:12 UTC (#11104020)

in reply to Re: matching first paragraph satisfying condition
in thread matching first paragraph satisfying condition

perl -0pe 's/pattern1|pattern2/$1/gs'; apparently the "or" operator doesn't work as expected here.

The | alternation operator has pretty low precedence, so it kind of depends on what your expectations are :-) A common trap is to write something like /^foo|bar$/ and expect that to match only "foo" or "bar", when in fact it is matching ^foo or bar$ - the correct way to express that would have been /^(foo|bar)$/ or /^(?:foo|bar)$/.

Based on your $1 in the replacement, I suspect you were doing something like s/f(o)o|b(a)r/$1/g and expecting the string "bar" to be turned into "a"? In that case, you need the "branch reset" pattern (?|...) (perlre): s/(?|f(o)o|b(a)r)/$1/g will replace "foo" with "o" and "bar" with "a".

Re^3: matching first paragraph satisfying condition
by mnshptl32 (Initiate) on Aug 20, 2019 at 18:07 UTC

    Thank you, haukex, for the information about the correct use of the branch reset, and sorry for making you guess at my incorrect code. It was (and is) a bit of a moot point, since I've been given multiple Perl solutions to my problem; however, for the record, my failed attempt tried to combine the ideas I had for what my original post described as the former and latter cases:

    perl -0pe 's/(.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs' file.txt

    This simply returned the whole file, and I thought it had something to do with the nested parentheses altering what $1 refers to. When I rewrite it correctly, as per haukex's helpful advice:

    perl -0pe 's/(?|.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs' file.txt

    it does return something, but that something is the desired output only in the case where the first line is non-indented. The reason, obviously, is that I was making a stupid mistake: the first alternative matches whether or not the first line of file.txt is indented, and alternatives are tried left to right, so the second alternative is never reached and that code is never going to return the output of:

    perl -0pe 's/.*?\n([^ \n].*?)\n\n.*/$1/gs' file.txt

    As I said, it's a moot point, but I thought this follow-up was worth mentioning in case someone who lands on this page learns from my stupid mistake.


