
Re: matching first paragraph satisfying condition

by mnshptl32 (Initiate)
on Aug 05, 2019 at 19:55 UTC (#11103989)

in reply to matching first paragraph satisfying condition

Thank you, 1nickt and jwkrahn, for your lovely solutions to the problem! I also appreciate that your one-liners interpret "indented" as any whitespace at the start of the line, not just spaces. Incidentally, I didn't mention some of my other failed attempts, such as perl -0pe 's/pattern1|pattern2/$1/gs'; apparently the "or" operator doesn't work as expected here.

Best regards,


Re^2: matching first paragraph satisfying condition
by haukex (Bishop) on Aug 06, 2019 at 09:12 UTC
    perl -0pe 's/pattern1|pattern2/$1/gs'; apparently the "or" operator doesn't work as expected here.

    The | alternation operator has very low precedence (lower than concatenation), so it depends on what your expectations are :-) A common trap is to write something like /^foo|bar$/ and expect it to match only the strings "foo" or "bar", when in fact it matches ^foo or bar$, i.e. any string that starts with "foo" or ends with "bar". The correct way to express that would have been /^(foo|bar)$/ or /^(?:foo|bar)$/.
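    A minimal sketch of the difference, running both patterns over a few made-up test strings (the inputs are hypothetical, chosen only to show which strings each pattern accepts):

```perl
use strict;
use warnings;

# Hypothetical test strings.
my @inputs = ('foo', 'bar', 'foobar', 'food', 'crowbar', 'baz');

# /^foo|bar$/ parses as (^foo) OR (bar$): "starts with foo" or "ends with bar".
my @loose = grep { /^foo|bar$/ } @inputs;

# Grouping the alternation makes both anchors apply to each alternative.
my @strict = grep { /^(?:foo|bar)$/ } @inputs;

print "loose:  @loose\n";    # loose:  foo bar foobar food crowbar
print "strict: @strict\n";   # strict: foo bar
```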

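    Based on the $1 in your replacement, I suspect you were doing something like s/f(o)o|b(a)r/$1/g and expecting the string "bar" to be turned into "a"? Capture groups are numbered by their opening parenthesis across the whole pattern, so (a) is $2; when "bar" matches, $1 is undefined and "bar" is simply replaced with an empty string. In that case, you need the "branch reset" pattern (?|...) (perlre), which restarts the group numbering in each alternative: s/(?|f(o)o|b(a)r)/$1/g will replace "foo" with "o" and "bar" with "a".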
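    A small sketch comparing the two substitutions on a hypothetical input string; note that (?|...) needs Perl 5.10 or later:

```perl
use strict;
use warnings;

my $text = 'foo bar';   # hypothetical input

# Without a branch reset, (o) is group 1 and (a) is group 2, so when
# "bar" matches, $1 is undef and "bar" is replaced with an empty string.
my $plain = $text;
{
    no warnings 'uninitialized';   # silence the expected undef-$1 warning
    $plain =~ s/f(o)o|b(a)r/$1/g;
}

# With (?|...), numbering restarts in each alternative, so both (o)
# and (a) are group 1.
my $reset = $text;
$reset =~ s/(?|f(o)o|b(a)r)/$1/g;

print "plain: '$plain'\n";   # plain: 'o '
print "reset: '$reset'\n";   # reset: 'o a'
```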

      Thank you, haukex, for the explanation of the branch reset, and sorry for making you guess at my incorrect code. It was (and is) a bit of a moot point, since I've been given several Perl solutions to my problem; however, for the record, my failed attempt was to combine the ideas I had for what my original post described as the former and latter cases:

      perl -0pe 's/(.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs' file.txt

      This simply returned the whole file, and I suspected it had something to do with the nested parentheses altering what $1 refers to (and indeed, the outermost parentheses are themselves $1 and capture the entire match). When I rewrote it correctly, following haukex's helpful advice:

      perl -0pe 's/(?|.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs' file.txt

      it does return something, but that something is the desired output only when the first line is non-indented. The reason, obviously, is a stupid mistake on my part: alternatives are tried from left to right, and the first alternative matches whether or not the first line of file.txt is indented, so that code is never going to return the output of:

      perl -0pe 's/.*?\n([^ \n].*?)\n\n.*/$1/gs' file.txt
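      The three one-liners above can be replayed on an in-memory string to see each effect. This is a sketch on a hypothetical sample whose first line is indented (not the actual file.txt); the -0 switch's slurp-the-whole-file behaviour is modelled by holding the entire text in one scalar:

```perl
use strict;
use warnings;

# Hypothetical sample input: the first line is indented, and a
# non-indented paragraph follows the blank line.
my $input = "  indented first line\n\nplain para\nline two\n\nlast\n";

# 1. Original attempt: the outer (...) is group 1 and spans the whole
#    match, so replacing the match with $1 puts the entire text back.
(my $original = $input) =~
    s/(.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs;

# 2. Branch-reset version: both inner captures are now $1, but
#    alternation is ordered, and the first branch already matches at
#    position 0, so the second branch is never tried.
(my $combined = $input) =~
    s/(?|.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs;

# 3. The second branch on its own.
(my $second = $input) =~ s/.*?\n([^ \n].*?)\n\n.*/$1/gs;

print "original unchanged: ", ($original eq $input ? "yes" : "no"), "\n";
print "combined: $combined\n";   # first branch: "indented first line"
print "second:   $second\n";     # second branch: "plain para\nline two"
```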

      As I said, it's a moot point, but I thought this follow-up was worth posting in case someone who lands on this page can learn from my stupid mistake.



Re^2: matching first paragraph satisfying condition
by 1nickt (Abbot) on Aug 05, 2019 at 20:17 UTC

    (non-capturing group)

    The way forward always starts with a minimal test.

      s/(?:foo|bar)/baz/ is pretty much identical to s/foo|bar/baz/, so I don't think that's the issue.
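      A quick check of that equivalence on a hypothetical string; because the alternation already spans the whole pattern, wrapping it in a non-capturing group does not change what matches:

```perl
use strict;
use warnings;

my $text = 'foo bar qux';   # hypothetical input

# Same substitutions as above (no /g, so only the first match is replaced).
(my $grouped   = $text) =~ s/(?:foo|bar)/baz/;
(my $ungrouped = $text) =~ s/foo|bar/baz/;

print "$grouped\n";     # baz bar qux
print "$ungrouped\n";   # baz bar qux
```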
