http://www.perlmonks.org?node_id=11103989


in reply to matching first paragraph satisfying condition

Thank you, 1nickt and jwkrahn, for your lovely solutions to the problem! I also appreciate that your one-liners interpret "indented" as any whitespace at the start of the line, not just spaces. Incidentally, I didn't mention some of my other failed attempts, such as perl -0pe 's/pattern1|pattern2/$1/gs'; apparently the "or" operator doesn't work as expected here.

Best regards,

Maneesh

Re^2: matching first paragraph satisfying condition
by haukex (Chancellor) on Aug 06, 2019 at 09:12 UTC
    perl -0pe 's/pattern1|pattern2/$1/gs'; apparently the "or" operator doesn't work as expected here.

    The | alternation operator has very low precedence, so it kind of depends on what your expectations are :-) A common trap is to write something like /^foo|bar$/ and expect it to match only the strings "foo" or "bar", when in fact it matches ^foo or bar$, i.e. "foo" at the start of the string or "bar" at the end of it. The correct way to express that would have been /^(foo|bar)$/ or /^(?:foo|bar)$/.
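A quick way to see the precedence trap is to run both patterns over a few strings. This is only a sketch; the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample strings, chosen to show the difference.
my @strings = qw(foo bar foobar food crowbar);

my (%loose, %strict);
for my $s (@strings) {
    # | binds loosest, so this is (^foo) OR (bar$).
    $loose{$s}  = $s =~ /^foo|bar$/     ? 1 : 0;
    # The group puts both anchors around the whole alternation.
    $strict{$s} = $s =~ /^(?:foo|bar)$/ ? 1 : 0;
    printf "%-8s loose=%d strict=%d\n", $s, $loose{$s}, $strict{$s};
}
```

Only "foo" and "bar" pass the grouped pattern, while the ungrouped one also accepts "foobar", "food" and "crowbar".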

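    Based on your $1 in the replacement, I suspect you were doing something like s/f(o)o|b(a)r/$1/g and expecting the string "bar" to be turned into "a"? That won't work, because the (a) in the second alternative is capture group 2, so $1 is undefined when "bar" matches. In that case, you need the "branch reset" pattern (?|...) (perlre), which numbers the groups in each alternative starting from the same number: s/(?|f(o)o|b(a)r)/$1/g will replace "foo" with "o" and "bar" with "a".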
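Here is a small sketch comparing plain alternation with a branch reset on a made-up input string (branch reset needs Perl 5.10 or later).

```perl
use strict;
use warnings;

my $text = 'foo bar';    # hypothetical input

# Plain alternation: "foo" sets $1, but "bar" sets $2 and leaves $1 undefined,
# so "bar" is replaced with an empty string.
my $plain = $text;
{
    no warnings 'uninitialized';    # silence the expected undef-$1 warning
    $plain =~ s/f(o)o|b(a)r/$1/g;
}

# Branch reset: each alternative numbers its capture group from 1.
my $reset = $text;
$reset =~ s/(?|f(o)o|b(a)r)/$1/g;

print "plain: '$plain'\n";
print "reset: '$reset'\n";
```

With this input, $plain ends up as 'o ' and $reset as 'o a'.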

      Thank you, haukex, for the information about the correct use of the branch reset, and sorry for making you guess at my incorrect code. It was (and is) a bit of a moot point, since I've been given multiple Perl solutions to my problem; however, for the record, my aforementioned failed attempt tried to combine the ideas I had for what I described in my original post as the former and latter cases:

      perl -0pe 's/(.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs' file.txt

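      This simply returned the whole file. I thought it had something to do with the nested parentheses altering what $1 refers to, and it did: the outer parentheses are group 1, which spans the entire match, so the substitution replaced the file with itself. When I rewrite it as per haukex's helpful advice: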

      perl -0pe 's/(?|.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs' file.txt

      it does return something, but that something is the desired output only when the first line is not indented. The reason, obviously, is that I was making a stupid mistake: the first alternative matches whether or not the first line of file.txt is indented (its leading .*? simply skips over the indentation), so the second alternative is never tried and the code can never return the output of:

      perl -0pe 's/.*?\n([^ \n].*?)\n\n.*/$1/gs' file.txt
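To make both effects visible without a file, here is a sketch that applies the three substitutions to a hypothetical in-memory string whose first line is indented (standing in for what -0 would slurp from file.txt).

```perl
use strict;
use warnings;

# Hypothetical input whose first line is indented.
my $text = "  indented first line\nsecond line\n\nthird paragraph\n";

# First attempt: the outer parentheses are group 1, so $1 is the whole
# match (here the entire string) and the substitution changes nothing.
(my $first = $text) =~ s/(.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs;

# Branch reset: both alternatives now fill $1, but the first alternative
# already matches at position 0 (its .*? skips the leading spaces), so the
# second alternative is never tried.
(my $combined = $text) =~ s/(?|.*?\n*?([^ \n].*?)\n\n.*|.*?\n([^ \n].*?)\n\n.*)/$1/gs;

# The second alternative on its own: from the first non-indented line
# up to the next blank line.
(my $latter = $text) =~ s/.*?\n([^ \n].*?)\n\n.*/$1/gs;

print "first:    [$first]\n";
print "combined: [$combined]\n";
print "latter:   [$latter]\n";
```

Here $first is unchanged, $combined is "indented first line\nsecond line", and only $latter gives "second line".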

      As I said, it's a moot point, but I thought this follow-up was worth posting in case it helps someone who lands on this page avoid the same mistake.

      Regards,

      Maneesh

Re^2: matching first paragraph satisfying condition
by 1nickt (Abbot) on Aug 05, 2019 at 20:17 UTC

    s/(?:foo|bar)/baz/;
    (non-capturing group)


    The way forward always starts with a minimal test.

      s/(?:foo|bar)/baz/ is pretty much identical to s/foo|bar/baz/, so I don't think that's the issue.
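A minimal test along those lines, using made-up inputs: with nothing else in the pattern, wrapping the alternation in a non-capturing group does not change what gets replaced. The group only matters once anchors or other atoms sit next to the alternation.

```perl
use strict;
use warnings;

# Hypothetical inputs for the comparison.
my @inputs = ('foo', 'bar', 'a foo and a bar', 'nothing here');

my $all_same = 1;
for my $in (@inputs) {
    (my $grouped = $in) =~ s/(?:foo|bar)/baz/;    # non-capturing group
    (my $bare    = $in) =~ s/foo|bar/baz/;        # plain alternation
    print "$in -> $grouped | $bare\n";
    $all_same = 0 unless $grouped eq $bare;
}
print $all_same ? "identical\n" : "different\n";
```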