http://www.perlmonks.org?node_id=937675


in reply to RegEx related line split

You only had two capturing groups (two pairs of ()'s), so $3 won't return anything the way you wrote it. The .+ captures everything up to (d) because it is greedy.
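The original poster's regex is not quoted in this reply, so the sketch below uses a hypothetical pattern and a shortened, made-up sample string to illustrate both points: a greedy .+ runs on to the last possible marker while a lazy .+? stops at the first one, and a match with only two capturing groups leaves $3 undefined.

```perl
use strict;
use warnings;

# Hypothetical sample string in the same shape as the thread's $RefLine.
my $text = "(a) one. (b) two. (c) three.";

# Greedy: .+ grabs the rest of the string, then backtracks only far enough
# for the final \([a-z]\) to match, so $1 runs up to "(c)".
my ($greedy) = $text =~ /^(\([a-z]\).+)\([a-z]\)/;

# Lazy: .+? takes as little as possible, so $1 stops before "(b)".
my ($lazy) = $text =~ /^(\([a-z]\).+?)\([a-z]\)/;

print "greedy: [$greedy]\n";   # greedy: [(a) one. (b) two. ]
print "lazy:   [$lazy]\n";     # lazy:   [(a) one. ]

# Only two capturing groups here, so the match never sets $3.
if ($text =~ /^(\([a-z]\))(.+)/) {
    print defined $3 ? "\$3 is set\n" : "\$3 is undef\n";
}
```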

To make your regex work:

  • Use the /g option.
  • Remove the ^ anchor at the start.
  • Remove the (.*?) at the end.
  • Add the $ option at the end in place of the a-z in the lookahead.

    my $RefLine = "(a) This is first line. (b) This is second line; (c) This is different line 32. (d) Here is the last line.";
    my @lines = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;
    print ">>$_<<\n" foreach @lines;

    Re^2: RegEx related line split
    by dominic01 (Sexton) on Nov 14, 2011 at 03:17 UTC
      This is great. Thank you.
    Re^2: RegEx related line split
    by remiah (Hermit) on Nov 14, 2011 at 08:54 UTC
      Would you mind if I ask a question? I don't understand the "$ option". I looked at the Extended Patterns section of perlre, but I could not figure out what it is.
      (?=$|MARK)
      "(?=" is a zero-width lookahead assertion, but what is "$|"? Usually I would do this with a negated character class:
      @lines = $RefLine =~ /(\([a-z]*\)[^\(]*)/g;
      This fails if $RefLine contains another pair of parentheses, for example:
      my $RefLine = "(a) This is first line(once all 4 lines were one line). (b) This is second line; ( c) This is different line 32. (d) Here is the last line.";
      But yours works fine and is more robust. I would be glad of any pointer or clue. Regards.
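A minimal side-by-side sketch of the two approaches, using a shortened, hypothetical string with an extra pair of parentheses inside the first item:

```perl
use strict;
use warnings;

# Hypothetical input: the first item contains an inner "(...)" pair.
my $RefLine = "(a) first line(once one line). (b) second line. (c) last line.";

# Negated character class: [^\(]* stops at ANY "(", including the inner one.
my @by_class = $RefLine =~ /(\([a-z]*\)[^\(]*)/g;

# Lazy match plus lookahead: stops only before a one-letter "(x)" marker
# or at the end of the string.
my @by_lookahead = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;

print "class:     >>$_<<\n" for @by_class;
print "lookahead: >>$_<<\n" for @by_lookahead;
```

The negated class ends the first item at "(once", and "(once one line). " is then skipped entirely because it is not a one-letter marker; the lookahead version keeps it as part of item (a).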
        Do you understand foo|bar? Do you understand $? Do you understand (?= )? Combine all three and you get (?=$|MARK).
          Let me try to explain it myself.

          foo|bar matches foo or bar. If it is grouped as (foo|bar), $1 is set to whichever of "foo" or "bar" matched.
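A small sketch of the difference between a capturing group and a non-capturing (?:...) group; the sample string is made up:

```perl
use strict;
use warnings;

my $s = "I like bar";

# Capturing group: whichever alternative matched ends up in $1.
my ($captured) = $s =~ /(foo|bar)/;
print "captured: $captured\n";    # captured: bar

# Non-capturing group (?:...) only groups the alternation and sets no $1.
# A list-context match with no captures returns (1) on success.
my @result = $s =~ /like (?:foo|bar)/;
print "result: @result\n";        # result: 1
```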

          In this case it is not a non-capturing group (?:foo|bar), because it is a zero-width lookahead assertion '(?='. A zero-width lookahead works like a placeholder: it checks what comes next but does not consume it, so it does not advance pos() during matching.
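To see that a lookahead does not consume what it checks, this sketch (with a hypothetical short string) compares pos() after a match that consumes the next "(b)" marker with one that only looks ahead for it:

```perl
use strict;
use warnings;

my $str = "(a) one (b) two";

# Consuming version: "(b)" is part of the match, so pos() ends after it.
my $pos_consuming = ($str =~ /\([a-z]\).*?\([a-z]\)/g) ? pos($str) : undef;

# Start the second search from the beginning of the string again.
pos($str) = 0;

# Lookahead version: "(b)" must follow, but is not consumed,
# so pos() stops right before it.
my $pos_lookahead = ($str =~ /\([a-z]\).*?(?=\([a-z]\))/g) ? pos($str) : undef;

print "consuming: $pos_consuming\n";   # consuming: 11
print "lookahead: $pos_lookahead\n";   # lookahead: 8
```

Because the lookahead leaves pos() in front of "(b)", the next /g match can start right at that marker, which is what the loop in the thread relies on.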

          $ is the end of the line. Without the /m modifier, it matches at the end of the string or just before a final newline.
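A quick check of how $ behaves with and without the /m modifier; the sample strings are made up:

```perl
use strict;
use warnings;

# Each pair is [string, pattern].
my @checks = (
    [ "last line.",         qr/line\.$/ ],   # $ at the very end of the string
    [ "last line.\n",       qr/line\.$/ ],   # $ also matches before a final newline
    [ "line one\nline two", qr/one$/    ],   # no /m: $ does not match mid-string
    [ "line one\nline two", qr/one$/m   ],   # with /m: $ matches before any newline
);

my @results = map { $_->[0] =~ $_->[1] ? 1 : 0 } @checks;
print "@results\n";   # 1 1 0 1
```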

          So (?=$|MARK) says: look ahead for either the end of the line or MARK, and match it only as a placeholder, without consuming it. I think I understand this now!

          #!/usr/bin/perl
          use strict;
          use warnings;

          my $RefLine = "(a) This is first line(once all 4 was one line). (b) This is second line; ( c) This is different line 32. (d) Here is the last line.";

          print "original -----\n";
          print "$RefLine\n";
          print "original -----\n\n";

          print "\n## without 'end of line or' condition. last line fails\n";
          while ( $RefLine =~ /(\([a-z]\).*?)(?=\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          print "\n## without lookahead assertion... \n";
          while ( $RefLine =~ /(\([a-z]\).*?)($|\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          print "\n## with 'end of line or' condition and zero width place holder\n";
          while ( $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          Thank you very much, JavaFan.