Re: RegEx related line split

by Lotus1 (Curate)
on Nov 11, 2011 at 21:58 UTC (node #937675)

in reply to RegEx related line split

You only had two capturing groups, so $3 is never set. The .+ matches everything up to (d) because it is greedy.
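To illustrate the greediness point, here is a minimal sketch. The original pattern is not quoted in this node, so the two patterns below are hypothetical stand-ins and not the poster's regex; only the sample string comes from the thread.

```perl
use strict;
use warnings;

# Sample text from this thread.
my $text = "(a) This is first line. (b) This is second line; "
         . "(c) This is different line 32. (d) Here is the last line.";

# Hypothetical greedy pattern: .+ grabs as much as it can, so the
# first group runs all the way up to the last marker.
my ($greedy_first, $greedy_rest) = $text =~ /^(\([a-z]\).+)(\([a-z]\).*)$/;

# Same shape with a non-greedy .+? : the first group stops at the
# earliest following marker instead.
my ($lazy_first, $lazy_rest) = $text =~ /^(\([a-z]\).+?)(\([a-z]\).*)$/;

print "greedy: >>$greedy_first<<\n";
print "lazy:   >>$lazy_first<<\n";
```

With the greedy version the first group swallows items (a) through (c); with the non-greedy version it holds item (a) only.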

To make your regex work:

  • Use the /g option.
  • Remove the ^ anchor at the start.
  • Remove the (.*?) at the end.
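  • Add the $ anchor as an alternative inside the lookahead, so that the last item, which is followed by the end of the string rather than by another marker, also matches.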

    my $RefLine = "(a) This is first line. (b) This is second line; (c) This is different line 32. (d) Here is the last line.";
    my @lines = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;
    print ">>$_<<\n" foreach @lines;
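As a side note, the same pieces can probably be obtained with split and a lookahead. This is a swapped-in technique, not what the post uses, and the variable names @matched and @pieces are only for illustration.

```perl
use strict;
use warnings;

my $RefLine = "(a) This is first line. (b) This is second line; "
            . "(c) This is different line 32. (d) Here is the last line.";

# The match from the post: each item runs up to the next marker or the end.
my @matched = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;

# Alternative: split at every position that is followed by a marker.
# The lookahead is zero-width, so the markers stay in the pieces.
my @pieces = split /(?=\([a-z]\))/, $RefLine;

print ">>$_<<\n" foreach @pieces;
```

Both lists hold the same four items for this input. The split form needs no $ alternative, because split returns whatever is left after the last separator.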

    Replies are listed 'Best First'.
    Re^2: RegEx related line split
    by dominic01 (Acolyte) on Nov 14, 2011 at 03:17 UTC
      This is great. Thank You.
    Re^2: RegEx related line split
    by remiah (Hermit) on Nov 14, 2011 at 08:54 UTC
      Would you mind if I ask a question? I don't understand the "$ option". I looked at Extended Patterns in perlre but could not figure out what it is.
      "(?=" is a zero-width lookahead assertion, but what is "$|"? I would usually do this with a character class:
      @lines = $RefLine =~ /(\([a-z]*\)[^\(]*)/g;
      This will fail if $RefLine includes another pair of parentheses:
      my $RefLine = "(a) This is first line(once all 4 lines were one line). (b) This is second line; (c) This is different line 32. (d) Here is the last line.";
      But yours works fine and is more robust. I would be glad of a pointer or a clue. Regards.
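To make that failure concrete, here is a small sketch comparing the two patterns on a string like the one above. The array names @by_class and @by_lookahead are just for illustration.

```perl
use strict;
use warnings;

my $RefLine = "(a) This is first line(once all 4 lines were one line). "
            . "(b) This is second line; (c) This is different line 32. "
            . "(d) Here is the last line.";

# Character-class version: [^\(]* stops at the first "(" of any kind,
# so the parenthesised remark in item (a) is cut off and lost.
my @by_class = $RefLine =~ /(\([a-z]*\)[^\(]*)/g;

# Lookahead version: stops only before a real marker or at the end.
my @by_lookahead = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;

print "class:     >>$by_class[0]<<\n";
print "lookahead: >>$by_lookahead[0]<<\n";
```

Both patterns return four items here, but the first item from the character-class version ends at "first line", while the lookahead version keeps the whole sentence.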
        Do you understand foo|bar? Do you understand $? Do you understand (?= )? Combine all three and you get (?=$|MARK).
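A minimal sketch of the three pieces, one at a time and then combined. The literal MARK and the sample strings are placeholders chosen for illustration.

```perl
use strict;
use warnings;

my $s = "oneMARKtwo";

# foo|bar: alternation, either branch may match.
my $alt = "a bar" =~ /foo|bar/ ? 1 : 0;

# $: matches at the end of the string without consuming anything.
my $at_end = "two" =~ /two$/ ? 1 : 0;

# (?= ): lookahead, checks what follows but leaves it unconsumed.
my ($before_mark) = $s =~ /(.*?)(?=MARK)/;

# Combined: stop before MARK, or at the end when no MARK is left.
my @chunks = $s =~ /(.+?)(?=$|MARK)/g;

print ">>$_<<\n" foreach @chunks;
```

The second chunk starts with MARK itself, because the lookahead left it unconsumed for the next match. That is the same way each "(x)" marker ends up at the front of its own line in the regex above.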
          Let me try to explain it myself.

          foo|bar is foo or bar. If it is grouped as (foo|bar), $1 will be set to "foo" or "bar" on a match.

          In this case it is not a non-capturing group, (?:foo|bar), because '(?=' starts a zero-width lookahead assertion. A zero-width lookahead works like a placeholder: it does not consume any characters, so it does not advance pos($expr) during the match.

          $ is the end of the line (strictly, the end of the string or just before a trailing newline, unless /m is used)... as far as I know.
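A quick sketch of what $ matches with and without the /m modifier. The two-line sample string is made up for illustration.

```perl
use strict;
use warnings;

my $text = "first\nsecond\n";

# Without /m, $ matches only at the very end of the string
# (or just before a final newline).
my @plain = $text =~ /(\w+)$/g;

# With /m, $ also matches before every embedded newline.
my @multi = $text =~ /(\w+)$/mg;

print "plain: @plain\n";
print "multi: @multi\n";
```

For the single-line $RefLine in this thread the difference does not matter, since the string contains no newline.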

          So it says: look ahead for "end of line" or MARK, and match them only as a placeholder, without consuming them. I think I understand this!

          #!/usr/bin/perl
          use strict;
          use warnings;

          # The end of this string is cut off in the source after "(";
          # the tail is assumed to match the earlier example in this thread.
          my $RefLine = "(a) This is first line(once all 4 was one line). (b) This is second line; (c) This is different line 32. (d) Here is the last line.";

          print "original -----\n";
          print "$RefLine\n";
          print "original -----\n\n";

          print "\n## without 'end of line or' condition. last line fails\n";
          while ( $RefLine =~ /(\([a-z]\).*?)(?=\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          print "\n## without lookahead assertion... \n";
          while ( $RefLine =~ /(\([a-z]\).*?)($|\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          print "\n## with 'end of line or' condition and zero width place holder\n";
          while ( $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          Thank you very much JavaFan.
