PerlMonks  

Re: RegEx related line split

by Lotus1 (Chaplain)
on Nov 11, 2011 at 21:58 UTC (#937675)


in reply to RegEx related line split

You only had two capturing groups, so $3 is never set your way. The .+ is greedy, so it captures everything up to (d).
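The greedy behaviour can be seen in a small sketch. The original poster's regex is not shown in this reply, so the pattern and the shortened string below are hypothetical stand-ins, chosen only to contrast .+ with .+?:

```perl
use strict;
use warnings;

# Hypothetical input, modeled on the sample string used in this thread.
my $text = "(a) first. (b) second. (c) third. (d) last.";

# Greedy: .+ runs to the end of the string and backtracks only as far as
# it must, so the group stops just before the LAST "(x)" marker.
my ($greedy) = $text =~ /^\(a\)(.+)\([a-z]\)/;

# Non-greedy: .+? grows one character at a time, so the group stops just
# before the FIRST "(x)" marker that follows.
my ($lazy) = $text =~ /^\(a\)(.+?)\([a-z]\)/;

print "greedy: >>$greedy<<\n";
print "lazy:   >>$lazy<<\n";
```

Here the greedy group holds everything between "(a)" and "(d)", while the non-greedy group holds only " first. ".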

To make your regex work:

  • Use the /g option.
  • Remove the ^ anchor at the start.
  • Remove the (.*?) at the end.
  • Add the $ option at the end in place of the a-z in the lookahead.

    my $RefLine = "(a) This is first line. (b) This is second line; (c) This is different line 32. (d) Here is the last line.";
    my @lines = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;
    print ">>$_<<\n" foreach @lines;


    Re^2: RegEx related line split
    by dominic01 (Acolyte) on Nov 14, 2011 at 03:17 UTC
      This is great. Thank You.
    Re^2: RegEx related line split
    by remiah (Hermit) on Nov 14, 2011 at 08:54 UTC
      Would you mind if I ask a question? I don't understand the "$ option". I looked at the Extended Patterns section of perlre, but I could not figure out what it is.
      (?=$|MARK)
      "(?=" is a zero-width lookahead assertion, but what is "$|"? I would usually do this with a character class:
      @lines = $RefLine =~ /(\([a-z]*\)[^\(]*)/g;
      This fails if $RefLine contains another pair of parentheses, for example:
      my $RefLine = "(a) This is first line(once all 4 lines were one line). (b) This is second line; ( c) This is different line 32. (d) Here is the last line.";
      But yours works fine and is more robust. I would be glad of any pointer or clue. Regards.
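The difference between the two approaches can be checked with a short script. This is a minimal sketch that runs both regexes from this thread against the test string quoted in the post. The variable names @by_class and @by_lookahead are made up for illustration.

```perl
use strict;
use warnings;

# The test string from the post above (it contains extra parentheses).
my $RefLine = "(a) This is first line(once all 4 lines were one line). "
            . "(b) This is second line; ( c) This is different line 32. "
            . "(d) Here is the last line.";

# Character-class approach: each piece ends at the next "(" of any kind.
my @by_class = $RefLine =~ /(\([a-z]*\)[^\(]*)/g;

# Lookahead approach: each piece ends only where a "(x)" marker
# or the end of the string follows.
my @by_lookahead = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;

print "character class:\n";
print ">>$_<<\n" for @by_class;
print "lookahead:\n";
print ">>$_<<\n" for @by_lookahead;
```

With this input the character-class version silently drops the text that follows each stray "(", while the pieces returned by the lookahead version join back into the complete string.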
        Do you understand foo|bar? Do you understand $? Do you understand (?= )? Combine all three and you get (?=$|MARK).
          Let me try to explain it myself.

          foo|bar is foo or bar. If it is grouped as (foo|bar), the matched $1 will be set to "foo" or "bar".

          In this case it is not a non-capturing group, (?:foo|bar), because '(?=' starts a zero-width lookahead assertion. A zero-width lookahead assertion works like a placeholder: it does not advance pos($expr) during matching.
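A minimal sketch of that zero-width behaviour, assuming a made-up string "foobar" and a hypothetical helper match_info (neither is from the thread). It compares where pos() ends up after a consuming group and after a lookahead:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the matched text and pos() after one
# scalar-context /g match, or an empty list if there is no match.
sub match_info {
    my ($str, $re) = @_;
    return unless $str =~ /$re/g;    # scalar-context /g sets pos($str)
    return ($&, pos($str));
}

# Non-capturing group: "bar" is consumed and is part of the match.
my ($m_group, $p_group) = match_info("foobar", qr/foo(?:bar)/);

# Lookahead: "bar" must follow, but it is not consumed.
my ($m_ahead, $p_ahead) = match_info("foobar", qr/foo(?=bar)/);

print "group:     matched=$m_group pos=$p_group\n";
print "lookahead: matched=$m_ahead pos=$p_ahead\n";
```

The group version matches "foobar" and leaves pos at 6. The lookahead version matches only "foo" and leaves pos at 3, so the next /g match can start at "bar".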

          $ is the end of line... as far as I know.
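To be a little more precise: without the /m modifier, $ matches at the end of the string (or just before a trailing newline); with /m it also matches before every embedded newline. The sketch below uses a made-up two-line string to show the difference. For the single-line $RefLine in this thread the two readings coincide.

```perl
use strict;
use warnings;

# Hypothetical two-line input (not from the thread).
my $two_lines = "alpha one\nbeta two\n";

# Without /m: $ matches only at the end of the string,
# or just before its final trailing newline.
my @default = $two_lines =~ /(\w+)$/g;

# With /m: $ also matches before each embedded newline.
my @multiline = $two_lines =~ /(\w+)$/mg;

print "default:   @default\n";
print "multiline: @multiline\n";
```

The first list contains only "two"; the second contains "one" and "two".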

          Well, it says: look ahead for "end of line" or MARK, and match them only as a 'placeholder'. I think I understand this!

          #!/usr/bin/perl
          use strict;
          use warnings;

          # NOTE: the string was cut off after "second line; (" in the post. The rest
          # is an assumption, taken from the string quoted earlier in this thread.
          my $RefLine = "(a) This is first line(once all 4 was one line). (b) This is second line; ( c) This is different line 32. (d) Here is the last line.";

          print "original -----\n";
          print "$RefLine\n";
          print "original -----\n\n";

          print "\n## without 'end of line or' condition. last line fails\n";
          while ( $RefLine =~ /(\([a-z]\).*?)(?=\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          print "\n## without lookahead assertion... \n";
          while ( $RefLine =~ /(\([a-z]\).*?)($|\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          print "\n## with 'end of line or' condition and zero width placeholder\n";
          while ( $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g ) {
              my $p = pos $RefLine;
              print "$-[0], $p,matched=$&\n";
              print "---\n";
          }

          Thank you very much JavaFan.
