PerlMonks  

RegEx related line split

by dominic01 (Sexton)
on Nov 11, 2011 at 17:06 UTC ( [id://937621]=perlquestion )

dominic01 has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to split the following line and would like to process individual parts.

my $RefLine = "(a) This is first line. (b) This is second line; (c) This is different line 32. (d) Here is the last line.";
$RefLine =~ /^(\([a-z]\) .+ (?=\([a-z]\)))(.*?)/;
print "01\t$1\n02\t$2\n03\t$3\n";

I am not actually splitting in the example above; I am trying to build a match so that I can separate the individual lines. In my case the pattern matches everything from "(a)" through "32." in one piece. I am not sure how to get the parts into an array like this:

(a) This is first line.
(b) This is second line;
(c) This is different line 32.
(d) Here is the last line.
Appreciate any help

Replies are listed 'Best First'.
Re: RegEx related line split
by johngg (Canon) on Nov 11, 2011 at 17:59 UTC

    This might be what you want.

    knoppix@Microknoppix:~$ perl -E '
    > $line = q{(a) Line 1. (b) Line 2. (c) Line 32. (d) Line 42.};
    > @arr = split m{(?=\([a-z]\))}, $line;
    > say qq{>$_<} for @arr;'
    >(a) Line 1. <
    >(b) Line 2. <
    >(c) Line 32. <
    >(d) Line 42.<
    knoppix@Microknoppix:~$

    I hope this is of use.

    Cheers,

    JohnGG
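
For readers who prefer a script file to a one-liner, the same idea can be sketched as below. This is only an illustration of the reply above: the variable names and the whitespace trim at the end are my own additions, not part of johngg's answer. The point is that the lookahead `(?=...)` matches a position without consuming any characters, so each marker stays attached to the text that follows it.

```perl
use strict;
use warnings;

my $line = '(a) Line 1. (b) Line 2. (c) Line 32. (d) Line 42.';

# Split at every position that is followed by "(x)", where x is one
# lowercase letter. The lookahead is zero-width, so nothing is removed.
my @parts = split /(?=\([a-z]\))/, $line;

# Optional tidy-up (an addition, not in the reply): drop trailing whitespace.
s/\s+\z// for @parts;

print "$_\n" for @parts;
```

Perl's split never produces an empty leading field for a zero-length match at the start of the string, which is why no empty first element appears even though the pattern also matches at position 0.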

      Thank you. This works for my requirement, but I was also trying it with one other form of input:
      $line = q{a) Line 1. b) Line 2. c) Line 32. d) Line 42.};
      Here my pattern also matches at some (words) in parentheses that occur within the line.

        Do you mean that the opening parenthesis is optional in the text you are splitting? If so, you can use a '?' quantifier to make the opening parenthesis "zero or one of", but you also have to use a negative lookbehind to make sure you don't split '(' from 'a)'. I've added the 'x' modifier to the pattern so I can space it out and make it more readable.

        knoppix@Microknoppix:~$ perl -E '
        > $line = q{(a) Line 1. b) Line 2. (c) Line 32. d) Line 42.};
        > @arr = split m{ (?<! \( ) (?= \(? [a-z] \) ) }x, $line;
        > say qq{>$_<} for @arr;'
        >(a) Line 1. <
        >b) Line 2. <
        >(c) Line 32. <
        >d) Line 42.<
        knoppix@Microknoppix:~$

        I hope this is helpful.

        Cheers,

        JohnGG
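
To see why the negative lookbehind in the reply above is needed, here is a small sketch that runs the split with and without it. The `$tricky` input and the final `@strict` pattern are my own assumptions, not something posted in the thread: they illustrate that a lowercase letter directly before ")" inside ordinary text (as at the end of "(words)") also satisfies the lookahead, and show one possible way to tighten the pattern.

```perl
use strict;
use warnings;

my $line = '(a) Line 1. b) Line 2. (c) Line 32. d) Line 42.';

# No lookbehind: because "(" is optional, the lookahead also succeeds one
# character later, between "(" and "a)", so the paren gets split off.
my @naive = split /(?=\(?[a-z]\))/, $line;

# With the lookbehind: a split point may not be directly preceded by "(".
my @fixed = split /(?<!\()(?=\(?[a-z]\))/, $line;

# Hypothetical input with a parenthesised word inside the text.
my $tricky = 'a) Some (words) here. b) Line 2.';

# The pattern from the reply also splits before "s)" at the end of "(words)".
my @caveat = split /(?<!\()(?=\(?[a-z]\))/, $tricky;

# Possible tightening (an assumption, not from the thread): also refuse a
# split point that is directly preceded by a word character.
my @strict = split /(?<![(\w])(?=\(?[a-z]\))/, $tricky;

print "naive:  ", join('|', @naive),  "\n";
print "fixed:  ", join('|', @fixed),  "\n";
print "caveat: ", join('|', @caveat), "\n";
print "strict: ", join('|', @strict), "\n";
```

Even the tightened version would still split at a genuine single-letter aside such as "(x)" in the running text, so whether it is good enough depends on the data.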

Re: RegEx related line split
by Lotus1 (Vicar) on Nov 11, 2011 at 21:58 UTC

    Your pattern has only two capturing groups, so $3 is never set. Because .+ is greedy, the first group captures everything up to the last marker, (d).

    To make your regex work:

  • Use the /g option.
  • Remove the ^ anchor at the start.
  • Remove the (.*?) at the end.
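  • Make the .+ non-greedy (.*?) so that each match stops before the next marker.
  • Add $ (end of string) as an alternative inside the lookahead, next to the \([a-z]\) marker, so that the last item also matches.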

    my $RefLine = "(a) This is first line. (b) This is second line; (c) This is different line 32. (d) Here is the last line.";
    @lines = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;
    print ">>$_<<\n" foreach @lines;
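
The difference between the original pattern and the revised one can be seen side by side in the sketch below. It reuses the patterns quoted in this thread; the variable names `$greedy`, `$lazy` and `@lines` are only for illustration.

```perl
use strict;
use warnings;

my $RefLine = '(a) This is first line. (b) This is second line; '
            . '(c) This is different line 32. (d) Here is the last line.';

# Original pattern: anchored at the start, greedy .+, two capture groups.
# In list context a successful match returns the captures ($1, $2).
my ($greedy, $lazy) = $RefLine =~ /^(\([a-z]\) .+ (?=\([a-z]\)))(.*?)/;

print "\$1 = [$greedy]\n";   # runs up to just before the LAST marker, "(d)"
print "\$2 = [$lazy]\n";     # the lazy .*? is satisfied by the empty string

# Revised pattern: no anchor, lazy .*?, lookahead for the next marker or the
# end of the string, and /g so that list context returns every capture.
my @lines = $RefLine =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;

print ">>$_<<\n" for @lines;
```

Each element except the last keeps the space that preceded the next marker; trim it if that matters for later processing.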
      This is great. Thank You.
      Would you mind if I ask a question of my own? I don't understand the "$ option". I looked at the Extended Patterns section of perlre, but I could not figure out what this is:
      (?=$|MARK)
      "(?=" is a zero-width lookahead assertion, but I wonder what "$|" is. Usually I would do this with a character class:
      @lines = $RefLine =~ /(\([a-z]*\)[^\(]*)/g;
      This fails if $RefLine contains another pair of parentheses:
      my $RefLine = "(a) This is first line(once all 4 lines were one line). (b) This is second line; ( c) This is different line 32. (d) Here is the last line.";
      But yours works fine and is more robust. I would be glad of a pointer or clue. Regards.
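
A small comparison of the two approaches discussed here, as a sketch. The `$text` input is a made-up example, not the string from the post; the two patterns are the ones quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical input: item (a) contains a parenthesised remark.
my $text = '(a) First line (an aside). (b) Second line.';

# Negated character class: each item stops at the next "(" of any kind,
# so the remark and the text after it are lost from item (a).
my @by_class = $text =~ /(\([a-z]*\)[^\(]*)/g;

# Lazy match plus lookahead: an item only stops before a "(x)" marker
# (one lowercase letter in parentheses) or at the end of the string.
my @by_lookahead = $text =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;

print "class:     [$_]\n" for @by_class;
print "lookahead: [$_]\n" for @by_lookahead;
```

The character class cannot tell a marker from any other opening parenthesis, while the lookahead checks for the whole marker before ending an item.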
        Do you understand foo|bar? Do you understand $? Do you understand (?= )? Combine all three and you get (?=$|MARK).
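
Spelled out, the three pieces combine as in the sketch below. The `$text` input and the variable names are made up for illustration. Inside the lookahead, `$` is the end-of-string anchor and `|` is alternation; the two characters merely sit next to each other. Perl does not interpolate the special variable `$|` inside a pattern, so there is no clash with the autoflush variable.

```perl
use strict;
use warnings;

my $text = '(a) one (b) two (c) three';

# Lookahead with only the marker alternative: the final item has no marker
# after it, so it is never matched.
my @no_end = $text =~ /(\([a-z]\).*?)(?=\([a-z]\))/g;

# Adding "$" as a second alternative lets the final item end at the end of
# the string.
my @with_end = $text =~ /(\([a-z]\).*?)(?=$|\([a-z]\))/g;

# The same pattern spaced out with the /x modifier.
my @spelled = $text =~ /
    ( \( [a-z] \) .*? )   # capture: a marker, then as little text as possible
    (?=                   # followed by, without consuming it, either
        $                 #   the end of the string
      |                   #   or
        \( [a-z] \)       #   the next marker
    )
/xg;

print "no_end:   ", scalar(@no_end),   " items\n";
print "with_end: ", scalar(@with_end), " items\n";
print "spelled:  ", scalar(@spelled),  " items\n";
```

Note that `$` also matches just before a trailing newline; if the input may end in a newline and that matters, `\z` is the stricter anchor.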
