http://www.perlmonks.org?node_id=1028380

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

1) some text 2)
1. some text 2.
1) some text 2.
1. 2.10.10.20 some text 2.
1) 2.10.10.20 2)

I want to get the text between 1) and 2), or between 1. and 2. Here is what I have written: 1[)|.](.*?)2[)|.] In Perl, $1 then gives me the text in between, but this fails on input such as 1. 2.10.10.20 some text 2. Can someone please help me set up the regexp?

Replies are listed 'Best First'.
Re: regular expression for getting text between 1. and 2.
by Kenosis (Priest) on Apr 12, 2013 at 17:04 UTC

    In the cases you've shown, there's whitespace enclosing the text you want, so you can use a lookbehind and a lookahead to capture the enclosed text:

    use warnings;
    use strict;

    while (<DATA>) {
        print "$1\n" if /(?<=\s)([a-z\s]+)(?=\s)/i;
    }

    __DATA__
    1) some text 2)
    1. some text 2.
    1) some text 2.
    1. 2.10.10.20 some text 2.
    1) 2.10.10.20 2)

    Output:

    some text
    some text
    some text
    some text

    Hope this helps!

Re: regular expression for getting text between 1. and 2.
by si_lence (Deacon) on Apr 12, 2013 at 14:33 UTC
    I see two options: Either use anchors /^1[).](.*?)2[).]$/
    or use a greedy match /1[).](.*)2[).]/

    By the way, inside a character class you don't need the pipe for an "or": there it is a literal character, so your regex would also match 1| some text 2|

    hope this helps
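
To make the difference between these suggestions concrete, here is a minimal sketch that runs the original lazy pattern, the anchored pattern, and the greedy pattern over the sample lines from the question, plus one extra line with pipes. The `capture` helper, the labels, and the pipe-delimited test line are assumptions made for this illustration; they are not part of the original posts.

```perl
use strict;
use warnings;

# Sample lines from the question, plus one line with pipes as delimiters.
my @lines = (
    '1) some text 2)',
    '1. some text 2.',
    '1) some text 2.',
    '1. 2.10.10.20 some text 2.',
    '1) 2.10.10.20 2)',
    '1| some text 2|',
);

# Patterns under comparison; the labels exist only for this sketch.
my @patterns = (
    [ 'lazy, with pipe' => qr/1[)|.](.*?)2[)|.]/ ],
    [ 'lazy, anchored'  => qr/^1[).](.*?)2[).]$/ ],
    [ 'greedy'          => qr/1[).](.*)2[).]/    ],
);

# Hypothetical helper: return the captured text, or undef when nothing matches.
sub capture {
    my ($re, $line) = @_;
    return $line =~ $re ? $1 : undef;
}

for my $line (@lines) {
    print "$line\n";
    for my $p (@patterns) {
        my ($label, $re) = @$p;
        my $got = capture($re, $line);
        printf "  %-16s %s\n", $label, defined $got ? "[$got]" : 'no match';
    }
}
```

On the line `1. 2.10.10.20 some text 2.` the unanchored lazy pattern stops at the first `2.` and captures only a single space, while the anchored and greedy variants both run to the final `2.`. Only the pattern that keeps the pipe in the character class matches the `1| some text 2|` line.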
Re: regular expression for getting text between 1. and 2.
by hdb (Monsignor) on Apr 12, 2013 at 14:33 UTC

    Just leave the ? out.

    1[)|.](.*)2[)|.]
      Funny. The user asked the same question on SO and I replied with this same answer but he then came up with more input strings that complicate the regex.

        This is a problem with many regex questions here and elsewhere. Just a handful of examples usually do not really explain what the pattern is. Sometimes I find it challenging to "guess" what the real question is (sometimes this is impossible) and sometimes I just answer what was asked for. The hope is that the OP learns something from either answer. At least I often do...

Re: regular expression for getting text between 1. and 2.
by Loops (Curate) on Apr 12, 2013 at 14:37 UTC

    Hi

    Assuming you want the period or parenthesis character to be the same on each end of the match, the following code works:

    use warnings;
    use strict;
    use feature 'say';

    while (<DATA>) {
        chomp;
        if (/1([)|\.])(.*)(2\1)/) {
            say $2;
        }
        else {
            say "BAD LINE:", $_;
        }
    }

    __DATA__
    1) some text 2)
    1. some text 2.
    1) some text 2.
    1. 2.10.10.20 some text 2.
    1) 2.10.10.20 2)
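
As a small variation on the backreference idea, the sketch below wraps the match in a helper and shows that a line with mixed delimiters is rejected. The helper name `between_same_delims`, the `^`/`$` anchors, the whitespace trimming with `\s*`, and the character class without the pipe are all assumptions of this sketch rather than part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: text between "1X" and "2X", where X is the same
# delimiter on both ends. Returns undef when the line does not fit.
sub between_same_delims {
    my ($line) = @_;
    # Group 1 captures ")" or "."; \1 then demands that same character after
    # the closing 2. The anchors and \s* trimming are assumptions of this sketch.
    return $line =~ /^1([).])\s*(.*?)\s*2\1$/ ? $2 : undef;
}

for my $line ('1) some text 2)', '1) some text 2.', '1. 2.10.10.20 some text 2.') {
    my $text = between_same_delims($line);
    print defined $text ? "$line => [$text]\n" : "$line => BAD LINE\n";
}
```

Because `\1` repeats whatever the first group captured, `1) some text 2.` is reported as a bad line, while the line containing the IP-like number still yields everything up to the final `2.`.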
Re: regular expression for getting text between 1. and 2.
by hippo (Bishop) on Apr 12, 2013 at 14:34 UTC
    $line =~ /^1[.)](.*)2[.)]$/;

    Is that what you meant?

Re: regular expression for getting text between 1. and 2.
by prashantktyagi (Scribe) on Apr 12, 2013 at 14:41 UTC
    check this
    while (<DATA>) {
        chomp;
        if (/^1[.)](.*?)2[.)]$/) {
            print "matched $1\n";
        }
        else {
            print "not matched \n";
        }
    }

    __DATA__
    1. 2.10.10.20 some text 2.
    1) 2.10.10.20 2)
Re: regular expression for getting text between 1. and 2.
by reisinge (Hermit) on Apr 12, 2013 at 15:53 UTC

    I recommend Regexp::Debugger, it's a great tool! Here's a video on its usage, but it's actually this simple:

    rxrx prog_with_regexp.pl

    Well done is better than well said. -- Benjamin Franklin