PerlMonks  

Capturing groups where the ending is optional

by cwm9 (Initiate)
on Jan 14, 2018 at 23:34 UTC ( #1207241=perlquestion )
cwm9 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to split a line into two groups. The first group is completely unknown, except that it will not contain a specific string. The second group, which may or may not be present, starts with that same string and includes everything after it. Suppose 'right' is the specific string in question. Here's the desired output for each input:

    left          \1=left  \2=
    right         \1=      \2=right
    rightabc      \1=      \2=rightabc
    leftright     \1=left  \2=right
    leftrightabc  \1=left  \2=rightabc

Here's what I've tried so far. This version is overly greedy: group 1 swallows the text that should go to group 2.

    s/(.*)(right)?/\1 <> \2/

    echo "left"|perl -ne 'print if s/(.*)(right)?/\1 <> \2/'
    + left <>
    echo "right"|perl -ne 'print if s/(.*)(right)?/\1 <> \2/'
    + right <>
    echo "leftright"|perl -ne 'print if s/(.*)(right)?/\1 <> \2/'
    + leftright <>

This version splits properly, but \2 is the same as \1 when there is no \2:

    s/(.*(?=right.*)|(.*))/\1 <> \2/

    echo "left"|perl -ne 'print if s/(.*(?=right.*)|(.*))/\1 <> \2/'
    + left <> left
    echo "right"|perl -ne 'print if s/(.*(?=right.*)|(.*))/\1 <> \2/'
    + <> right
    echo "leftright"|perl -ne 'print if s/(.*(?=right.*)|(.*))/\1 <> \2/'
    + left <> right
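To see where the text goes in each attempt, it may help to print the capture groups directly instead of the substituted line. The following is only a sketch: the helper name `captures` and the `'undef'` placeholder are illustrative assumptions, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: match $text against $re and return groups 1 and 2,
# showing a group that did not take part in the match as the string 'undef'.
sub captures {
    my ($re, $text) = @_;
    return unless $text =~ $re;
    return map { defined $_ ? $_ : 'undef' } ($1, $2);
}

# First attempt: (.*) is greedy and (right)? is happy to match nothing,
# so group 1 keeps the whole line.
my $greedy = qr/(.*)(right)?/;

# Second attempt: group 2 is nested inside group 1, so whenever the second
# alternative matches, both groups hold the same text.
my $nested = qr/(.*(?=right.*)|(.*))/;

for my $text (qw(left right leftright)) {
    printf "%-10s greedy: [%s] [%s]  nested: [%s] [%s]\n",
        $text, captures($greedy, $text), captures($nested, $text);
}
```

In the second attempt the "right" that shows up after `<>` is not coming from group 2 at all: the lookahead does not consume it, so it is simply the unmatched remainder of the line.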

Replies are listed 'Best First'.
Re: Capturing groups where the ending is optional
by tybalt89 (Priest) on Jan 15, 2018 at 00:03 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1207241

    use strict;
    use warnings;

    my $specific = 'right';

    while(<DATA>)
      {
      chomp;
      /(?|(.*?)($specific.*)|(.*)())/;
      printf "%-12s \\1=%4s \\2=%s\n", $_, $1, $2;
      }

    __DATA__
    left
    right
    rightabc
    leftright
    leftrightabc

    Outputs:

    left         \1=left \2=
    right        \1=     \2=right
    rightabc     \1=     \2=rightabc
    leftright    \1=left \2=right
    leftrightabc \1=left \2=rightabc
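The key piece in the script above is the branch reset group `(?|...)`, which numbers the groups in each alternative from 1 again, so `$1` and `$2` are both defined whichever branch matches. A minimal sketch of the same idea as a reusable function follows. The name `split_at_key`, the `\A`/`\z` anchors, and the `\Q...\E` quoting are additions made here for illustration, and branch reset needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split $text at the first
# occurrence of the literal string $key.
sub split_at_key {
    my ($text, $key) = @_;

    # (?|...) is a branch reset: both alternatives fill groups 1 and 2.
    # \Q...\E quotes $key so that regex metacharacters in it match literally.
    $text =~ /\A(?|(.*?)(\Q$key\E.*)|(.*)())\z/s or return;
    return ($1, $2);
}

for my $text (qw(left right rightabc leftright leftrightabc)) {
    my ($left, $right) = split_at_key($text, 'right');
    printf "%-12s \\1=%4s \\2=%s\n", $text, $left, $right;
}
```

The empty `()` in the second alternative is what makes `$2` an empty string rather than undef when the key is absent.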
Re: Capturing groups where the ending is optional
by ikegami (Pope) on Jan 14, 2018 at 23:42 UTC

    I think you're asking for

    / ^ ( (?: (?!\Q$key\E). )* ) ( .* ) /xs
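For anyone who wants to try the pattern above on the sample inputs, here is a small sketch. The wrapper name `split_before_key` is an assumption made for illustration; the regex itself is the one given above. The first group advances one character at a time, but only while the key does not start at the current position, so it stops exactly where the key begins.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern above; the name is illustrative.
# In list context the match returns the contents of both groups.
sub split_before_key {
    my ($text, $key) = @_;
    return $text =~ / ^ ( (?: (?!\Q$key\E) . )* ) ( .* ) /xs;
}

for my $text (qw(left right rightabc leftright leftrightabc)) {
    my ($left, $right) = split_before_key($text, 'right');
    print "$text: [$left] [$right]\n";
}
```

Both groups always take part in the match, so neither is ever undef; a missing right-hand part simply comes back as an empty string.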

    By the way, you shouldn't be using \1 and \2 in the substitution expression (as Perl told you through warnings you apparently ignored). \1 and \2 are regex atoms that instruct the regex engine to match the first and second captured strings, respectively. It makes no sense to use them in the replacement expression. Use $1 and $2 instead.
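As a rough illustration of that advice, the original substitution could be written with `$1` and `$2` on the replacement side. This is a sketch only: the function name `mark_split` is hypothetical, and the key 'right' is hard-coded to mirror the question.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a line as "LEFT <> RIGHT", using $1 and $2
# (not \1 and \2) in the replacement part of s///.
sub mark_split {
    my ($line) = @_;    # $line is a private copy, so the caller's value is untouched
    $line =~ s/ ^ ( (?: (?!right) . )* ) ( .* ) /$1 <> $2/xs;
    return $line;
}

print mark_split($_), "\n" for qw(left right leftright);
```

The /x flag only affects the pattern side, so the spaces around `<>` in the replacement are kept as written.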

      Thank you. I'll examine this code. As for the $1 vs \1, there were no warnings --- I'm only using perl from the command line in a bash script. I'm pretty much a complete perl noob. Thank you for helping.
        ... there were no warnings --- I'm only using perl from the command line .... I'm pretty much a complete perl noob.

        Perl noobs in particular should always enable warnings and strictures. One way to do this from the command line is
            perl -wMstrict -e "perl code here ..." whatever else ...
        (double-quotes for Windoze, single-quotes otherwise).


        Give a man a fish:  <%-{-{-{-<

Re: Capturing groups where the ending is optional
by AnomalousMonk (Chancellor) on Jan 15, 2018 at 00:30 UTC

    Here, in the Test::More format that may, in future, avoid a great deal of confusion, is a possible solution based on ikegami's solution. Note that the  => (fat comma) in an expression like
        [ 'leftrightabc' => 'left', 'rightabc' ],
    is just a notational indulgence that is intended to be read as "produces" or "yields" and that is syntactically just a plain old  , (comma) (update: because its LHS is fully quoted).

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Test::More 'no_plan';
     use Test::NoWarnings;
     ;;
     use constant TEST_1 => (
       [ 'left'         => 'left', ''         ],
       [ 'right'        => '',     'right'    ],
       [ 'rightabc'     => '',     'rightabc' ],
       [ 'leftright'    => 'left', 'right'    ],
       [ 'leftrightabc' => 'left', 'rightabc' ],
       );
     ;;
     my $specific_string = qr{ right }xms;
     ;;
     my $right_side = qr{ $specific_string .* \z }xms;
     my $left_side  = qr{ (?! $specific_string) . }xms;
     ;;
     VECTOR:
     for my $ar_vector (TEST_1) {
       my ($string, @expected) = @$ar_vector;
       ;;
       is_deeply [ $string =~ m{ \A ($left_side*) ($right_side?) }xms ],
         \@expected,
         qq{'$string' -> '$expected[0]' '$expected[1]'};
       }
     ;;
     done_testing;
     "
    ok 1 - 'left' -> 'left' ''
    ok 2 - 'right' -> '' 'right'
    ok 3 - 'rightabc' -> '' 'rightabc'
    ok 4 - 'leftright' -> 'left' 'right'
    ok 5 - 'leftrightabc' -> 'left' 'rightabc'
    1..5
    ok 6 - no warnings
    1..6

    Update: Note also that  $specific_string could be any  qr// pattern.
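To illustrate that last point, here is a sketch in which the delimiter is an alternation rather than a single word. The function name `split_at_pattern` and the `right | stop` pattern are made up for this example. It also assumes the delimiter pattern contains no capturing groups of its own, since those would shift the numbering of the two outer groups.

```perl
use strict;
use warnings;

# Hypothetical helper: same split as in the test above, but the delimiter
# is passed in as a compiled qr// pattern.
sub split_at_pattern {
    my ($text, $delim) = @_;
    my $left_side  = qr{ (?! $delim ) . }xms;    # one character that does not start a delimiter
    my $right_side = qr{ $delim .* \z }xms;      # delimiter plus everything after it
    return $text =~ m{ \A ($left_side*) ($right_side?) }xms;
}

# Illustrative alternation, not from the thread.
my $delim = qr{ right | stop }xms;

for my $text (qw(left leftright gostopnow)) {
    my ($left, $right) = split_at_pattern($text, $delim);
    print "$text: [$left] [$right]\n";
}
```

Because a qr// object carries its own flags when interpolated, the /x spacing inside `$delim` keeps working inside the larger patterns.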


    Give a man a fish:  <%-{-{-{-<

Node Type: perlquestion [id://1207241]
Approved by Athanasius
Front-paged by haukex