http://www.perlmonks.org?node_id=11137929


in reply to solution wanted for break-on-spaces (w/specifics)

use strict;
use warnings;
use feature 'say';

# use Regexp::Common;
# ^^^ Not used. I'm so lazy, I just peeked at $RE{quoted}
# to construct the "$quoted" expression below, by slightly
# modifying it (see "$") to satisfy the third clause.
# And actually 2nd test case below is to test how it works,
# it seems there's not a similar one among your 18.

my $quoted = qr/
    (?:(?|
        (?:(?<!\\)\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)|
        (?:(?<!\\)\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;

my $re = qr/(?:$quoted|[^ ])+\K(?: |$)/;

my @tests = (
    q(This 'isn\'t nice.'),
    q(This 'isn\'t nice.),
    q(This \"isnt unnice.\"),
);

for my $t ( @tests ) {
    say "[$_]" for split $re, $t;
}

__END__

[This]
['isn\'t nice.']
[This]
['isn\'t nice.]
[This]
[\"isnt]
[unnice.\"]
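The split pattern above leans on \K, so here is a minimal sketch of that trick in isolation. Everything left of \K has to match but stays in the field; only what follows \K is consumed as the separator. The helper name split_outside_quotes and the simplified token pattern (double quotes only, no backslash escapes) are assumptions made for illustration, not the regexp from this post.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: split on single spaces, except inside "..." pairs.
# Deliberately simplified: no backslash escapes, no single quotes.
sub split_outside_quotes {
    my ($str) = @_;

    # Left of \K: one or more tokens (a quoted run, or one char that is
    # neither a space nor a double quote). These stay in the field.
    # Right of \K: the space that is actually removed as the separator.
    my $re = qr/(?:"[^"]*"|[^ "])+\K /;

    return split $re, $str;
}

say "[$_]" for split_outside_quotes(q(say "hello world" now));
```

The space inside the quoted run is never offered to the separator part, because the token part has already swallowed the whole quoted run before \K is reached.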

Update, 10 minutes later: aargh, I added a negative look-behind to cover your 14th case (and added my third test). Maybe there is more to add. Further: it's trickier than that; cases 6 (and 7) are split into 3 groups, but the wrong ones. Will look into that later. False alarm? We'll see later :)

Next morning update. As LanX pointed out, a negative look-behind for just a single backslash isn't enough. So, to save this answer (I like how the "keep" \K meta-character helps in a regexp for split; it's kind of interesting), maybe it's easier to revert $quoted to what was borrowed from $RE{quoted} and tweak $re instead:

my $quoted = qr/
    (?:(?|
        (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)|
        (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;

my $re = qr/
    (?:
        (?:\\\\)+
        |
        (?:\\[^ ])
        |
        $quoted
        |
        [^ ]
    )+
    \K
    (?: \ | $ )
/x;

I hope it works now; my first attempt at this "update" was broken (it's shown below, though better not to look, there's nothing interesting in it. Sorry for the mess). Beyond that, it's unclear whether to split on an escaped space, or how to treat several spaces in a row.

my $quoted = qr/
    (?:(?|
        (?: (?:[^\\\'\ ]*(?:\\[^\ ][^\\\'\ ]*)*) \" )
        (?: [^\\\"]* (?: \\ . [^\\\"]* )* )
        (?:\"|$)
        |
        (?:(?:[^\\\' ]*(?:\\[^ ][^\\\' ]*)*)\')(?:[^\\\']*(?:\\.[^\\\']*)* +)(?:\'|$)
    ))
/x;
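The look-behind problem that prompted this update can be reproduced in isolation. The sketch below compares a single-backslash look-behind with a check that skips whole pairs of backslashes first. Both sub names are made up for the demonstration, and the pair-aware pattern only illustrates the issue; it is not the fix used in $re above.

```perl
use strict;
use warnings;
use feature 'say';

my $bs = chr(92);    # one backslash, spelled out to avoid quoting confusion

# Naive check: a double quote not directly preceded by a backslash.
sub opens_quote_naive { return $_[0] =~ /(?<!\\)"/ ? 1 : 0 }

# Pair-aware check: skip whole pairs of backslashes before the quote, so an
# even-length run leaves the quote unescaped and an odd-length run escapes it.
sub opens_quote_paired { return $_[0] =~ /(?<!\\)(?:\\\\)*\K"/ ? 1 : 0 }

for my $n ( 0 .. 3 ) {
    my $str = ( $bs x $n ) . '"';
    say "$n backslash(es): naive=", opens_quote_naive($str),
        " paired=", opens_quote_paired($str);
}
```

With two backslashes the naive check treats the quote as escaped, although the backslashes escape each other and the quote is free-standing; the pair-aware check gets that case right.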

And later (final(?)) update: Sigh... damn lack of practice. So this:

my $quoted = qr/
    (?:(?|
        (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)
        |
        (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;

my $re = qr/
    (?:
        (?:\\.)+
        |
        $quoted
        |
        [^ \\"']+
    )*
    \K
    (?: \ | $ )+
/x;

# and later:

my $got = [ split $re, $str ];

passes all tests in LanX's later answer except #2 and is somewhat optimized.

About test #2: the consensus is that "the brief is unclear". Must split-like behaviour generate an empty leading field for #2? The expression to split on is definitely neither missing nor a literal space (the cases where split skips leading whitespace). If it nevertheless must not generate one (my solution does, and so fails #2), then my bad; but still, yeah, this regexp is "working" and can literally be used to split on. :)
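For reference, here is a small sketch of the split behaviour this question turns on. It assumes test #2 is about input that begins with the separator, which is my reading and is not spelled out here; the input string is made up.

```perl
use strict;
use warnings;
use feature 'say';

my $str = ' a b';    # hypothetical input with one leading space

# Regex separator: the positive-width match at the start yields a leading ''.
my @by_regex = split / /, $str;

# Special-cased literal ' ': leading whitespace is skipped, no empty field.
my @by_literal = split ' ', $str;

# One way to get rid of empty fields after the fact.
my @filtered = grep { length } split / /, $str;

say scalar(@by_regex),   ' fields from split / /';
say scalar(@by_literal), ' fields from split with a literal space';
say scalar(@filtered),   ' fields after grep { length }';
```

If the brief does want the leading empty field gone, filtering with grep keeps the regexp itself unchanged.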

Re^2: solution wanted for break-on-spaces (w/specifics)
by LanX (Saint) on Oct 24, 2021 at 00:30 UTC
    I'm not sure about this

    (?:(?<!\\)\")

    I read it as: a double quote which is not preceded by a backslash.

    But what about an escaped backslash \\" or two \\\\" ... ?

    I'd rather try something like (Untested pseudocode)

    s/^(?:$escaped|$quoted|\S)*\K\s+/\n/g

    and

    $escaped = qr/\\./;
    $quoted = qr/
        (['"])                    # start
        (?: $escaped | [^\1] )*   # inside
        \1                        # end, probably \g-1 better
    /x;

    NB: I didn't cover the case of unclosed quotes, which is unclear anyway.

    Cheers Rolf
    (addicted to the Perl Programming Language :)

    update

    tested - fails - good night! :)

    update

    see Re: solution wanted for break-on-spaces (w/specifics) for "working" example
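One likely reason the untested $quoted in the reply above fails is the [^\1] class: inside a bracketed character class, \1 is a character escape rather than a backreference, so the class does not exclude the opening quote. The sketch below contrasts it with a negative look-ahead. The helper first_match and the sample string are assumptions for illustration, and the $escaped branch is left out to keep the comparison small.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: return the text of the first match, or undef.
sub first_match {
    my ( $re, $str ) = @_;
    return $str =~ $re ? substr( $str, $-[0], $+[0] - $-[0] ) : undef;
}

# Inside [...] the \1 is a character escape, so this class excludes chr(1),
# not whatever the opening quote captured.
my $with_class = qr/(['"])[^\1]*\1/;

# A negative look-ahead before each character does exclude the opening quote.
my $with_lookahead = qr/(['"])(?:(?!\1).)*\1/;

my $str = q('a' 'b');
say first_match( $with_class,     $str );    # runs on to the last quote
say first_match( $with_lookahead, $str );    # stops at the first closing quote
```

When such a pattern is interpolated into a larger regexp that has its own capture groups, a relative backreference like \g{-1} avoids the renumbering problem, as the comment in the reply already suspects.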