
    use strict;
    use warnings;
    use feature 'say';
    # use Regexp::Common;
    # ^^^ Not used. I'm so lazy, I just peeked at $RE{quoted}
    # to construct the "$quoted" expression below, by slightly
    # modifying it (see "$") to satisfy the third clause.
    # And actually the 2nd test case below tests how that works;
    # it seems there's no similar one among your 18.

    my $quoted = qr/
        (?:(?|
            (?:(?<!\\)\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)|
            (?:(?<!\\)\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
        ))
    /x;
    my $re = qr/(?:$quoted|[^ ])+\K(?: |$)/;

    my @tests = (
        q(This 'isn\'t nice.'),
        q(This 'isn\'t nice.),
        q(This \"isnt unnice.\"),
    );

    for my $t ( @tests ) {
        say "[$_]" for split $re, $t;
    }

    __END__
    [This]
    ['isn\'t nice.']
    [This]
    ['isn\'t nice.]
    [This]
    [\"isnt]
    [unnice.\"]
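The split above leans on \K: whatever the pattern matches before \K is kept out of the separator, so only the trailing space (or the end of the string) is actually removed. A minimal sketch of the difference, using a made-up string and a simplified pattern (both mine, not from the thread; \K needs Perl 5.10 or later):

```perl
use strict;
use warnings;
use feature 'say';

my $str = 'ab cd ef';    # hypothetical input

# Without \K the whole match ("ab ", then "cd ") is the separator,
# so the words vanish and only empty fields and the last word remain.
my @without_keep = split /[^ ]+ /, $str;

# With \K the part matched before it is not part of the separator:
# only the space after each word is removed.
my @with_keep = split /[^ ]+\K /, $str;

say join '|', @without_keep;    # prints: ||ef
say join '|', @with_keep;       # prints: ab|cd|ef
```

Without \K every word is swallowed by the separator; with it, the "field" part of the pattern only decides where a separator may start.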

Update after 10 minutes: aargh, I added a negative look-behind to cover your 14th case (and added my third test case). Maybe there are more to add. Further, it's trickier: cases 6 (and 7) are split into 3 groups, but the wrong ones. I'll look into that later. False alarm? We'll see, later :)

Next-morning update. As LanX pointed out, a negative look-behind for just a single backslash isn't enough, since a quote preceded by an escaped backslash (as in \\") is a real quote, yet the look-behind rejects it. So, to save this answer (I like how the "keep" meta-character \K helps in a regexp for split; it's kind of interesting), maybe it's easier to revert $quoted to the form borrowed from $RE{quoted} and tweak $re instead:

    my $quoted = qr/
        (?:(?|
            (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)|
            (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
        ))
    /x;

    my $re = qr/
        (?: (?:\\\\)+ | (?:\\[^ ]) | $quoted | [^ ] )+
        \K
        (?: \ | $ )
    /x;
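To see why the single-backslash look-behind falls short, here is a small side-by-side run of the first version and the next-morning version on a hypothetical input, a\\"b c" (an escaped backslash followed by a genuinely quoted part). The regexes are copied from above; the test string and the name $re_lb are my own:

```perl
use strict;
use warnings;
use feature 'say';

# First version: a quote counts only if not preceded by a backslash.
my $quoted_lb = qr/
    (?:(?|
        (?:(?<!\\)\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$) |
        (?:(?<!\\)\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;
my $re_lb = qr/(?:$quoted_lb|[^ ])+\K(?: |$)/;

# Next-morning version: no look-behind in $quoted; escape pairs,
# including an escaped backslash, are consumed by $re itself.
my $quoted = qr/
    (?:(?|
        (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$) |
        (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;
my $re = qr/ (?: (?:\\\\)+ | (?:\\[^ ]) | $quoted | [^ ] )+ \K (?: \ | $ ) /x;

# Hypothetical input  a\\"b c"  : an escaped backslash, then a quoted part.
my $str = 'a\\\\"b c"';

my @old = split $re_lb, $str;
my @new = split $re,    $str;

say "old: [$_]" for @old;
say "new: [$_]" for @new;
```

With the look-behind, the quote after \\ is not recognised, so the string is broken inside the quotes, into a\\"b and c". The revised $re consumes \\ as one escape pair, then matches "b c" as a quoted part, so the whole string stays one field.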

I hope it works now; my first attempt at this update was broken (it's below, but better not look; there's nothing interesting in it. Sorry for the mess.). Beyond that, it's unclear whether to split on an escaped space, or on several spaces in a row.

    my $quoted = qr/
        (?:(?|
            (?: (?:[^\\\'\ ]*(?:\\[^\ ][^\\\'\ ]*)*) \" )
            (?: [^\\\"]* (?: \\ . [^\\\"]* )* )
            (?:\"|$)
        |
            (?:(?:[^\\\' ]*(?:\\[^ ][^\\\' ]*)*)\')
            (?:[^\\\']*(?:\\.[^\\\']*)*)
            (?:\'|$)
        ))
    /x;
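To make the open question about escaped spaces and runs of spaces concrete, here is a sketch of how the working next-morning $re (not the broken attempt just above) treats them. The two inputs are hypothetical examples of mine:

```perl
use strict;
use warnings;
use feature 'say';

# $quoted and $re as in the next-morning update.
my $quoted = qr/
    (?:(?|
        (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$) |
        (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;
my $re = qr/ (?: (?:\\\\)+ | (?:\\[^ ]) | $quoted | [^ ] )+ \K (?: \ | $ ) /x;

# Hypothetical inputs: an escaped space, and two spaces in a row.
for my $str ('a\\ b c', 'a  b') {
    say join ' ', map { "[$_]" } split $re, $str;
}
```

As written, it does split on the escaped space (giving a\, b and c), because (?:\\[^ ]) refuses a backslash followed by a space. And with two consecutive spaces only the first is consumed, so the next field comes out as " b", with a leading space.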

And a later (final?) update: sigh... damn lack of practice. So, this:

    my $quoted = qr/
        (?:(?|
            (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)
        |
            (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
        ))
    /x;

    my $re = qr/
        (?: (?:\\.)+ | $quoted | [^ \\"']+ )*
        \K
        (?: \ | $ )+
    /x;

    # and later:
    my $got = [ split $re, $str ];

passes all the tests in LanX's later answer except #2, and is somewhat optimized.
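A quick check of the final regex, on the three test strings from the first version plus two hypothetical inputs of mine (an escaped space and two consecutive spaces). This is only a sketch for eyeballing the behaviour, not LanX's test suite:

```perl
use strict;
use warnings;
use feature 'say';

# $quoted and $re as in the final update.
my $quoted = qr/
    (?:(?|
        (?:\")(?:[^\\\"]*(?:\\.[^\\\"]*)*)(?:\"|$)
    |
        (?:\')(?:[^\\\']*(?:\\.[^\\\']*)*)(?:\'|$)
    ))
/x;
my $re = qr/
    (?: (?:\\.)+ | $quoted | [^ \\"']+ )*
    \K
    (?: \ | $ )+
/x;

my @tests = (
    q(This 'isn\'t nice.'),      # test strings from the first version
    q(This 'isn\'t nice.),
    q(This \"isnt unnice.\"),
    'a\\ b c',                   # hypothetical: escaped space
    'a  b',                      # hypothetical: two spaces in a row
);

for my $t (@tests) {
    say join ' ', map { "[$_]" } split $re, $t;
}
```

Here (?:\\.)+ keeps the escaped space inside a\ b, and (?: \ | $ )+ consumes a run of spaces as a single separator, so a  b gives just a and b.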

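About test #2: the consensus is that "the brief is unclear". Must split-like behaviour generate an empty leading field for #2? The expression to split on is definitely neither omitted nor the literal single space ' ' (split's awk-emulation special case, the only one that strips leading whitespace), so, as with any regex, a positive-width match at the very start produces an empty leading field. If, nevertheless, no such field must be generated (whereas my solution does generate one, and so fails #2), then my bad; but still, yeah, this regexp is "working" and can literally be split on. :)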
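For reference, Perl's own split rules behind test #2, shown on a hypothetical string with a leading space: only the awk-style special case (pattern omitted, or given as the single-space string ' ') strips leading whitespace, while a regex whose match has positive width at the start yields an empty leading field. The shift-based trim at the end is one possible workaround of my own, not something from the thread:

```perl
use strict;
use warnings;
use feature 'say';

my $str = ' a b';    # hypothetical input with a leading space

# awk emulation: a pattern given as the single-space string ' ' removes
# leading whitespace first, so there is no empty leading field.
my @awk = split ' ', $str;

# Any real regex: the space matched at position 0 has positive width,
# so split returns an empty leading field.
my @regex = split / /, $str;

# One possible workaround (my assumption, not from the thread):
# drop the empty leading field afterwards.
my @trimmed = @regex;
shift @trimmed if @trimmed && $trimmed[0] eq '';

say join '|', @awk;        # prints: a|b
say join '|', @regex;      # prints: |a|b
say join '|', @trimmed;    # prints: a|b
```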

In reply to Re: solution wanted for break-on-spaces (w/specifics) by vr
in thread solution wanted for break-on-spaces (w/specifics) by perl-diddler
