in reply to solution wanted for break-on-spaces (w/specifics)
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11137926
use warnings;
use Data::Dump 'dd';
my @tests = (
  # q{all '- and "-quotes properly balanced},
  [ q{This is simple.}, [ q{This}, q{is}, q{simple.} ] ],
  [ q{ This is simple. }, [ q{This}, q{is}, q{simple.} ] ],
  [ q{This is "so very simple".}, [ q{This}, q{is}, q{"so very simple".} ] ],
  [ q{This "is so" very simple.}, [ q{This}, q{"is so"}, q{very}, q{simple.} ] ],
  [ q{This 'isn\'t nice.'}, [ q{This}, q{'isn\'t nice.'} ] ],
  [ q{This "isn\"t nice."}, [ q{This}, q{"isn\"t nice."} ] ],
  [ q{This 'isn\\\\'t nice.'}, [ q{This}, q{'isn\\\\'t}, q{nice.'} ] ],
  [ q{This "isn\\\\"t nice."}, [ q{This}, q{"isn\\\\"t}, q{nice."} ] ],
  [ q{This 'is not unnice.'}, [ q{This}, q{'is not unnice.'} ] ],
  [ q{This "is not unnice."}, [ q{This}, q{"is not unnice."} ] ],
  [ q{a "bb cc" d}, [ q{a}, q{"bb cc"}, q{d} ] ],
  # q{UNbalanced '- and "-quotes at absolute end of string},
  [ q{This is "so very simple}, [ q{This}, q{is}, q{"so very simple} ] ],
  [ q{This 'isn\'t nice.}, [ q{This}, q{'isn\'t nice.} ] ],
  [ q{This "isn\"t nice.}, [ q{This}, q{"isn\"t nice.} ] ],
  [ q{This 'isn\\\\'t nice.}, [ q{This}, q{'isn\\\\'t}, q{nice.} ] ],
  [ q{This "isn\\\\"t nice.}, [ q{This}, q{"isn\\\\"t}, q{nice.} ] ],
  [ q{This 'is not unnice.}, [ q{This}, q{'is not unnice.} ] ],
  [ q{This "is not unnice.}, [ q{This}, q{"is not unnice.} ] ],
  # 'what about these questionable cases?',
  [ q{is this"really so"simple now?}, [ q{is}, q{this"really so"simple}, q{now?} ] ],
  [ q{is this"really so" now?}, [ q{is}, q{this"really so"}, q{now?} ] ],
  [ q{is "really so"simple now?}, [ q{is}, q{"really so"simple}, q{now?} ] ],
  [ q{is this'really so'simple now?}, [ q{is}, q{this'really so'simple}, q{now?} ] ],
  [ q{is this'really so' now?}, [ q{is}, q{this'really so'}, q{now?} ] ],
  [ q{is 'really so'simple now?}, [ q{is}, q{'really so'simple}, q{now?} ] ],
  [ q{is really\\ so\\ simple now?}, [ q{is}, q{really\\ so\\ simple}, q{now?} ] ],
);
my $regex = qr/(?:
'(?: \\. | [^'\\] )*' # single quoted string
|
"(?: \\. | [^"\\] )*" # double quoted string
|
['"].* # unmatched quote
|
\\. # escaped character
|
\S # single non-space character
)+/x;
my $passcount = 0;
for ( @tests )
{
my ( $string, $want ) = @$_;
my @out = $string =~ /$regex/g;
local $" = "\0"x5; # just some array element boundary separator
"@$want" eq "@out" ? $passcount++ :
dd "$string => FAILED got", \@out, ' wanted ', $want;
}
print "$passcount of @{[scalar @tests]} passed\n";
Outputs:
25 of 25 passed
Re^2: solution wanted for break-on-spaces (w/specifics) (?>...)
by LanX (Saint) on Oct 24, 2021 at 23:03 UTC
Hint: If you use (?>...) instead of (?:...), you don't need to worry about backtracking.
This not only makes your code simpler but also faster.
#!/usr/bin/perl
use strict; # https://perlmonks.org/?node_id=11137926
use warnings;
use Data::Dump 'dd';
my @tests = (
  # q{all '- and "-quotes properly balanced},
  [ q{This is simple.}, [ q{This}, q{is}, q{simple.} ] ],
  [ q{ This is simple. }, [ q{This}, q{is}, q{simple.} ] ],
  [ q{This is "so very simple".}, [ q{This}, q{is}, q{"so very simple".} ] ],
  [ q{This "is so" very simple.}, [ q{This}, q{"is so"}, q{very}, q{simple.} ] ],
  [ q{This 'isn\'t nice.'}, [ q{This}, q{'isn\'t nice.'} ] ],
  [ q{This "isn\"t nice."}, [ q{This}, q{"isn\"t nice."} ] ],
  [ q{This 'isn\\\\'t nice.'}, [ q{This}, q{'isn\\\\'t}, q{nice.'} ] ],
  [ q{This "isn\\\\"t nice."}, [ q{This}, q{"isn\\\\"t}, q{nice."} ] ],
  [ q{This 'is not unnice.'}, [ q{This}, q{'is not unnice.'} ] ],
  [ q{This "is not unnice."}, [ q{This}, q{"is not unnice."} ] ],
  [ q{a "bb cc" d}, [ q{a}, q{"bb cc"}, q{d} ] ],
  # q{UNbalanced '- and "-quotes at absolute end of string},
  [ q{This is "so very simple}, [ q{This}, q{is}, q{"so very simple} ] ],
  [ q{This 'isn\'t nice.}, [ q{This}, q{'isn\'t nice.} ] ],
  [ q{This "isn\"t nice.}, [ q{This}, q{"isn\"t nice.} ] ],
  [ q{This 'isn\\\\'t nice.}, [ q{This}, q{'isn\\\\'t}, q{nice.} ] ],
  [ q{This "isn\\\\"t nice.}, [ q{This}, q{"isn\\\\"t}, q{nice.} ] ],
  [ q{This 'is not unnice.}, [ q{This}, q{'is not unnice.} ] ],
  [ q{This "is not unnice.}, [ q{This}, q{"is not unnice.} ] ],
  # 'what about these questionable cases?',
  [ q{is this"really so"simple now?}, [ q{is}, q{this"really so"simple}, q{now?} ] ],
  [ q{is this"really so" now?}, [ q{is}, q{this"really so"}, q{now?} ] ],
  [ q{is "really so"simple now?}, [ q{is}, q{"really so"simple}, q{now?} ] ],
  [ q{is this'really so'simple now?}, [ q{is}, q{this'really so'simple}, q{now?} ] ],
  [ q{is this'really so' now?}, [ q{is}, q{this'really so'}, q{now?} ] ],
  [ q{is 'really so'simple now?}, [ q{is}, q{'really so'simple}, q{now?} ] ],
  [ q{is really\\ so\\ simple now?}, [ q{is}, q{really\\ so\\ simple}, q{now?} ] ],
);
my $regex = qr/(?>
'(?> \\. | . )*?' # single quoted string
|
"(?> \\. | . )*?" # double quoted string
|
['"].* # unmatched quote
|
\\. # escaped character
|
\S # single non-space character
)+/x;
my $passcount = 0;
for ( @tests )
{
my ( $string, $want ) = @$_;
my @out = $string =~ /$regex/g;
local $" = "\0"x5; # just some array element boundary separator
"@$want" eq "@out" ? $passcount++ :
dd "$string => FAILED got", \@out, ' wanted ', $want;
}
print "$passcount of @{[scalar @tests]} passed\n";
25 of 25 passed
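For anyone wondering what the switch to (?>...) actually changes, here is a small sketch. It is not part of the script above; the input strings and variable names are made up for illustration. The first pair is the usual demonstration that an atomic group never gives back what it matched. The second pair shows why a plain `.` becomes safe inside the quoted-string alternative once the group is atomic: a backslash that was consumed as part of `\\.` cannot later be re-read as an ordinary character.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An atomic group keeps everything it matched: a+ cannot hand back an 'a'.
my $plain_ok  = 'aaab' =~ /(?:a+)ab/ ? 1 : 0;   # matches after backtracking
my $atomic_ok = 'aaab' =~ /(?>a+)ab/ ? 1 : 0;   # cannot match

# Hypothetical input: quote, a, backslash, quote, space, b.
# The only candidate closing quote is escaped.
my $s = q{'a\' b};
my ($plain)  = $s =~ /('(?: \\. | . )*?')/x;    # backtracks, '.' eats the backslash
my ($atomic) = $s =~ /('(?> \\. | . )*?')/x;    # the escape stays intact

print "plain group on aaab:  $plain_ok\n";      # 1
print "atomic group on aaab: $atomic_ok\n";     # 0
print "plain quoted match:   ", ( defined $plain  ? $plain  : 'no match' ), "\n";
print "atomic quoted match:  ", ( defined $atomic ? $atomic : 'no match' ), "\n";
```

With the non-atomic group the engine ends up treating the escaped quote as the closing quote and returns `'a\'`; with the atomic group there is no match, which is why the full regex can then fall through to its "unmatched quote" alternative.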
BTW, on the no-backtracking: that was a later addition, one of about 10-15 alterations to the statement that I tried over time.
Re^2: solution wanted for break-on-spaces (w/specifics)
by perl-diddler (Chaplain) on Oct 26, 2021 at 16:25 UTC
Your regex was perfect. FWIW, I put it in my original prog (with some bugs in the prog fixed) as the 2nd regex in the regex array. The reason I had the regexes and the outputs in arrays was to compare several REs, but I ended up with just the one, as it passed the most cases.
So here are the lines for cases 3 and 4 (w/4+5 being the two that didn't pass with the regex I originally posted):
ResByLn:{ln=>3, wanted=>4, got=>[4, 4]},[" p ", " p "]
ResByLn:{ln=>4, wanted=>2, got=>[3, 2]},["FAIL:<4>", " p "]
The 'got' values are the counts I got from the regexes, with your RE in the 2nd position. The last brackets contain the pass/fail result for each regex against that statement. So yours was 'p' straight down the 2nd column. Thanks. I had spaces in the earlier revisions of the REs, but I wasn't sure I had the 'x' flag applied to the sub-REs that needed it.
I guess the flags on each outer layer of an RE get propagated to the inner REs.
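A note on that guess, with a small sketch (the patterns and variable names are invented for illustration, not taken from either program). Within a single regex literal, the /x flag does cover every nested group. A piece compiled separately with qr// is different: it keeps its own flags when it is interpolated into a larger pattern, so an outer /x does not reach into it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Case 1: one regex literal. /x covers every nested group inside it.
my $single = qr/^ (?: a b ) c $/x;

# Case 2: a separately compiled piece (hypothetical $inner) without /x.
my $inner    = qr/a b/;              # the space here is literal
my $combined = qr/^ $inner c $/x;    # /x applies only to the outer text

my @results = (
    'abc'  =~ $single   ? 1 : 0,     # 1: whitespace ignored throughout
    'abc'  =~ $combined ? 1 : 0,     # 0: $inner still requires "a b"
    'a bc' =~ $combined ? 1 : 0,     # 1: literal space inside $inner matches
);
print "@results\n";                  # prints "1 0 1"
```

So a regex written as one qr/.../x, like the one posted above, needs the flag only once; sub-REs built with their own qr// each need their own /x if they contain layout whitespace.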
I'm not sure if you were asking a question about your third group above, where you wrote: " 'what about these questionable cases?',". I'm not sure what is questionable about them. In my use case, neither 'q{}' nor '?' has special meaning. Only the quotes and the backslash are meta chars. So, taking the first line as an example, I see 3 fields in both the input and the expected result:
[ q{is this"really so"simple now?}, [ q{is}, q{this"really so"simple}, q{now?} ] ],
      ^                     ^              ^                         ^
Both expressions have 2 breaks, yielding 3 parts each. Does that make sense?
One rule I forgot to list, though your example handled it as expected anyway, was what to do with overlapping quotes: a quote of a different type should not have 'meta' properties inside a quoted section. I.e.:
this "is a' test" of weird' stuff
I may be wrong, but I don't think most here would split that into 3 parts, as most of us are used to the meta-properties of characters being disabled or modified within quotes, so the single quote above wouldn't start a quoted sub-expression overlapping with the double-quoted part. Treating the quotes as overlapping would effectively make "is a' test" of weird' all 1 "word", as all the spaces are between quotes of some type. While that would be "a" way of interpreting overlapping quoted sections, I don't know how expected or useful it would be.
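For reference, here is a minimal sketch that runs the regex from the first script in this thread against that exact string. Only the input string and the print loop are new; the regex is copied unchanged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Regex copied from the first script in this thread.
my $regex = qr/(?:
    '(?: \\. | [^'\\] )*'   # single quoted string
  |
    "(?: \\. | [^"\\] )*"   # double quoted string
  |
    ['"].*                  # unmatched quote
  |
    \\.                     # escaped character
  |
    \S                      # single non-space character
)+/x;

my $string = q{this "is a' test" of weird' stuff};
my @out = $string =~ /$regex/g;
print "[$_]\n" for @out;
# [this]
# ["is a' test"]
# [of]
# [weird' stuff]
```

The single quote inside the double-quoted section is inert, so the double-quoted part closes normally. The later lone single quote has no partner, so the "unmatched quote" alternative takes the rest of the string, giving 4 fields rather than the 3 that an overlapping interpretation would produce.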
I need to study your example and some others, but wanted to make some response in the meantime. About 3-4 other things cropped up and needed attention just after I posted this...