Simple way to parse whitespace separated, optionally quoted words out of a string

by Aristotle (Chancellor)
on Nov 16, 2002 at 02:09 UTC ( [id://213340]=CUFP )

I recently needed to parse whitespace-separated words out of a string, where a double-quoted word may itself contain spaces but no escaped double quotes. The following regex does the job, is quite tidy, and hardly backtracks (occasionally by a few characters, no more).
my @words = /"?((?<!")\S+(?!"\s)|[^"]+)"?\s*/g;
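For illustration, here is a minimal sketch of the regex applied to a sample string. The sample input and the `$line` variable are assumptions and not part of the original post, which matches against `$_`.

```perl
use strict;
use warnings;

# Sample input (an assumption for illustration): three words, one of them quoted.
my $line = q{foo "bar baz" qux};

# A match with the g modifier in list context returns the capture of every successive match.
my @words = $line =~ /"?((?<!")\S+(?!"\s)|[^"]+)"?\s*/g;

print join('|', @words), "\n";    # foo|bar baz|qux
```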

Replies are listed 'Best First'.
Re: Simple way to parse whitespace separated, optionally quoted words out of a string
by japhy (Canon) on Nov 16, 2002 at 15:30 UTC
    Your method has its flaws. It does not require proper balancing of quotes, and it breaks q{"oops"I did it again"} into q{oops} and q{I did it again} -- that is, two strings, where it should be five. It also returns q{""} for the string q{""}, when it should probably return just an empty string (q{}).
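Both cases can be reproduced with a short script. This is only a sketch: `parse_words` is a hypothetical wrapper around the regex from the root node, introduced here so the two inputs can be fed to it directly.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the regex from the root node.
sub parse_words {
    my ($str) = @_;
    return $str =~ /"?((?<!")\S+(?!"\s)|[^"]+)"?\s*/g;
}

# Unbalanced quotes: everything after the second quote is swallowed as one word.
my @unbalanced = parse_words(q{"oops"I did it again"});

# Empty quoted string: the quotes themselves come back as the word.
my @empty = parse_words(q{""});

print scalar(@unbalanced), " words: [", join('][', @unbalanced), "]\n";  # 2 words: [oops][I did it again]
print scalar(@empty), " word: [", join('][', @empty), "]\n";             # 1 word: [""]
```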

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      Thanks for your points. The latter was easy to fix, but the first one caused me a fair bit of work and I wasn't able to come up with a solution that worked with only a single capturing pair of parens (my original goal). The following shouldn't fail regardless of how pathological the case you throw at it gets, but requires a grep.
      my @word = do { my $i=0; grep $i++&1, m/ (")? ((?(1) [^"]* | \S+)) (?(1) ") \s* /xg };
      Correct, but not nearly as neat as before. :-/ It does have the advantage that it can easily be extended to accept single quotes as well, though.
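One way the single-quote extension might look is sketched below. It is an assumption about what such an extension could be, not code from the thread: `parse_quoted_words` is a hypothetical name, the first group accepts either quote character, the quoted branch stops at the same character that opened it, and the closing quote is required only when an opening one was seen.

```perl
use strict;
use warnings;

# Hypothetical helper: the same capture-and-grep idea, with either quote character allowed.
sub parse_quoted_words {
    my ($str) = @_;
    my $i = 0;
    # Every match yields two captures: the opening quote (or undef) and the word.
    # The grep keeps the odd-indexed captures, which are the words.
    return grep $i++ & 1, $str =~ m{
        (["'])?                             # optional opening quote, remembered as group 1
        ( (?(1) (?: (?!\1) . )* | \S+ ) )   # quoted: up to the same quote; bare: run of non-space
        (?(1) \1 )                          # closing quote, only if one was opened
        \s*
    }xsg;
}

my @words = parse_quoted_words(q{foo "bar baz" 'it is' qux});
print join('|', @words), "\n";    # foo|bar baz|it is|qux
```

In this sketch an unterminated quote falls back to the bare-word branch and stays in the word as a literal character. That is a design assumption here rather than something the thread specifies.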

      Makeshifts last the longest.
