PerlMonks  

A nice little question for the insane

by Dragon_Magi (Initiate)
on Apr 06, 2005 at 02:07 UTC (#445148=perlquestion)
Dragon_Magi has asked for the wisdom of the Perl Monks concerning the following question:

I have this interesting little regex for parsing a command line, and I was wondering whether anyone had suggestions, or could point out oversights I might have made. I'd been posting for less than 10 seconds before I thought to mention that the line of text can contain double-quoted pieces, each of which is to be treated as one whole piece.
(?:([^\"\s]+)|\s+|("[^\"\\]*(?:\\.[^\"\\]*)*"))+
It's been a while since I've used an even somewhat complex expression, so I thought there might be a few things I missed.
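One thing worth checking, shown as a small sketch with a made-up sample line: because the whole alternation is wrapped in `(?:...)+`, a single match can run across the entire line, but each capture group holds only one string, whatever it captured most recently. After the match, the earlier words are no longer available through `$1` and `$2`.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $re   = qr/(?:([^"\s]+)|\s+|("[^"\\]*(?:\\.[^"\\]*)*"))+/;
my $line = 'ls -l "my file" x';    # sample input, made up for illustration

my ($whole, $last_word);
if ($line =~ $re) {
    # The outer (?:...)+ lets one match consume the whole line, but a
    # capture group inside a repeated group keeps only its latest text.
    $whole     = substr($line, $-[0], $+[0] - $-[0]);
    $last_word = $1;
}
print "matched:  $whole\n";        # the entire line
print "\$1 holds: $last_word\n";   # only the last bare word, 'x'
```

Applying the alternation repeatedly with `//g`, instead of relying on the outer `+`, is one way to get every token back.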

Re: A nice little question for the insane
by dragonchild (Archbishop) on Apr 06, 2005 at 03:05 UTC
    Don't use a regex?

    Seriously, parsing anything with balanced delimiters (quotes, brackets and so on) using regular expressions is notoriously hard to get right, especially if you want to report back exactly where your user went wrong. Regexes are more of a yes-or-no kind of deal.
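As a minimal sketch of what "report back exactly where your user went wrong" can look like: the hypothetical `parse_command_line` below (it is not part of Text::xSV or any other module) reuses the question's sub-patterns, but tries them one at a time at the current position using `\G` and the `/gc` modifiers. Because `/c` keeps `pos()` unchanged when a match fails, the loop knows the exact offset of the problem. Treating an unclosed quote as an error is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: split a command line into words, dying with the
# character offset of a double quote that never gets closed.
sub parse_command_line {
    my ($line) = @_;
    my @words;
    pos($line) = 0;
    while (pos($line) < length($line)) {
        if ($line =~ /\G\s+/gc) {
            next;                          # skip whitespace between words
        }
        elsif ($line =~ /\G([^"\s]+)/gc) {
            push @words, $1;               # bare word
        }
        elsif ($line =~ /\G("[^"\\]*(?:\\.[^"\\]*)*")/gc) {
            push @words, $1;               # complete double-quoted string
        }
        else {
            # Only a double quote that does not start a complete quoted
            # string gets here; /c left pos() pointing at it.
            die sprintf("Unterminated quote at offset %d\n", pos($line));
        }
    }
    return @words;
}

my @ok = parse_command_line('cp "a b" c');
print join('|', @ok), "\n";                  # cp|"a b"|c
eval { parse_command_line('echo "oops') };
print "error: $@";                           # Unterminated quote at offset 5
```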

    If you want to parse a command line, it's better to use an actual parsing routine. The one in tilly's Text::xSV should be very close to what you're looking for. Actually, you could probably use Text::xSV with some very slight modifications. Well, one modification: you would need to be able to specify SEP as the regex \s+.

      That's probably the best answer, IMHO. Here's a more hackish approach:
      chomp( @a = qx( for x in $_; do echo "\$x"; done ) );  # $_ holds the line; quoting $x stops the shell re-splitting or globbing it
      There's already a tool that knows how to parse command lines! :-)

      Update: fixed bugs in code, which was untested. Thanks, tlm!

      Another option is the core module Text::ParseWords, which includes a shellwords function that likely does just the right thing.
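A minimal sketch of the Text::ParseWords route, using made-up sample lines. The exact quoting rules are whatever the installed version of the module implements, so its documentation is the place to check edge cases.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# shellwords splits on whitespace, honouring double quotes, single
# quotes and backslash escapes, and removes the quote characters
# from the returned words.
my @words = shellwords(q{ls -l "my file" x});
print join('|', @words), "\n";    # ls|-l|my file|x

# With an unbalanced quote it returns an empty list instead of guessing.
my @bad = shellwords(q{echo "oops});
print scalar(@bad), "\n";         # 0
```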

Re: A nice little question for the insane
by ysth (Canon) on Apr 06, 2005 at 03:15 UTC
    Just to break it up a little, YAPE::Regex::Explain produces this:
    The regular expression:

    (?-imsx:(?:([^"\s]+)|\s+|("[^"\\]*(?:\\.[^"\\]*)*"))+)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (1 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        (                        group and capture to \1:
    ----------------------------------------------------------------------
          [^"\s]+                  any character except: '"', whitespace
                                   (\n, \r, \t, \f, and " ") (1 or more
                                   times (matching the most amount
                                   possible))
    ----------------------------------------------------------------------
        )                        end of \1
    ----------------------------------------------------------------------
       |                       OR
    ----------------------------------------------------------------------
        \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                                 or more times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
       |                       OR
    ----------------------------------------------------------------------
        (                        group and capture to \2:
    ----------------------------------------------------------------------
          "                        '"'
    ----------------------------------------------------------------------
          [^"\\]*                  any character except: '"', '\\' (0 or
                                   more times (matching the most amount
                                   possible))
    ----------------------------------------------------------------------
          (?:                      group, but do not capture (0 or more
                                   times (matching the most amount
                                   possible)):
    ----------------------------------------------------------------------
            \\                       '\'
    ----------------------------------------------------------------------
            .                        any character except \n
    ----------------------------------------------------------------------
            [^"\\]*                  any character except: '"', '\\' (0 or
                                     more times (matching the most amount
                                     possible))
    ----------------------------------------------------------------------
          )*                       end of grouping
    ----------------------------------------------------------------------
          "                        '"'
    ----------------------------------------------------------------------
        )                        end of \2
    ----------------------------------------------------------------------
      )+                       end of grouping
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
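The breakdown above can also live in the code itself. The sketch below (the name `tokenize` is hypothetical) writes the question's alternation with the `/x` modifier and comments that follow the NODE/EXPLANATION rows. It drops the outer `(?:...)+` and lets a list-context `//g` match return one token per match.

```perl
use strict;
use warnings;

# One token per match: the question's alternation without the outer +.
my $token = qr/
      ( [^"\s]+ )              # \1: run of chars that are neither '"' nor whitespace
    | \s+                      # whitespace between words (not captured)
    | (                        # \2: a double-quoted string
          "
          [^"\\]*              #   ordinary characters
          (?: \\ . [^"\\]* )*  #   a backslash escape, then more ordinary chars
          "
      )
/x;

sub tokenize {
    my ($line) = @_;
    # In list context //g returns ($1, $2) for every match; a group that
    # did not take part yields undef, and a whitespace match yields
    # (undef, undef), so keep only the defined values.
    return grep { defined } $line =~ /$token/g;
}

my @words = tokenize('ls -l "my file" x');
print join('|', @words), "\n";   # ls|-l|"my file"|x
```

Note that an unterminated quote, as in `echo "oops`, is silently skipped rather than reported: the engine simply moves on to the next position where something matches, giving `echo` and `oops`.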
