
Re: Splitting up quoted/escaped command line arguments

by kennethk (Abbot)
on Feb 11, 2014 at 18:50 UTC (#1074467)

in reply to Splitting up quoted/escaped command line arguments

Okay, so given all the caveats about how this cannot be robust and that there are all sorts of potential security implications (which is probably why Argv is so complex), you could take one of two approaches:
  1. State machine. Crawl the string character by character, keeping track of state such as whether you are inside a single-quoted string, or whether the last character you saw was an equals sign or a backslash. Start out with a for (split //) {... loop and stash the characters in a buffer. The buffer could be either an independent scalar or $args[-1], depending on taste.

  2. Regular expression with backreferences. This is more challenging, because regular expressions aren't really intended to split up an entire string, but rather to grab substrings. Expressions like "[^"]*(?<!\\)" to grab everything between two unescaped double quotes could be helpful, but remember that if the command were echo "He said, \"How are you?\"", the intended output from your process would be ($command, @args) = ('echo', 'He said, "How are you?"'), which requires removing the surrounding quotes as well as unescaping the inner ones.
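
As a rough illustration of approach 2, here is a minimal sketch that walks the string with \G-anchored matches instead of a single split. The sub name split_with_regex is hypothetical, and the sketch assumes a deliberately simplified grammar: only double quotes and backslash escapes are honored, single quotes are treated as ordinary characters, and nothing is expanded. It also uses the alternation (?:[^"\\]|\\.)* rather than the lookbehind shown above, because [^"]* stops at the first quote character and so cannot carry the match across an escaped quote.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split a command string into
# words, honoring double quotes and backslash escapes only.
sub split_with_regex {
    my ($string) = @_;
    my @words;
    pos($string) = 0;

    while (1) {
        $string =~ /\G\s+/gc;                      # skip separating whitespace
        last if pos($string) >= length $string;    # nothing left to read

        my ( $word, $seen ) = ( '', 0 );
        while (1) {
            if ( $string =~ /\G"((?:[^"\\]|\\.)*)"/gcs ) {
                # Quoted chunk: drop the surrounding quotes, then unescape.
                ( my $inner = $1 ) =~ s/\\(.)/$1/gs;
                $word .= $inner;
                $seen = 1;
            }
            elsif ( $string =~ /\G((?:[^\s"\\]|\\.)+)/gcs ) {
                # Bare chunk: runs until whitespace or an opening quote.
                ( my $bare = $1 ) =~ s/\\(.)/$1/gs;
                $word .= $bare;
                $seen = 1;
            }
            else {
                last;    # end of this word (or something we cannot parse)
            }
        }

        # No progress means an unbalanced quote or a trailing backslash.
        die "cannot parse at offset " . pos($string) . "\n" unless $seen;
        push @words, $word;
    }
    return @words;
}

my ( $command, @args ) = split_with_regex('echo "He said, \"How are you?\""');
# $command is 'echo'; @args is ('He said, "How are you?"')
```

Removing the backslash before any character is a simplification; a real shell unescapes only a few characters inside double quotes.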

Note as well there is already a bug with my ( $command, $argstring ) = split / /, $string, 2; in the case where the executable path contains a space. I personally would go with the state machine; the logic is more natural and quiet failures are less common in my experience. It will still require the kind of unescaping discussed under approach 2. Actually, I would probably just use a string exec, since someone already did a lot of work developing a shell, but that's not on spec.
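
For comparison, a minimal sketch of the state-machine approach might look like the following. The sub name split_with_state_machine and the three state variables are my own choices for illustration, and the quoting rules are assumed to be a simplified subset of shell behavior: single quotes are fully literal, a backslash outside single quotes escapes the next character, and no variable or glob expansion is attempted. Because the command is parsed as just the first word, a quoted executable path containing a space survives intact.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): character-by-character split
# using a small state machine.
sub split_with_state_machine {
    my ($string) = @_;
    my @args;
    my $buffer  = '';
    my $in_word = 0;     # true once a word has started (so "" yields an empty arg)
    my $quote   = '';    # '' when unquoted, otherwise the open quote character
    my $escaped = 0;     # true if the previous character was an active backslash

    for my $char ( split //, $string ) {
        if ($escaped) {                       # escaped character is taken literally
            $buffer .= $char;
            $escaped = 0;
        }
        elsif ( $quote eq "'" ) {             # inside '...': only ' is special
            if   ( $char eq "'" ) { $quote = '' }
            else                  { $buffer .= $char }
        }
        elsif ( $char eq '\\' ) {             # backslash, unquoted or inside "..."
            $escaped = 1;
            $in_word = 1;
        }
        elsif ( $quote eq '"' ) {             # inside "...": wait for closing "
            if   ( $char eq '"' ) { $quote = '' }
            else                  { $buffer .= $char }
        }
        elsif ( $char eq "'" or $char eq '"' ) {    # opening quote
            $quote   = $char;
            $in_word = 1;
        }
        elsif ( $char =~ /\s/ ) {             # unquoted whitespace ends a word
            if ($in_word) {
                push @args, $buffer;
                $buffer  = '';
                $in_word = 0;
            }
        }
        else {                                # ordinary character
            $buffer .= $char;
            $in_word = 1;
        }
    }

    # Fail loudly rather than quietly returning a half-parsed list.
    die "unterminated quote or trailing backslash\n" if $quote ne '' or $escaped;
    push @args, $buffer if $in_word;
    return @args;
}

my ( $command, @args ) = split_with_state_machine('"/opt/my tools/run" --name "He said, \"hi\""');
# $command is '/opt/my tools/run'; @args is ('--name', 'He said, "hi"')
```

Keeping the quote state in one scalar and the escape state in another makes each branch a single decision, which is why unterminated input can be reported instead of silently mis-split.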

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Replies are listed 'Best First'.
Re^2: Splitting up quoted/escaped command line arguments
by Tommy (Chaplain) on Feb 11, 2014 at 18:57 UTC

    Without knowing the name for it, I have already started trying to put together a state machine. I called it a "peel off" approach where I look through the string and peel things off one at a time, making sure to note and handle quoted things when I encounter them. I haven't got very far with it yet--just a few minutes working on the idea.

    A mistake can be valuable or costly, depending on how faithfully you pursue correction

      OK. This mixture of approaches seems to be working so far: I haven't been able to break it yet. Can anyone break this?

      (Please see UPDATE 2 to the OP)

      A mistake can be valuable or costly, depending on how faithfully you pursue correction
        The input 'a'\''b' seems to break update 4.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
