I'm writing code to parse a simplish string that has components like:
bareword
"quoted string"
'quoted string'
bareword = bareword
bareword = "quoted string"
bareword = 'quoted string'
Components are separated by whitespace, and a ':' separator may also appear somewhere among them.
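As a rough sketch only, the component grammar above can be written as reusable qr// pieces, with each quoted-string form spelled out once. The names $bare, $quoted, $value, $item and is_valid_tag are hypothetical. The rule that components must be separated by whitespace is inferred from the lookahead in the code below. This only checks whether a whole string is well-formed; it does not extract anything.

```perl
use strict;
use warnings;

# Hypothetical building blocks for the component grammar.
my $bare   = qr/\w+/;                           # bareword
my $quoted = qr/'[^']*'|"[^"]*"/;               # 'quoted string' or "quoted string"
my $value  = qr/$bare|$quoted/;                 # anything allowed after '='
my $item   = qr/$bare\s*=\s*$value|$value|:/;   # one component, or the ':' separator

# A tag is one or more items separated by whitespace (assumption taken
# from the (?= \s | \z) lookahead in the original loop).
sub is_valid_tag {
    my ($tag) = @_;
    return $tag =~ /\A$item(?:\s+$item)*\s*\z/ ? 1 : 0;
}

# Example use with made-up inputs.
for my $tag (q{foo bar = "baz qux" : 'x y'}, q{foo:bar}) {
    my $verdict = is_valid_tag($tag) ? 'valid' : 'invalid';
    print "$tag => $verdict\n";
}
```

Because interpolated qr// objects are wrapped in their own non-capturing groups, the alternations inside $value and $item do not leak into the surrounding pattern.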
My current code looks like this:
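my @args;
while ($tag =~ m{
    \G                                  # continue where the last match ended
    (?:
        ( \w+ )                         # capture 1: bareword
        (?:
            \s* = \s*
            (?: ( \w+ )                 # capture 2: = bareword
              | ' ([^']*) '             # capture 3: = 'quoted string'
              | " ([^"]*) "             # capture 4: = "quoted string"
            )
        )?
      |
        ' ([^']*) '                     # capture 5: 'quoted string'
      |
        " ([^"]*) "                     # capture 6: "quoted string"
      |
        ( : )                           # capture 7: separator
    )
    (?= \s | \z ) \s*                   # token must end at whitespace or end of string
}gcxs) {
    push @args, defined($5) ? $5               # 'quoted string'
              : defined($6) ? $6               # "quoted string"
              : defined($7) ? $7               # :
              : defined($2) ? [ $1, $2 ]       # bareword=bareword
              : defined($3) ? [ $1, $3 ]       # bareword='quoted string'
              : defined($4) ? [ $1, $4 ]       # bareword="quoted string"
              : $1;                            # bareword
}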
But that feels like an ugly way to do it: chunks of the pattern are duplicated (the single- and double-quoted string alternatives each appear twice), and the push relies on fragile assumptions about capture numbering. Surely there must be a better way to do this?
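One possible alternative, sketched under stated assumptions rather than offered as the definitive answer: swap the single big alternation for the token-by-token \G//gc scanning idiom described in perlop, trying one small pattern per token type. Each pattern has at most two captures, so nothing depends on $5 or $7, and the quoted-string pattern is written once and reused. The names parse_tag, unquote, $quoted and $end are hypothetical. Unlike the original loop, which silently stops at the first token it cannot match, this version dies with the offset where parsing failed. That is possible because /c keeps pos() unchanged after a failed match.

```perl
use strict;
use warnings;

# Quoted string including its quotes; no captures, so it can be reused freely.
my $quoted = qr/'[^']*'|"[^"]*"/;

# Every token must end at whitespace or end of string, then skip the whitespace.
my $end = qr/(?=\s|\z)\s*/;

# Strip the surrounding quote characters from a $quoted match.
sub unquote { my ($s) = @_; return substr($s, 1, -1) }

# Hypothetical helper mirroring the original push: plain strings for
# barewords, quoted strings and ':', and [key, value] refs for pairs.
sub parse_tag {
    my ($tag) = @_;
    my @args;
    pos($tag) = 0;
    while (pos($tag) < length $tag) {
        if ($tag =~ /\G(\w+)\s*=\s*(\w+|$quoted)$end/gc) {
            my ($key, $val) = ($1, $2);
            $val = unquote($val) if $val =~ /^['"]/;   # quoted value
            push @args, [ $key, $val ];
        }
        elsif ($tag =~ /\G(\w+)$end/gc)     { push @args, $1 }           # bareword
        elsif ($tag =~ /\G($quoted)$end/gc) { push @args, unquote($1) }  # quoted string
        elsif ($tag =~ /\G(:)$end/gc)       { push @args, $1 }           # separator
        else {
            # A failed /gc match leaves pos() where it was.
            die "parse error at offset " . pos($tag) . "\n";
        }
    }
    return @args;
}

# Example use (the input string is made up for illustration).
for my $arg (parse_tag(q{foo bar = "baz qux" : 'x y'})) {
    my $line = ref $arg ? "pair: $arg->[0] => $arg->[1]" : "item: $arg";
    print "$line\n";
}
```

The pair pattern is tried before the plain bareword pattern. That ordering matters because a bare \w+ would otherwise match the key and leave "= value" unparsed.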
Hugo
Update per author - dvergin 2003-02-21