There is a standard solution to this problem, mostly from Mastering Regular Expressions:
# You need to match a double quoted string with the following regex
# "[^"\\]*(\\.[^"\\]*)*",?
#
# But to get the text between double quotes use some ( )
# "([^"\\]*(\\.[^"\\]*)*)",?
# gets text inside quotes as $1
#
# but you also have non quoted fields, thus
# ([^,]+),?
# which should match things optionally followed by a comma
#
# and then a match for separation commas
# ,
#
# this must be repeated with m/.../g
Before attempting this yourself, take a look at Text::ParseWords and its quotewords routine, which should solve your problem. If the module is not available to you, then the following untested code from Mastering Regular Expressions should work:
@fields = ();
while ($text =~ m/"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
push (@fields, defined ($1) ? $1 : $3) ;
}
push (@fields, undef) if $text =~ m/,$/; # Account for the special case of an empty last field.
# all data is now in @fields
Note: untested.
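For the Text::ParseWords route mentioned above, here is a minimal sketch of how quotewords might be called. The sample line and the variable names are made up for illustration, not taken from the original question. Be aware that quotewords also treats single quotes as quoting characters, so data containing a stray apostrophe may need different handling.

```perl
use strict;
use warnings;
use Text::ParseWords;

# Hypothetical sample record: the second field contains a comma inside quotes.
my $text = q{alpha,"bravo, charlie",delta};

# Arguments: delimiter pattern, a "keep" flag, then the line(s) to split.
# A false "keep" strips the surrounding quotes from each field.
my @fields = quotewords(',', 0, $text);

print join('|', @fields), "\n";   # prints: alpha|bravo, charlie|delta
```

With the keep flag set to a true value, the quotes would be left in the returned fields, which can help if you need to tell quoted fields from unquoted ones.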
SciDude
The first dog barks... all other dogs bark at the first dog.