There is a standard solution to this problem, mostly from Mastering Regular Expressions:
# A double-quoted string, optionally followed by a comma, is matched by
# "[^"\\]*(\\.[^"\\]*)*",?
#
# To capture the text between the double quotes, add parentheses:
# "([^"\\]*(\\.[^"\\]*)*)",?
# which captures the text inside the quotes as $1
#
# But you also have unquoted fields, so
# ([^,]+),?
# matches a run of non-comma characters, optionally followed by a comma
# (in the combined regex this capture becomes $3)
#
# and finally a match for a bare separating comma, i.e. an empty field
# ,
#
# The three alternatives are joined with | and applied repeatedly with m/.../g
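To see the first two building blocks in isolation, here is a small sketch that compiles each one with qr// and applies it to a single sample string. The variable names and sample strings are made up for illustration; only the patterns come from the comments above.

```perl
use strict;
use warnings;

# Quoted-field building block: opening quote, runs of ordinary
# characters separated by backslash-escaped pairs, closing quote,
# optional trailing comma. Group 1 holds the text between the quotes,
# with any backslash escapes left in place.
my $quoted = qr/"([^"\\]*(\\.[^"\\]*)*)",?/;

# Unquoted-field building block: a run of non-comma characters.
my $plain = qr/([^,]+),?/;

# In list context a successful match returns its captures; keep group 1.
my ($q) = q{"say \"hi\", then stop",next} =~ /^$quoted/;
my ($p) = q{next,field} =~ /^$plain/;

print "$q\n";    # prints: say \"hi\", then stop
print "$p\n";    # prints: next
```

Note that the escaped quotes do not end the quoted field, and the comma inside the quotes is kept as part of the field.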
Before attempting this yourself, take a look at Text::ParseWords and its quotewords routine, which should solve your problem. If the module is not available to you, the following untested code from Mastering Regular Expressions should work:
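A minimal sketch of the Text::ParseWords route, assuming one line of comma-separated input in $text (the sample string is invented). quotewords takes a delimiter regex, a keep flag, and a list of lines; with the keep flag set to 0 the surrounding quotes are removed and backslash escapes are processed.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Invented sample: the middle field contains a comma inside quotes.
my $text = q{alpha,"beta, gamma",delta};

# quotewords(DELIM, KEEP, LINES): DELIM is a regex (here a literal comma),
# KEEP = 0 strips the quotes from quoted fields.
my @fields = quotewords(',', 0, $text);

print join('|', @fields), "\n";    # prints: alpha|beta, gamma|delta
```

Text::ParseWords ships with the core Perl distribution, so it is normally available without installing anything extra.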
my @fields = ();
while ($text =~ m/"([^"\\]*(\\.[^"\\]*)*)",?|([^,]+),?|,/g) {
    # $1 is a quoted field's contents, $3 an unquoted field;
    # a bare comma (empty field) leaves both undefined
    push(@fields, defined($1) ? $1 : $3);
}
push(@fields, undef) if $text =~ m/,$/;    # account for an empty last field
# all data is now in @fields
Note: untested.
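For experimenting, the same loop can be wrapped in a subroutine and written with the /x modifier so that each alternative carries its own comment. split_csv_line is a hypothetical name and the sample line is made up; the matching logic is meant to be the same as the loop above.

```perl
use strict;
use warnings;

# Hypothetical helper: the Mastering Regular Expressions loop with /x.
sub split_csv_line {
    my ($text) = @_;
    my @fields;
    while ($text =~ m{
            "([^"\\]*(\\.[^"\\]*)*)",?   # group 1: contents of a quoted field
          | ([^,]+),?                    # group 3: an unquoted field
          | ,                            # a bare comma: an empty field
        }gx) {
        push(@fields, defined($1) ? $1 : $3);
    }
    push(@fields, undef) if $text =~ m/,$/;    # empty last field
    return @fields;
}

my @fields = split_csv_line(q{a,"b,c",,d,});
# @fields is now ('a', 'b,c', undef, 'd', undef)
```

The sample input exercises every case: a quoted field containing a comma, an empty field between two commas, and an empty last field picked up by the trailing m/,$/ check.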
SciDude
The first dog barks... all other dogs bark at the first dog.