Re^2: split string by comma

That regular expression is way too possessive. Think about how it would parse

1,"foo",2,"bar, joy",3,3.14,pi,π
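To make the problem concrete, here is a minimal sketch. The regular expression under discussion is not quoted in this post, so a plain `split /,/` is used as a stand-in (an assumption); any pattern that ignores quoting fails in the same way on the sample line.

```perl
use strict;
use warnings;

# The sample line from above; without "use utf8" the pi sign is just bytes.
my $line = '1,"foo",2,"bar, joy",3,3.14,pi,π';

# Stand-in for a quote-unaware pattern: split on every comma.
my @naive = split /,/, $line;

# The quoted field "bar, joy" is cut in two, and the quotes stay in the data.
printf "%d fields\n", scalar @naive;
print "[$_]\n" for @naive;
```

The line holds eight values, but the split returns nine pieces, with `"bar` and ` joy"` as separate elements.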

Correct regular expressions have been posted in this thread, but with real CSV data (what about embedded newlines?) an approach that sticks to split or regular expressions will most likely fail eventually. Please seriously consider using Text::CSV_XS or Text::CSV (which will use Text::CSV_XS when installed) and be done with it.
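A minimal sketch of what that looks like, assuming Text::CSV is installed from CPAN. The embedded-newline record and the in-memory filehandle are made up for illustration; `binary => 1` is what allows newlines and non-ASCII bytes inside fields.

```perl
use strict;
use warnings;
use Text::CSV;    # uses Text::CSV_XS when that is installed

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# Parse the sample line: quoted fields keep their embedded comma.
my $line = '1,"foo",2,"bar, joy",3,3.14,pi,π';
$csv->parse($line) or die "Cannot parse line\n";
my @fields = $csv->fields;
print scalar(@fields), " fields, field 4 is [$fields[3]]\n";

# Illustrative record with a newline inside a quoted field,
# read through an in-memory filehandle instead of a real file.
my $data = qq{a,"line one\nline two",c\n};
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $row = $csv->getline($fh);
close $fh;
print scalar(@$row), " fields in the multi-line record\n";
```

For real files, open the file and call `getline` in a loop; the module takes care of records that span several physical lines.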

Another thing seldom considered by US users is that the "." in those "values" is locale dependent. Consider what happens if 3623494.92 is printed as 3,623,494.92, or is printed/exported in a Dutch locale using both the radix separator and the triad (thousands) separator from the locale: it will be exported as "3.623.494,92". Oh, the horror in "fixing" all those regular expressions :)
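A small sketch of why this matters, again assuming Text::CSV is installed. The `total` label is a made-up field, and the amount is hard-coded as a string rather than produced by an actual locale, so the example does not depend on which locales exist on your system.

```perl
use strict;
use warnings;
use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });

# A Dutch-style rendering of 3623494.92 contains the CSV separator itself.
my $amount = '3.623.494,92';

# Written without quoting and split on commas, the amount is cut in two.
my @naive = split /,/, "total,$amount";

# combine() quotes any field that contains the separator,
# so the value survives a round trip through parse().
$csv->combine('total', $amount) or die "Cannot combine fields\n";
my $out = $csv->string;
$csv->parse($out) or die "Cannot parse line\n";
my @fields = $csv->fields;

print "naive:    ", scalar(@naive),  " pieces\n";
print "Text::CSV: ", scalar(@fields), " fields from [$out]\n";
```

The writer adds the quotes and the reader removes them, so no regular expression has to know anything about locales.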

Enjoy, Have FUN! H.Merijn

Re^3: split string by comma
by Neighbour (Friar) on Jan 11, 2012 at 08:36 UTC

    my $old_INPUT_RECORD_SEPARATOR = $/;
    $/ = $self->record_delimiter;
    open (my $delimfile, '<', $filename) or Carp::confess("Cannot open file [$filename]: $!");
    while (my $record = <$delimfile>) {
        chomp $record;
        # If a line contains an odd number of doublequotes ("), a quoted field is
        # still open, so keep reading until we find another line that contains an
        # odd number of doublequotes. This catches fields that contain record
        # separators (but are enclosed in doublequotes).
        if (($record =~ tr/"//) % 2 == 1) {
            while (my $line = <$delimfile>) {
                chomp $line;
                # Put back the record separator that chomp removed from the previous line.
                $record .= $/ . $line;
                last if ($line =~ tr/"//) % 2 == 1;
            }
        }
        push (@{$ar_returnvalue}, ReadRecord($self, $record));
    }
    close ($delimfile);
    $/ = $old_INPUT_RECORD_SEPARATOR;

my $field_value;
my $delimiter = $self->field_delimiter;
# Test the length, so that a remaining "0" is still parsed as a field.
while (length $inputstring) {
    undef $field_value;
    if ($inputstring =~ /^"/) {
        # Quoted field; * instead of + so that an empty quoted field ("") is accepted.
        if ($inputstring =~ /^"((?:[^"]|"")*)"(?:[$delimiter]|$)/p) {
            ($field_value, $inputstring) = ($1, ${^POSTMATCH});
            # Unescape escaped quotes
            $field_value =~ s/""/"/g;
        } else {
            Carp::confess("Parsing error with remaining data [$inputstring]");
        }
    } else {
        # Unquoted field: everything up to the next delimiter or the end of the string.
        if ($inputstring =~ /^([^$delimiter"]*)(?:[$delimiter]|$)/p) {
            ($field_value, $inputstring) = ($1, ${^POSTMATCH});
        } else {
            # Without this, a stray quote inside an unquoted field would loop forever.
            Carp::confess("Parsing error with remaining data [$inputstring]");
        }
    }
}
