http://www.perlmonks.org?node_id=1080319


in reply to Re^3: Suggestions to make this code more Perlish
in thread Suggestions to make this code more Perlish

Hi Ken,

The best place to read up about Perl 6 regexes is the specification itself (Synopsis 5, "Regexes and Rules").

You mused:

While I suspect this has something to do with '\0' terminated strings in C, I don't fully understand what's happening.

No, it's not anything to do with C string terminators.

The problem with your previous version was that you were matching an optional comma at the end of each field and then replacing it with a definite "\037" every time. So, for the last field in each record (which, of course, isn't followed by a comma), you were nevertheless appending an unwanted "\037".

The global substitution would then loop one last time, matching a final zero-character field (because of the (?<a>[^,]*) alternative, which can match nothing). The substitution on that empty field then causes a second unnecessary "\037" to be appended.
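To make the two extra separators visible, here is a minimal sketch of that behaviour. It is a hypothetical reconstruction rather than your actual code: the sub name buggy_convert and the exact shape of the regex are my assumptions, chosen only to match the description above (optional comma, unconditional "\037").

```perl
use strict;
use warnings;

# Hypothetical reconstruction (an assumption, not the original code):
# the trailing comma is optional, but "\037" is appended on every match.
sub buggy_convert {
    my ($line) = @_;
    chomp $line;
    $line =~ s{ (?: " (?<a> [^"]* ) " | (?<a> [^,]* ) ) ,? }
              { $+{a} . "\037" }gxe;
    return $line;
}

# Make the unit separators visible for inspection.
my $out = buggy_convert("a,b\n");
(my $visible = $out) =~ s{\037}{<US>}g;
print "$visible\n";    # prints: a<US>b<US><US>
```

The first extra <US> comes from the last real field ("b", no comma), and the second from the zero-length match that /g makes at the end of the string.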

You could fix that by rewriting your original version something like this:

    open my $csv_fh, '<', 'input.csv';
    open my $tff_fh, '>', 'output.tff';

    my $field = qr{ " (?<field> [^"]* ) " | (?<field> [^,"]* ) }x;

    while (my $line = <$csv_fh>) {
        $line =~ s{ $field (?<comma> ,?) }
                  { $+{field} . ($+{comma} && chr 31) }gxe;
        $line =~ s{\n}{chr 30}xe;
        print {$tff_fh} $line;
    }

This version still matches the optional comma each time, but now only appends a "\037" if there actually was a comma. Which means there are no extras to remove, once the line is complete.
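If you want to check that behaviour without touching any files, the same two substitutions can be wrapped in a sub and run on an in-memory string. This is only a sketch: the sub name csv_to_tff and the sample record are made up for illustration.

```perl
use strict;
use warnings;

# Same field pattern as in the loop version.
my $field = qr{ " (?<field> [^"]* ) " | (?<field> [^,"]* ) }x;

# Hypothetical helper (the name is an assumption): converts one CSV line.
sub csv_to_tff {
    my ($line) = @_;

    # Append chr(31) only when a comma was actually matched.
    $line =~ s{ $field (?<comma> ,?) }
              { $+{field} . ($+{comma} && chr 31) }gxe;

    # Replace the record-ending newline with chr(30).
    $line =~ s{\n}{chr 30}xe;

    return $line;
}

my $record = csv_to_tff(qq{a,"b,c",d\n});

# Show the control characters symbolically.
(my $visible = $record) =~ s{\037}{<US>}g;
$visible =~ s{\036}{<RS>}g;
print "$visible\n";    # prints: a<US>b,c<US>d<RS>
```

The final zero-length match still happens, but since it has no comma it contributes only an empty string, so nothing needs to be stripped afterwards.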

Note that I also removed the chomp and replaced it with an explicit substitution of the trailing newline. I felt that this highlights the transformation more clearly than did your clever (but subtle and "at-a-distance") use of $\.

Damian