http://www.perlmonks.org?node_id=1080299


in reply to Re^2: Suggestions to make this code more Perlish
in thread Suggestions to make this code more Perlish

Thanks for that, Damian. I'm not really across Perl6 syntax. I looked in the Perl6 Regexes documentation; unfortunately, there are several sections with nothing more than "TODO", including "Alternation" and "Grouping and Capturing", so I pretty much gave up at that point. Can you suggest a better source of documentation?

Anyway, inspired by your "shorter and more Perl6-ish version", here's a shorter and more Perl5-ish version of my original (this replaces the while loop, everything else remains the same):

    my $re = qr{(?:"(?<a>[^"]*)"|(?<a>[^,]*))(?:,|\000)};
    print $tff_fh $_ for map { chomp; s/$re/$+{a}\037/g; $_ } <$csv_fh>;

Due to the issue described in "Repeated Patterns Matching a Zero-length Substring" (in perlre), my original code, which ended the pattern with ',?', was leaving '\037\037' at the end of $_ after each 's///g'; hence the 's/[\037]+$//;' to remove them.
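The zero-length match can be reproduced in isolation. What follows is only a minimal sketch: it assumes the earlier pattern was the same as the one above but ending in ',?', and the sample line and variable names ($line, $out, $shown) are made up for illustration. It needs Perl 5.10 or later for named captures.

```perl
use strict;
use warnings;

# Assumed form of the earlier pattern: a quoted or unquoted field,
# followed by an optional comma.
my $re = qr{(?:"(?<a>[^"]*)"|(?<a>[^,]*)),?};

my $line = 'a,"b,c",d';               # made-up sample record
(my $out = $line) =~ s/$re/$+{a}\037/g;

(my $shown = $out) =~ tr/\037/|/;     # make the \037 separators visible
print "$shown\n";                     # a|b,c|d||

$out =~ s/[\037]+$//;                 # the cleanup step from the original solution
($shown = $out) =~ tr/\037/|/;
print "$shown\n";                     # a|b,c|d
```

The second trailing separator comes from one extra, empty match at the very end of the string: after 'd' has been consumed, both the field and the optional comma can still match nothing, so 's///g' substitutes once more.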

I found that by replacing ',?' with '(?:,|\000)', I got no trailing '\037' characters after the 's///g' (so the 's/[\037]+$//;' wasn't needed at all). [Note: '(?:,|)', '(?:,|$)', '(?:,|\z)' and '(?:,|\Z)' all produced '\037\037' after each 's///g'.]
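The variants can be compared side by side with a small script. This is a sketch under assumptions rather than the code from the original solution: the sample line 'a,b,c' (which contains no NUL byte) and the names @tails, %result, $count and $shown are hypothetical. It also records how many substitutions each variant makes, since 's///g' returns that count.

```perl
use strict;
use warnings;

my $line = 'a,b,c';    # made-up sample record, no NUL byte in it

# Endings tried for the pattern; only the part after the field differs.
my @tails = (',?', '(?:,|)', '(?:,|$)', '(?:,|\z)', '(?:,|\Z)', '(?:,|\000)');

my %result;
for my $tail (@tails) {
    my $re    = qr{(?:"(?<a>[^"]*)"|(?<a>[^,]*))$tail};
    my $copy  = $line;
    my $count = ($copy =~ s/$re/$+{a}\037/g) || 0;    # substitutions made
    (my $shown = $copy) =~ tr/\037/|/;                # make \037 visible
    $result{$tail} = $shown;
    printf "%-12s %d substitution(s): %s\n", $tail, $count, $shown;
}
```

On this sample, the first five endings each report 4 substitutions and give 'a|b|c||', while '(?:,|\000)' reports 2 and gives 'a|b|c'. In other words, with that ending the last field is never matched by the pattern at all: it is left in place untouched, rather than being matched and then cleaned up.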

While I suspect this has something to do with '\0' terminated strings in C, I don't fully understand what's happening. Since it could be a side effect that behaves differently in another Perl version (I'm using v5.18.1), and since I couldn't answer the inevitable "How does this work?" question, I left it out of my original solution.

You, or someone else, may have a quick answer. If not, I plan to spend a bit more time looking into this and, if I don't find an answer, post a more generalised example with a question later in the week.

-- Ken