in reply to Re^2: Suggestions to make this code more Perlish
in thread Suggestions to make this code more Perlish
Thanks for that Damian. I'm not really across Perl6 syntax. I looked in Perl6 Regexes documentation; unfortunately, there's several sections with nothing more than "TODO", including "Alternation" and "Grouping and Capturing", so I pretty much gave up at that point. Can you suggest a better source of documentation?
Anyway, inspired by your "shorter and more Perl6-ish version", here's a shorter and more Perl5-ish version of my original (this replaces the while loop, everything else remains the same):
my $re = qr{(?:"(?<a>[^"]*)"|(?<a>[^,]*))(?:,|\000)}; print $tff_fh $_ for map { chomp; s/$re/$+{a}\037/g; $_ } <$csv_fh>;
Due to the issue described in "Repeated Patterns Matching a Zero-length Substring", I was getting '\037\037' (at the end of $_) after each 's///g': hence the 's/[\037]+$//;' to remove them.
I found that by replacing ',?' with '(?:,|\000)', I got zero '\037' characters after the 's///g' (so the 's/[\037]+$//;' wasn't needed at all). [Note: '(?:,|)', '(?:,|$)', '(?:,|\z)' and '(?:,|\Z)' all produced '\037\037' after each 's///g'.]
While I suspect this has something to do with '\0' terminated strings in C, I don't fully understand what's happening. As it could be a side effect that might behave differently in another Perl version (I'm using v5.18.1), and not being able to answer the inevitable "How does this work?" question, I left it out of my original solution.
You, or someone else, may have a quick answer. If not, I was planning to spend a bit more time looking into this and, in the absence of finding a solution, post a more generalised example with a question later in the week.
-- Ken
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^4: Suggestions to make this code more Perlish
by TheDamian (Vicar) on Mar 30, 2014 at 19:43 UTC | |
by kcott (Archbishop) on Mar 31, 2014 at 06:55 UTC | |
by TheDamian (Vicar) on Mar 31, 2014 at 07:40 UTC | |
by kcott (Archbishop) on Mar 31, 2014 at 07:56 UTC |