http://www.perlmonks.org?node_id=1058746

{}think has asked for the wisdom of the Perl Monks concerning the following question:

Monks,
I am trying to edit a CSV string, where my goal is to insert a character (a zero) between any occurrence of two consecutive commas. In other words, I'm replacing any missing values with a zero, e.g.,
1,2,,4 -> 1,2,0,4
The regex I have written does not work as I expected, so I have two questions:
1) What regex would accomplish this, and
2) What is wrong with my approach? There's some subtlety of regexes that I need to familiarize myself with!

This code:

$s="1,2,3,,5,6,,,9,10,,,,14,15,,,,,,,,,,,,,"; $s=~s/,,/,0,/g; 
print "$s\n";
...results in:
1,2,3,0,5,6,0,,9,10,0,,0,14,15,0,,0,,0,,0,,0,,0,,

Thanks for considering this issue!
{}think; #Think outside of the brackets

Re: Regex to replace consecutive tokens
by johngg (Canon) on Oct 18, 2013 at 12:30 UTC

    Use look-arounds; that way your regex does not consume the next comma.

    $ perl -E ' $str = q{1,2,3,,5,6,,,9,10,,,,14,15,,,,,,,,,,,,,}; $str =~ s{(?<=,)(?=,)}{0}g; say $str;'
    1,2,3,0,5,6,0,0,9,10,0,0,0,14,15,0,0,0,0,0,0,0,0,0,0,0,0,

    I hope this is helpful.

    Update: If you want the empty last field replaced as well, then add an alternation to the look-ahead.

    $ perl -E ' $str = q{1,2,3,,5,6,,,9,10,,,,14,15,,,,,,,,,,,,,}; $str =~ s{(?<=,)(?=,|\z)}{0}g; say $str;'
    1,2,3,0,5,6,0,0,9,10,0,0,0,14,15,0,0,0,0,0,0,0,0,0,0,0,0,0
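A self-contained sketch of both substitutions above, wrapped in subs so they can be tried on other strings. The names `fill_interior` and `fill_with_last` are hypothetical, chosen for illustration. Note that neither pattern touches an empty first field, since there is no comma before it.

```perl
use strict;
use warnings;

# Hypothetical helper: insert "0" wherever the position is preceded by a
# comma and followed by a comma. Both look-arounds are zero-width, so no
# comma is consumed and a whole run of empty fields is filled in one /g pass.
sub fill_interior {
    my ($str) = @_;
    $str =~ s{(?<=,)(?=,)}{0}g;
    return $str;
}

# Hypothetical helper: as above, but the look-ahead also accepts the end of
# the string (\z), so an empty last field gets a "0" too.
sub fill_with_last {
    my ($str) = @_;
    $str =~ s{(?<=,)(?=,|\z)}{0}g;
    return $str;
}

my $sample = '1,2,3,,5,6,,,9,10,,,,14,,';
print fill_interior($sample), "\n";    # 1,2,3,0,5,6,0,0,9,10,0,0,0,14,0,
print fill_with_last($sample), "\n";   # 1,2,3,0,5,6,0,0,9,10,0,0,0,14,0,0
```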

    Cheers,

    JohnGG

Re: Regex to replace consecutive tokens
by Corion (Patriarch) on Oct 18, 2013 at 12:44 UTC

    An alternative is the sledgehammer approach of retrying until there is nothing left to replace:

    1 while $s =~ s/,,/,0,/;

    At least in this case, it's not hard to see that, once /g is added, this approach needs at most two substituting passes to replace all occurrences properly (as written, without /g, it makes one pass per empty field). Other, more complex regular expressions might need more thought, especially when the strings grow long: rescanning the whole string from the start on every pass can be more expensive than resuming just before where the last match left off.
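A minimal sketch of the retry loop with a pass counter added, so the effect of /g is visible. The sub `sledgehammer` and its `$global` flag are hypothetical, for illustration only. Without /g each pass fills one empty field; with /g the first pass fills every other gap in a run of commas and the second pass fills the rest.

```perl
use strict;
use warnings;

# Hypothetical helper: run the "1 while s///" loop, but count the passes
# that actually substituted something. $global selects the /g modifier.
sub sledgehammer {
    my ($str, $global) = @_;
    my $passes = 0;
    if ($global) {
        # Each pass fills every non-overlapping ",," it finds.
        $passes++ while $str =~ s/,,/,0,/g;
    }
    else {
        # Each pass fills exactly one empty field.
        $passes++ while $str =~ s/,,/,0,/;
    }
    return ($str, $passes);
}

for my $global (0, 1) {
    my ($fixed, $n) = sledgehammer('1,2,3,,5,6,,,9,10,,,,14', $global);
    my $label = $global ? 'with /g' : 'without /g';
    print "$label: $n passes, $fixed\n";
}
```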

      Adding /g might reduce the number of iterations.
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Sledgehammers are so cool... no muss, no fuss. Just pick it up, give it a swing, and watch your problem shatter...

      ++sledgehammers

      ++ Corion


      ...the majority is always wrong, and always the last to know about it...
      Insanity: Doing the same thing over and over again and expecting different results.
        Sledgehammers are so cool.... give it a swing, and watch your problem shatter... .

        Just make sure that it hasn't got any Miley spittle on it.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Regex to replace consecutive tokens
by hdb (Monsignor) on Oct 18, 2013 at 12:32 UTC

    If you have three or more consecutive commas, the second comma of each match has already been consumed by your regular expression, but it is needed as the first comma of the next match. So you need a look-ahead assertion instead:

    $s=~s/,(?=,)/,0/g;

    This matches one comma and only peeks at the following one, so that comma is still available to start the next match.
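To watch the consumption happen, the sketch below records where each /g match begins, using Perl's `@-` array (`match_starts` is a hypothetical helper). With /,,/ the first match swallows the commas at offsets 1 and 2, so the search resumes at offset 3 and finds nothing more; with the look-ahead, the comma at offset 2 can start a second match.

```perl
use strict;
use warnings;

# Hypothetical helper: list the offset at which each successive /g match
# starts. $-[0] is the start offset of the most recent whole match.
sub match_starts {
    my ($str, $re) = @_;
    my @starts;
    while ($str =~ /$re/g) {
        push @starts, $-[0];
    }
    return @starts;
}

my $run = '6,,,9';    # offset 0 is '6', offsets 1..3 are commas, 4 is '9'
print join(' ', match_starts($run, qr/,,/)),     "\n";   # 1
print join(' ', match_starts($run, qr/,(?=,)/)), "\n";   # 1 2
```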

    Update: If the string ends with a comma, this leaves that trailing comma (and the empty last field) as is. If you want a 0 there instead, use $s=~s/,(?=,|$)/,0/g;.
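One detail worth knowing about the end-of-string alternative: without /m, Perl's `$` also matches just before a trailing newline, while `\z` matches only at the very end. That matters for lines read from a file and not chomped. A small comparison sketch; the sub names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the look-ahead substitution with "$" as the
# end-of-string alternative ($ also matches before a final "\n").
sub fill_dollar {
    my ($s) = @_;
    $s =~ s/,(?=,|$)/,0/g;
    return $s;
}

# Hypothetical helper: the same substitution with "\z", which matches
# only at the absolute end of the string.
sub fill_z {
    my ($s) = @_;
    $s =~ s/,(?=,|\z)/,0/g;
    return $s;
}

my $line = "1,,\n";         # e.g. a line read from a file and not chomped
print fill_dollar($line);   # 1,0,0   ($ matches before the final newline)
print fill_z($line);        # 1,0,    (\z does not, so the last field stays empty)
```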

Re: Regex to replace consecutive tokens
by Bloodnok (Vicar) on Oct 18, 2013 at 13:00 UTC
    Yeah, I know it's not a regex answer, but you could put Text::CSV to use if grokking regexes carries less weight than getting the job done.

    Just a thought ...

    A user level that continues to overstate my experience :-))
Re: Regex to replace consecutive tokens
by roboticus (Chancellor) on Oct 18, 2013 at 13:19 UTC

    {}think:

    Yet another way to do it:

    $s = join(",", map {$_ ne '' ? $_ : '0'} split /,/,$s,-1);

    Update: Added '-1' to split to ensure we don't lose trailing commas, as hdb mentions. (Also corrected typo $t-->$s.)
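A small sketch of the split/map/join approach with split's LIMIT passed in (the sub `fill_split` is hypothetical), to show why the -1 matters: an omitted or zero LIMIT strips trailing empty fields, while a negative one keeps them. One edge case: splitting an empty string yields no fields at all, so '' comes back unchanged rather than as '0'.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas, turn empty fields into '0', join.
# LIMIT 0 behaves like an omitted LIMIT (trailing empty fields dropped);
# a negative LIMIT keeps every field.
sub fill_split {
    my ($s, $limit) = @_;
    return join ',', map { $_ ne '' ? $_ : '0' } split /,/, $s, $limit;
}

print fill_split('1,,3,,', 0),  "\n";   # 1,0,3      (trailing fields lost)
print fill_split('1,,3,,', -1), "\n";   # 1,0,3,0,0
print fill_split(',1', -1),     "\n";   # 0,1        (leading empty field kept)
```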

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      This way you will lose all trailing commas...

      $s="1,2,3,,5,6,,,9,10,,,,14,15,,,,,,,,,,,,,,," will turn into $s="1,2,3,0,5,6,0,0,9,10,0,0,0,14,15".

        Easy to fix:
        my $t = '1,2,3,,5,6,,,9,10,,,,14,15,,,,,,,,,,,,,,,'; my $s = join ',' => map { $_ ne '' ? $_ : '0' } split /,/ => $t, 1 + $t =~ y/,//;

        Update: Or, maybe a bit more readable:

        my $s = join ',' => map $_ || 0, split /,/ => $t, 1 + $t =~ y/,//;
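A sketch checking that a LIMIT of one more than the number of commas keeps trailing empty fields the same way -1 does (the sub names are hypothetical). `y/,//` is `tr/,//`: with an empty replacement list it returns the number of commas and leaves the string unchanged. The `$_ || 0` variant gives the same output, because the only false non-empty field, "0", maps to 0 anyway.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 + (number of commas) is exactly the number of
# fields, so passing it as LIMIT keeps trailing empty fields.
sub fill_counted {
    my ($t) = @_;
    my $fields = 1 + ($t =~ y/,//);
    return join ',' => map { $_ ne '' ? $_ : '0' } split /,/ => $t, $fields;
}

# Hypothetical helper: the "more readable" variant using $_ || 0.
sub fill_counted_or {
    my ($t) = @_;
    return join ',' => map $_ || 0, split /,/ => $t, 1 + $t =~ y/,//;
}

my $t = '1,2,3,,5,6,,,9,10,,,,14,15,,,';
print fill_counted($t),    "\n";   # 1,2,3,0,5,6,0,0,9,10,0,0,0,14,15,0,0,0
print fill_counted_or($t), "\n";   # same output
```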
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Regex to replace consecutive tokens
by oiskuu (Hermit) on Oct 18, 2013 at 22:54 UTC
    If every non-empty value starts with a word character (a letter, digit or underscore), you can use this:
    $s =~ s/,\B/,0/g;
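The \B version works because a comma is a non-word character: `,\B` matches only where the comma is followed by another non-word character or by the end of the string. A sketch (hypothetical sub name) showing both the intended effect and what happens when a value starts with a non-word character such as a minus sign:

```perl
use strict;
use warnings;

# Hypothetical helper: ",\B" matches a comma that is not at a word
# boundary, i.e. one followed by a non-word character or end of string.
sub fill_nonboundary {
    my ($s) = @_;
    $s =~ s/,\B/,0/g;
    return $s;
}

print fill_nonboundary('a,,b,'),   "\n";   # a,0,b,0
print fill_nonboundary('1,-5,,3'), "\n";   # 1,0-5,0,3  (the "-5" field is mangled)
```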

    Update: here's a regex that also handles leading and trailing commas, and empty strings:

    $s =~ s/(^|,)\K(?![^,])/0/g;
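A sketch of the updated regex on the edge cases it is meant to cover (the sub name is hypothetical). `(^|,)` anchors on the start of the string or a comma, `\K` keeps that text out of the replaced portion so the substitution only inserts, and `(?![^,])` requires that the next character is a comma or that the string has ended.

```perl
use strict;
use warnings;

# Hypothetical helper: insert "0" into every empty field, including an
# empty first or last field and an entirely empty string.
sub fill_keep {
    my ($s) = @_;
    $s =~ s/(^|,)\K(?![^,])/0/g;
    return $s;
}

print fill_keep(''), "\n";                 # 0
print fill_keep(',1,'), "\n";              # 0,1,0
print fill_keep('1,2,3,,5,6,,,9'), "\n";   # 1,2,3,0,5,6,0,0,9
```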