PerlMonks  

regexp for adding commas to a number

by Kozz (Friar)
on Aug 17, 2000 at 20:36 UTC ( #28331=perlquestion )
Kozz has asked for the wisdom of the Perl Monks concerning the following question:

[disclaimer: this post is sure to demonstrate my ignorance of regular expressions & substitutions]

I'd already seen vroom's Q&A node about adding commas to a number, but it seemed there ought to be an easier way. I thought that a regexp like this one would do the trick:
$number=1234567; # with commas, should be "1,234,567"
$number=~s/(\d)(\d{3})\b/$1\,$2/g;
However, this code will actually change $number to 1234,567. Despite the "g" on the end of the regexp, it still sort of works from beginning-to-end, so it only inserts the comma at the end.
Then I thought, "well, I could do this once for each comma" which would work like this:
while($number=~s/(\d)(\d{3})\b/$1\,$2/g){
    # nothing here -- how silly is this?
}
So then, obviously, the while loop continues as long as the substitution was successful. But is this terribly silly? Could my same code be modified slightly to work correctly with ONE simple substitution? Or am I simply better off using vroom's sub at the aforementioned node?
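
To make the difference concrete, here is a minimal sketch comparing a single /g pass with the repeated substitution. The sub names single_pass and loop_pass are made up for illustration. The single pass finds only one match because \b holds only after the last digit, and /g resumes scanning after that match without revisiting the digits to its left.

```perl
use strict;
use warnings;

# Hypothetical helper: one s///g pass, as in the first attempt above.
sub single_pass {
    my ($n) = @_;
    $n =~ s/(\d)(\d{3})\b/$1,$2/g;
    return $n;
}

# Hypothetical helper: repeat the substitution until it stops matching.
# Each pass exposes a new word boundary just before the comma it inserted.
sub loop_pass {
    my ($n) = @_;
    1 while $n =~ s/(\d)(\d{3})\b/$1,$2/;
    return $n;
}

print single_pass(1234567), "\n";   # 1234,567
print loop_pass(1234567), "\n";     # 1,234,567
```

The loop works because the comma added by one pass creates the word boundary that the next pass needs.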

RE: regexp for adding commas to a number
by KM (Priest) on Aug 17, 2000 at 20:40 UTC
    sub commify {
        local($_) = shift;
        1 while s/^(-?\d+)(\d{3})/$1,$2/;
        return $_;
    }

    Cheers,
    KM

RE: regexp for adding commas to a number
by merlyn (Sage) on Aug 17, 2000 at 20:42 UTC
    That's not a working regex. The best ones work something like this:
    $number=1234567; # with commas, should be "1,234,567"
    $number =~ s/(\d)(?=(\d{3})+(\D|$))/$1\,/g;
    Notice the missing inner repetition in the previous post? And I'm doing this with lookahead, so I can scan from left to right. The "1 while ..." solutions scan effectively from right to left, so they may be slower. Actually, I'd be interested in the various benchmarks on these. {grin}

    -- Randal L. Schwartz, Perl hacker

      Thanks for the insight. I'm ignorant of the secrets of lookahead, lookbehind, inner repetitions, and all the other associated voodoo with things like this, so while this regex certainly works, the breakdown of what it all means is a mystery to me. Would "Mastering Regular Expressions" be a good teacher for this sort of thing?

      Thanks, merlyn (++), and everybody else for their help and insight.
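
For readers who want the breakdown, the lookahead regex above can be spelled out with the /x modifier, as in this rough sketch. The sub name commify_lookahead is hypothetical, and the inner groups are written as non-capturing (?:...) here, which is a small change from the posted version and should not affect what matches.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the lookahead regex, annotated with /x.
sub commify_lookahead {
    my ($text) = @_;
    $text =~ s/
        (\d)            # a digit that gets a comma after it ...
        (?=             # ... but only if what follows, without being consumed,
            (?:\d{3})+  #     is one or more complete groups of three digits
            (?:\D|$)    #     and then a non-digit or the end of the string
        )
    /$1,/gx;
    return $text;
}

print commify_lookahead("1234567"), "\n";                 # 1,234,567
print commify_lookahead("12 and 1234 and 12345"), "\n";   # 12 and 1,234 and 12,345
```

Because the lookahead consumes nothing, /g can continue from the very next digit, which is why a single left-to-right pass is enough.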
        The currently available Mastering Regular Expressions doesn't cover any of the really cool Perl 5 regex stuff. Jeffrey Friedl is in the process of rewriting the book for a second edition, and has been working with the Perl developers to uncover inconsistencies in the implementation (what normal people would call "bugs" {grin}) and gaps in the documentation.

        Don't hold your breath though. I know this effort will probably take one to two years of nights and weekends. "Been there Done that" x $n

        -- Randal L. Schwartz, Perl hacker

      great regex! i too had forgotten about the look-ahead assertion :( my function that does number beautifying has no fewer than 8 lines :((
      still.. there's a little problem when handling floats: the digits after the dot shouldn't be 'beautified' :(
      .. i tried to enhance your line a little, but the problem remains because look-behind has to be fixed-width:

      $number =~ s/(?<!\.\d)(\d)(?=(\d{3})+(\D|$))/$1,/g;
      (this example works for numbers with 5 digits after the dot... variations may be made by changing the number of \d in the look-behind assertion)

      couldn't think of anything better for now.. maybe you have another bright idea for this too o=)

      so, after inserting your wizcraft, my lame tool looks something like this:

      $number =~ s/(\d+)(\.\d+)?/bea_int($1).$2/eg;
      sub bea_int {
          my $kk = $_[0];
          $kk =~ s/(\d)(?=(\d{3})+(\D|$))/$1\,/g;
          return $kk;
      }

      --
      AltBlue... w8ing 4 a better solution o=Q

        The easy way to deal with open-ended floating-point numbers is to simply remove the fractional part until the commas are added, then reattach it.
        $integer =~ s/(.*)(\.\d\d+)$/$1/;
        $float = $2;
        # do stuff
        $integer .= $float;
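
The detach-and-reattach idea can be sketched as a small sub. This is only an illustration under stated assumptions: the name commify_float is made up, the input is assumed to be a plain decimal string with at most one dot and no exponent, and the integer part is handled with the lookahead regex from earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: add commas to the integer part only.
# Assumes input such as "1234567.891011" or "1234567" (no exponent notation).
sub commify_float {
    my ($number) = @_;
    # Split at the first dot; $fraction is undef when there is no dot.
    my ($integer, $fraction) = split /\./, $number, 2;
    $integer =~ s/(\d)(?=(?:\d{3})+(?:\D|$))/$1,/g;
    return defined $fraction ? "$integer.$fraction" : $integer;
}

print commify_float("1234567.891011"), "\n";   # 1,234,567.891011
print commify_float("-1234.5"), "\n";          # -1,234.5
print commify_float("1234567"), "\n";          # 1,234,567
```

Splitting first means the fractional digits never reach the substitution, so no look-behind is needed.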
RE: regexp for adding commas to a number
by Adam (Vicar) on Aug 17, 2000 at 20:56 UTC
    One Benchmark:
    use strict;
    use Benchmark;

    for ( 1..5 ) # Do five tests.
    {
        $_ = int( rand(10_000) ) ** int( rand(3) + 2 );
        print $_, "\n";
        timethese( 1_000_000, {
            'KM'     => sub { 1 while s/^(-?\d+)(\d{3})/$1,$2/ },
            'Merlyn' => sub { s/(\d)(?=(\d{3})+(\D|$))/$1\,/g }
        });
        print "\n", "- " x 39, "-\n";
    }
    Results:
    82755409
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 958772.77/s (n=1000000)
        Merlyn:  1 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 2036659.88/s (n=1000000)

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    293198635825936
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  1 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 949667.62/s (n=1000000)
        Merlyn:  0 wallclock secs ( 0.49 usr +  0.00 sys =  0.49 CPU) @ 2036659.88/s (n=1000000)

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    602425897921
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  0 wallclock secs ( 1.05 usr +  0.00 sys =  1.05 CPU) @ 949667.62/s (n=1000000)
        Merlyn:  0 wallclock secs ( 0.47 usr +  0.00 sys =  0.47 CPU) @ 2123142.25/s (n=1000000)

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    1.80935247108226e+015
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 958772.77/s (n=1000000)
        Merlyn:  1 wallclock secs ( 0.46 usr +  0.00 sys =  0.46 CPU) @ 2169197.40/s (n=1000000)

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    106294343553
    Benchmark: timing 1000000 iterations of KM, Merlyn...
            KM:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 958772.77/s (n=1000000)
        Merlyn:  1 wallclock secs ( 0.48 usr +  0.00 sys =  0.48 CPU) @ 2083333.33/s (n=1000000)

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    
      Nice, but the real benchmark is testing them inside much larger strings:
      "12123123 sadljaskjdfl skadj flkasjdf lksadjf klsadjfasdk 12718237192 +378"
      since both regexes were designed to work on large strings with sequences of digits in various places within the string.

      -- Randal L. Schwartz, Perl hacker

        For you merlyn, I made it run on a string. Plus, I ran it on a faster machine since mine is busy.
        use strict;
        use Benchmark;

        for ( 1..5 ) # Do five tests.
        {
            $_ = int( rand(10_000) ) ** int( rand(3) + 2 );
            $_ = "For $_ Merlyn " . reverse($_) . " plus the constants ".
                 "8634641234541275032000523 and 8,634,641,234,541,275,032,000,523";
            print $_, "\n";
            timethese( 1_000_000, {
                'KM'     => sub { 1 while s/^(-?\d+)(\d{3})/$1,$2/ },
                'Merlyn' => sub {s/(\d)(?=(\d{3})+(\D|$))/$1\,/g}
            });
            print "\n", "- " x 39, "-\n";
        }
        Output:
        For 2699449 Merlyn 9449962 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  1 wallclock secs ( 1.04 usr +  0.00 sys =  1.04 CPU) @ 960614.79/s (n=1000000)
            Merlyn:  0 wallclock secs ( 0.51 usr +  0.00 sys =  0.51 CPU) @ 1956947.16/s (n=1000000)

        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        For 78836641 Merlyn 14663887 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  1 wallclock secs ( 1.06 usr +  0.00 sys =  1.06 CPU) @ 942507.07/s (n=1000000)
            Merlyn:  0 wallclock secs ( 0.58 usr +  0.00 sys =  0.58 CPU) @ 1721170.40/s (n=1000000)

        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        For 126128378375 Merlyn 573873821621 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  2 wallclock secs ( 1.02 usr +  0.00 sys =  1.02 CPU) @ 979431.93/s (n=1000000)
            Merlyn:  1 wallclock secs ( 0.53 usr +  0.00 sys =  0.53 CPU) @ 1883239.17/s (n=1000000)

        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        For 8665653464 Merlyn 4643565668 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  2 wallclock secs ( 1.12 usr +  0.00 sys =  1.12 CPU) @ 891265.60/s (n=1000000)
            Merlyn:  0 wallclock secs ( 0.53 usr +  0.00 sys =  0.53 CPU) @ 1886792.45/s (n=1000000)

        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        For 36300625 Merlyn 52600363 plus the constants 8634641234541275032000523 and 8,634,641,234,541,275,032,000,523
        Benchmark: timing 1000000 iterations of KM, Merlyn...
                KM:  2 wallclock secs ( 0.98 usr +  0.00 sys =  0.98 CPU) @ 1018329.94/s (n=1000000)
            Merlyn:  0 wallclock secs ( 0.54 usr +  0.00 sys =  0.54 CPU) @ 1848428.84/s (n=1000000)

        - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
        

        Um, I guess you didn't notice the ^ in KM's. (:

                - tye (but my friends call me "Tye")
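
The point about the ^ can be checked directly with a small sketch. The sub names km_style and merlyn_style are hypothetical wrappers around the two substitutions from the benchmark, applied to a copy of the text so the input is not modified.

```perl
use strict;
use warnings;

# Hypothetical wrapper: the anchored "1 while" substitution.
sub km_style {
    my ($text) = @_;
    1 while $text =~ s/^(-?\d+)(\d{3})/$1,$2/;
    return $text;
}

# Hypothetical wrapper: the lookahead substitution.
sub merlyn_style {
    my ($text) = @_;
    $text =~ s/(\d)(?=(\d{3})+(\D|$))/$1,/g;
    return $text;
}

my $sample = "For 2699449 Merlyn 9449962";
print km_style($sample), "\n";       # For 2699449 Merlyn 9449962 (unchanged)
print merlyn_style($sample), "\n";   # For 2,699,449 Merlyn 9,449,962
print km_style("2699449"), "\n";     # 2,699,449
```

Since the benchmark string starts with "For", the anchored version never matches there, so that run times a failed match rather than any comma insertion. It also looks as though both benchmark subs modify $_ in place, so only the first call of each would see uncommified text.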
Re: regexp for adding commas to a number
by tenatious (Beadle) on Aug 18, 2000 at 07:06 UTC
    Out of Andrew Johnson's Elements of Perl Programming:
    s/(\d{1,3}) (?= (?:\d\d\d)+ (?!\d) ) /$1,/gx;
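
As a quick check, the substitution above can be wrapped in a sub and run on a couple of inputs. The name commify_johnson is made up for this sketch. Compared with the earlier lookahead version, this one ends the lookahead with a negative lookahead (?!\d) rather than (\D|$), which should behave the same way at the end of a run of digits.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution quoted above.
sub commify_johnson {
    my ($text) = @_;
    $text =~ s/(\d{1,3}) (?= (?:\d\d\d)+ (?!\d) ) /$1,/gx;
    return $text;
}

print commify_johnson("1234567"), "\n";    # 1,234,567
print commify_johnson("x 1234 y"), "\n";   # x 1,234 y
```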
