Re: regex to capture an unsigned decimal value, but also preserving the user's formatting.

by davido (Cardinal)
on May 04, 2016 at 05:57 UTC


in reply to regex to capture an unsigned decimal value, but also preserving the user's formatting.

You can make the dot possessive and the trailing digits greedy but optional:

use strict;
use warnings;
use feature 'say';

while (<DATA>) {
    chomp;
    next unless length;
    say "$_ => [", (m/(\d*\.?+\d*)/x ? "$1]" : ']');
}

__DATA__
0.12
.12
12.
12

This yields the following output:

0.12 => [0.12]
.12 => [.12]
12. => [12.]
12 => [12]

I do not know what additional edge cases you might encounter that would break this. But for your test cases it works.

Two tricks here. First, it was a mistake to use the + quantifier on the trailing \d, because that makes at least one trailing digit a required part of the match. With 12. there is no digit after the decimal point, so the engine has to backtrack: \.? gives up the dot and \d* hands back a digit so that \d+ can succeed, and the match ends up as 12 without the decimal point. By making the quantifier * we stay greedy, but optional.

The next trick is the ?+ quantifier for the decimal point. The + here makes the ? possessive -- once it has matched, it holds onto what it matched, and doesn't give it up during backtracking.
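A minimal sketch of what possessive means, separate from the decimal-matching problem itself. The helper name `matches_fully` and the tiny test pattern are assumptions made up for illustration; the only point is that `\.?` will hand the dot back when a later part of the pattern needs it, while `\.?+` will not. Possessive quantifiers need Perl 5.10 or later.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not from the post): true if $pattern matches
# the whole of $text, false otherwise.
sub matches_fully {
    my ($pattern, $text) = @_;
    return $text =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

# Greedy: \.? first takes the dot, then gives it back on backtracking
# so that the literal \. after it can match.
say 'greedy     \.?\.  on ".": ',
    matches_fully(qr/\.?\./, '.') ? 'match' : 'no match';

# Possessive: \.?+ takes the dot and refuses to give it back,
# so the literal \. after it has nothing left to match.
say 'possessive \.?+\. on ".": ',
    matches_fully(qr/\.?+\./, '.') ? 'match' : 'no match';
```

The first line should report a match and the second should not, which is the "hold onto what it matched" behaviour described above.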

For reference, here are the two regexes (yours and mine) in close proximity. You can disregard the parens here; I'm only using them for capturing, which you were achieving by referring to the match hash.

m/( \d* \.?+ \d* )/x    # Mine
m/( \d* \.?  \d+ )/x    # Yours
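For anyone who wants to compare the two patterns directly, here is a small sketch that runs both against the four test inputs from the thread. The hash `%regex_for` and the helper `capture` are names invented for this example, not anything from the original question.

```perl
use strict;
use warnings;
use feature 'say';

# The two patterns from the post, keyed by the labels used there.
my %regex_for = (
    mine  => qr/( \d* \.?+ \d* )/x,
    yours => qr/( \d* \.?  \d+ )/x,
);

# Hypothetical helper: returns the captured text, or undef if no match.
sub capture {
    my ($name, $input) = @_;
    return $input =~ $regex_for{$name} ? $1 : undef;
}

for my $input (qw( 0.12 .12 12. 12 )) {
    printf "%-5s mine => [%s]  yours => [%s]\n",
        $input, map { capture($_, $input) // '' } qw( mine yours );
}
```

The only input where the two should differ is 12., where the pattern with the trailing \d+ drops the decimal point.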

I suggest walking through your original regular expression, and the one I've provided, using Regexp::Debugger's rxrx utility. I think my inadequate description will be clearer once you see the wheels in motion.

Update: I've fixed the backslashing of the . in the pattern.


Dave


Replies are listed 'Best First'.
Re^2: regex to capture an unsigned decimal value, but also preserve the user's formatting.
by Athanasius (Archbishop) on May 04, 2016 at 06:41 UTC

    Hello davido,

    Your use of the + (possessive) quantifier is ingenious, as it leads to simple, elegant code. I think you need to backslash the dot to avoid matching any character:

    say "$_ => [", (m/(\d*\.?+\d*)/x ? "$1]" : ']');
    #                     ^

    But my main quibble is that this doesn’t work consistently if the decimal is embedded in a longer string. (Whether that’s actually a requirement isn’t clear from the OP.) For that, I came up with the following regex, which is verbose but seems to work OK:

    #! perl
    use strict;
    use warnings;
    use feature qw( say );

    my $decimal = qr{ ( (?: \d+ \.? \d* ) | (?: \d* \.? \d+ ) ) }x;

    while (<DATA>)
    {
        chomp;
        next unless length;
        say "$_ => [", (/$decimal/x ? "$1]" : ']');
    }

    __DATA__
    0.12
    .12
    12.
    12
    no numbers here
    abc.def42
    .7zx

    Output:

    16:39 >perl 1621_SoPW.pl
    0.12 => [0.12]
    .12 => [.12]
    12. => [12.]
    12 => [12]
    no numbers here => []
    abc.def42 => [42]
    .7zx => [.7]

    16:39 >
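To illustrate the embedded-string concern, here is a short sketch that applies the all-optional pattern from the parent post to the extra test strings. The helper name `capture_all_optional` is made up for this example. Because every piece of that pattern is optional, it can succeed by matching the empty string at the very start of the input, before it ever reaches a digit.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: applies the all-optional pattern from the parent
# post and returns whatever it captured.
sub capture_all_optional {
    my ($input) = @_;
    return $input =~ /(\d*\.?+\d*)/ ? $1 : undef;
}

for my $input ('abc.def42', '.7zx', 'no numbers here') {
    my $got = capture_all_optional($input);
    say "$input => [$got] (length ", length($got), ')';
}
```

For abc.def42 the capture should come back empty rather than 42, since the match succeeds with zero characters at the leading a.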

    Cheers,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      I think the other important point to make is that the order of the sub-patterns in the ordered alternation is critical:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $ufp = qr{ \d* [.]? \d+ | \d+ [.]? \d* }xms;
       ;;
       for my $s (qw(0.12 .34 56. 78)) {
         printf qq{'$s' -> };
         printf qq{'$1' \n} if $s =~ m{ ($ufp) }xms;
         }
       ;;
       my $s = '0.12 -0.98 .34 -.76 56. -54. 78 -32 bla bla abc.def42 .7zx';
       printf qq{'$1' } while $s =~ m{ ($ufp) }xmsg;
       "
      '0.12' -> '0.12'
      '.34' -> '.34'
      '56.' -> '56'
      '78' -> '78'
      '0.12' '0.98' '.34' '.76' '56' '54' '78' '32' '42' '.7'

      (See the '56.' and '-54.' instances.)
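A sketch that puts the two orderings side by side. The hash keys `leading_required_first` and `trailing_required_first`, and the helper `first_number`, are names invented for this example. Perl's alternation is ordered, so the first alternative that can match at a given position wins, even if a later one would have matched more.

```perl
use strict;
use warnings;
use feature 'say';

my %ufp = (
    # Order used in the parent reply: "digits, optional dot, optional digits" first.
    leading_required_first  => qr{ \d+ [.]? \d* | \d* [.]? \d+ }xms,
    # Order used in the one-liner above: "optional digits, optional dot, digits" first.
    trailing_required_first => qr{ \d* [.]? \d+ | \d+ [.]? \d* }xms,
);

# Hypothetical helper: returns the first number found in $string
# under the given ordering, or undef if there is none.
sub first_number {
    my ($order, $string) = @_;
    my $re = $ufp{$order};
    return $string =~ m{ ($re) }xms ? $1 : undef;
}

for my $s (qw( 0.12 .34 56. 78 )) {
    printf "%-5s leading first => '%s'   trailing first => '%s'\n",
        $s,
        first_number(leading_required_first  => $s),
        first_number(trailing_required_first => $s);
}
```

Only 56. should differ between the two columns: with the trailing-digits alternative tried first, the engine backtracks into the leading \d* to satisfy \d+ and never needs the dot.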


      Give a man a fish:  <%-{-{-{-<
