PerlMonks  

Is A Number

by Anonymous Monk
on Dec 17, 2021 at 15:13 UTC ( #11139686=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I realise this has been asked before, but I can't seem to find a simple, elegant Perl example. I need to know whether a string is a number or not. Through trial and error, I have come up with the code below, which works. It is anything but elegant and has been separated into multiple if / elsif branches for testing. I would like to combine the string comparisons into as few as possible but seem to break the code every time I try.

    print IsNumber("0777 891 777") . "\n"; # 0
    print IsNumber("1.5671") . "\n"; # 1
    print IsNumber("121A3D") . "\n"; # 0
    print IsNumber("777") . "\n"; # 1
    print IsNumber("0") . "\n"; # 1
    print IsNumber("-4.567") . "\n"; # 1
    print IsNumber("+9.8.97") . "\n"; # 0
    print IsNumber("+9.897") . "\n"; # 1
    print IsNumber("+9.897") . "\n"; # 0
    print IsNumber("9.8[97") . "\n"; # 0

    sub IsNumber {
        my ($string) = @_;
        my $valid = 0;
        my $count = $string =~ tr/\.//;
        if ( $string =~ m/[a-zA-Z\ \[\]]/ ) {
            $valid = 0;
        }
        elsif ( $string =~ /[^\x00-\x7F]/ ) {
            $valid = 0;
        }
        elsif ( $count > 1 ) {
            $valid = 0;
        }
        elsif ( $string =~ m/[#@':;><,.{}[]=!"$%^&*()]/ ) {
            $valid = 0;
        }
        elsif ( $string =~ m/^[+-]?\d+$/ ) {
            $valid = 1;
        }
        elsif ( $string =~ m/^[+-]?[0-9]+[.]?[0-9]+/ ) {
            $valid = 1;
        }
        return $valid;
    }

Replies are listed 'Best First'.
Re: Is A Number
by hippo (Bishop) on Dec 17, 2021 at 15:24 UTC

    If you are just looking for an elegant, ready-made solution then Scalar::Util::looks_like_number fits the bill:

        use strict;
        use warnings;
        use Scalar::Util 'looks_like_number';
        use Test::More;

        my @nums = ( '1.5671', '777', '0', '-4.567', '+9.987' );
        my @not  = ( '0777 891 777', '121A3D', '+9.8.97', '+9.897', '9.8[97' );

        plan tests => @nums + @not;

        for my $i (@nums) {
            ok looks_like_number ($i), "$i is a number";
        }
        for my $i (@not) {
            ok ! looks_like_number ($i), "$i is not a number";
        }

    🦛

      > If you are just looking for an elegant, ready-made solution then Scalar::Util::looks_like_number fits the bill:

      Even better ... it's core.

      So no need to install anything.

          D:\tmp\pm>corelist Scalar::Util
          Data for 2021-01-23
          Scalar::Util was first released with perl v5.7.3
          D:\tmp\pm>

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      Wikisyntax for the Monastery

Re: Is A Number
by haukex (Archbishop) on Dec 17, 2021 at 20:27 UTC
Re: Is A Number
by kcott (Archbishop) on Dec 17, 2021 at 20:02 UTC

    The suggested Scalar::Util::looks_like_number() is often a good choice; however, be aware of some gotchas:

    Infinity (see "perldata: Special floating point: infinity (Inf) and not-a-number (NaN)"):

        perl -E '
            use Scalar::Util "looks_like_number";
            for (qw{Inf -Inf Infinity -Infinity}) {
                say "$_: ", looks_like_number($_) ? 1 : 0;
            }
        '
        Inf: 1
        -Inf: 1
        Infinity: 1
        -Infinity: 1

    Non-decimal numbers:

        perl -E '
            use Scalar::Util "looks_like_number";
            for (qw{1 0b10 0o10 0x10}) {
                say "$_: ", looks_like_number($_) ? 1 : 0
            }
        '
        1: 1
        0b10: 0
        0o10: 0
        0x10: 0

        perl -E '
            use Scalar::Util "looks_like_number";
            for (1, 0b10, 0o10, 0x10) {
                say "$_: ", looks_like_number($_) ? 1 : 0
            }
        '
        1: 1
        2: 1
        8: 1
        16: 1

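If the Inf/NaN gotcha matters for your input, one option is to wrap looks_like_number() and reject non-finite values afterwards. The following is only a sketch: the helper name is_finite_number is made up for illustration, and it assumes IEEE-754 doubles, where 9**9**9 overflows to infinity and NaN compares unequal to itself.

```perl
use strict;
use warnings;
use Scalar::Util 'looks_like_number';

# Hypothetical helper (not from the thread): accept only finite numbers.
sub is_finite_number {
    my ($s) = @_;
    return 0 unless defined $s && looks_like_number($s);
    my $n   = 0 + $s;     # numify; no warning because it looks like a number
    my $inf = 9**9**9;    # assumed to overflow to Inf on IEEE-754 doubles
    return 0 if $n != $n;                     # NaN is never equal to itself
    return 0 if $n == $inf || $n == -$inf;    # positive or negative infinity
    return 1;
}

for my $s (qw{1.5 -4.567 1e6 Inf -Infinity NaN 0x10}) {
    printf "%-10s %d\n", $s, is_finite_number($s);
}
```

Non-decimal strings such as 0x10 are still rejected here, because looks_like_number() already returns false for them.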
    "I would like to combine the string comparisons into as few as possible but seem to break the code every time I try."

    If looks_like_number() is causing you problems, perhaps due to the "gotchas" indicated above, or as a purely academic exercise, here are some hints for a hand-crafted solution.

    Get rid of the blacklist: this will take ages to get right and, even then, there's a high chance that you'll miss an edge-case or two (or more). Just use a whitelist which represents valid numbers for your application.

    Use a single regex with alternations, and only define it once (not on every call). With Perl v5.10 or higher, and the state feature enabled (for example via "use v5.10;"), you can use this format:

        sub IsNumber {
            state $re = qr{...};
            return $_[0] =~ $re;
        }

    For Perls of an older vintage:

        {
            my $re;
            BEGIN { $re = qr{...} }
            sub IsNumber {
                return $_[0] =~ $re;
            }
        }

    Don't try to write your regex in the smallest possible space. This isn't a golfing exercise. Readability and maintainability will very quickly deteriorate. Use the /x modifier — or /xx if available; requires Perl v5.26 or later — as explained in "perlre: /x and /xx".

    Consider whether you're allowing exponents:

        $ perl -E 'say 1e6'
        1000000

    What about when underscores have been used to improve the readability of large numbers:

        $ perl -E 'say 1_000_000/1_000'
        1000

    Are you only dealing with 7-bit ASCII digits (/[0-9]/) or allowing the handling of all (i.e. Unicode®) digits (/\p{Digit}/)?

    Note that /\d/ and /[[:digit:]]/ are not necessarily the same as /[0-9]/. See "perlre: /a (and /aa)".
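A small sketch of that difference, using ARABIC-INDIC DIGIT THREE (U+0663) as an example of a non-ASCII digit. The variable names are illustrative only, and the /a modifier needs Perl v5.14 or later.

```perl
use strict;
use warnings;

my $arabic_three = "\x{0663}";    # ARABIC-INDIC DIGIT THREE, a Unicode digit

my $unicode_d = $arabic_three =~ /\A\d\z/    ? 1 : 0;   # Unicode rules for \d
my $ascii_d   = $arabic_three =~ /\A\d\z/a   ? 1 : 0;   # /a restricts \d to ASCII
my $bracket   = $arabic_three =~ /\A[0-9]\z/ ? 1 : 0;   # explicit ASCII range

print "\\d: $unicode_d, \\d with /a: $ascii_d, [0-9]: $bracket\n";
```

Only the plain /\d/ matches this character, so if "number" means ASCII digits for your application, [0-9] or /a states that explicitly.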

    More general sources of reference are: perlre; perlrebackslash; and, perlrecharclass.
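Putting the hints above together, a whitelist version might look like the sketch below. The choices it makes are assumptions rather than requirements from the original question: ASCII digits only, an optional sign, forms such as "5." and ".5" allowed, an optional exponent, and no underscores. Adjust the pattern to whatever counts as a valid number in your application.

```perl
use strict;
use warnings;
use feature 'state';    # needs Perl v5.10 or later

sub IsNumber {
    my ($string) = @_;

    # Compiled once; /x allows whitespace and comments inside the pattern.
    state $re = qr{
        \A                              # start of string
        [+-]?                           # optional sign
        (?:
            [0-9]+ (?: \. [0-9]* )?     # 777, 1.5671 or 5.
          | \. [0-9]+                   # .5
        )
        (?: [eE] [+-]? [0-9]+ )?        # optional exponent, as in 1e6
        \z                              # end of string, no trailing newline
    }x;

    return ( defined $string && $string =~ $re ) ? 1 : 0;
}

for my $s ( '1.5671', '-4.567', '1.123e4', '0777 891 777', '+9.8.97', '9.8[97' ) {
    printf "%-14s %d\n", $s, IsNumber($s);
}
```

Avoid "$", "@" and the closing delimiter inside the /x comments, since the pattern is still subject to interpolation and delimiter matching.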

    — Ken

Re: Is A Number
by syphilis (Archbishop) on Dec 18, 2021 at 01:26 UTC
    Hi,

    I had a little play and noticed that your IsNumber() sub returns true for integers with embedded (and/or trailing) garbage, unless the garbage is alphabetic or whitespace.
    Seemed odd ... but, of course, FAIK it's quite possible that the integer inputs you're receiving are either guaranteed to have no such garbage or are to be deemed acceptable if they include such garbage.
    For the input array in the following script (where I also compare the IsNumber() and looks_like_number() results), IsNumber returns "1" for all but the final input.
        use strict;
        use warnings;
        use Scalar::Util qw(looks_like_number);
        use Test::More;

        my @in = ('99/999998', '9999*9998', '9999-9998', '9999+9998',
                  '9999:9998', '9999@:%?', '9999@:%?9998', '9999ABCD9998',);

        for(@in) {
            cmp_ok(IsNumber($_), '==', looks_like_number($_), "$_: " . IsNumber($_));
        }

        done_testing();

        sub IsNumber {
            my ($string) = @_;
            my $valid = 0;
            my $count = $string =~ tr/\.//;
            if ( $string =~ m/[a-zA-Z\ \[\]]/ ) {
                $valid = 0;
            }
            elsif ( $string =~ /[^\x00-\x7F]/ ) {
                $valid = 0;
            }
            elsif ( $count > 1 ) {
                $valid = 0;
            }
            elsif ( $string =~ m/[#@':;><,.{}[]=!"$%^&*()]/ ) {
                $valid = 0;
            }
            elsif ( $string =~ m/^[+-]?\d+$/ ) {
                $valid = 1;
            }
            elsif ( $string =~ m/^[+-]?[0-9]+[.]?[0-9]+/ ) {
                $valid = 1;
            }
            return $valid;
        }
    Outputs:
        not ok 1 - 99/999998: 1
        #   Failed test '99/999998: 1'
        #   at try.pl line 13.
        #          got: 1
        #     expected:
        not ok 2 - 9999*9998: 1
        #   Failed test '9999*9998: 1'
        #   at try.pl line 13.
        #          got: 1
        #     expected:
        not ok 3 - 9999-9998: 1
        #   Failed test '9999-9998: 1'
        #   at try.pl line 13.
        #          got: 1
        #     expected:
        not ok 4 - 9999+9998: 1
        #   Failed test '9999+9998: 1'
        #   at try.pl line 13.
        #          got: 1
        #     expected:
        not ok 5 - 9999:9998: 1
        #   Failed test '9999:9998: 1'
        #   at try.pl line 13.
        #          got: 1
        #     expected:
        not ok 6 - 9999@:%?: 1
        #   Failed test '9999@:%?: 1'
        #   at try.pl line 13.
        #          got: 1
        #     expected:
        not ok 7 - 9999@:%?9998: 1
        #   Failed test '9999@:%?9998: 1'
        #   at try.pl line 13.
        #          got: 1
        #     expected:
        ok 8 - 9999ABCD9998: 0
        1..8
        # Looks like you failed 7 tests of 8.
    Cheers,
    Rob

    PS
    I also did not expect that IsNumber('1.123e4') would return 0. But again, I don't know much about the possible inputs or the criteria for assessing them.
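
One reason inputs like these slip through is that the last test in the original IsNumber() has no end-of-string anchor, so a leading run of digits is enough for it to match (and the punctuation blacklist before it does not catch them, as another reply in this thread explains). A minimal sketch of the difference, with illustrative variable names:

```perl
use strict;
use warnings;

my $input = '9999-9998';

# The original final test: anchored at the start only.
my $unanchored = $input =~ m/^[+-]?[0-9]+[.]?[0-9]+/  ? 1 : 0;

# The same pattern with an end-of-string anchor added.
my $anchored   = $input =~ m/^[+-]?[0-9]+[.]?[0-9]+$/ ? 1 : 0;

print "unanchored: $unanchored, anchored: $anchored\n";
```

The unanchored pattern is satisfied by the "9999" prefix alone; the anchored one has to account for the whole string and therefore fails.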
Re: Is A Number (updated)
by AnomalousMonk (Archbishop) on Dec 19, 2021 at 06:06 UTC

    I agree with others that considerably simplified versions of your function are available and IMHO preferable.

    I want to comment on one test in your original post. The
        elsif ( $string =~ m/[#@':;><,.{}[]=!"$%^&*()]/ ) { ... }
    test is wrong/useless.

      +---------------------------- start of character class
      |            +--------------- END of character class !!!
      |            |
      |            |+-------------- characters and metacharacters
      |            ||                 from here on
      |            ||
      |            ||    +-------- START-OF-STRING anchor !!!
      |            ||    |
      |            ||    | +------ 0-or-more quantifier
      |            ||    | |
      |            ||    | |++---- empty capture group
      |            ||    | |||
      |            ||    | |||+--- literal ] character
      |            ||    | ||||
      v            vv    v vvvv
    m/[#@':;><,.{}[]=!"$%^&*()]/
        ^              ^
        |              |
        |              +--- interpolated scalar $%
        |
        +------------------- interpolated array @'

    The premature end of the character class makes the regex nonsense, but most egregiously it produces a ^ start-of-string anchor metacharacter that requires characters before it (i.e., before the start of the string) for a match to occur. Thus, no match can ever occur.

    Beyond that, there are two interpolated global special variables that appear in the originally posted regex and cause it to be other than what one might expect. See perlvar for $%.   @' is the unused array slot of the $' regex special variable typeglob (see also perlvar).

        Win8 Strawberry 5.8.9.5 (32) Sat 12/18/2021 23:35:28
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        use Data::Dump qw(dd);

        my $regex = qr/[#@':;><,.{}[]=!"$%^&*()]/; # as posted pm#11139686
        Possible unintended interpolation of @' in string at - line 7.
        # my $regex = qr/[#\@':;><,.{}[\]=!"\$%^&*()]/; # corrected
        dd $regex;

        no warnings 'qw';
        for my $c (qw/ # @ ' : ; > < , . { } [ ] = ! " $ % ^ & * ( ) /) {
            print "'$c' ", $c =~ $regex ? ' ' : 'NO', " match \n";
        }
        ^Z
        qr/[#:;><,.{}[]=!"0^&*()]/
        '#' NO match
        '@' NO match
        ''' NO match
        ':' NO match
        ';' NO match
        '>' NO match
        '<' NO match
        ',' NO match
        '.' NO match
        '{' NO match
        '}' NO match
        '[' NO match
        ']' NO match
        '=' NO match
        '!' NO match
        '"' NO match
        '' NO match
        '$' NO match
        '%' NO match
        '^' NO match
        '&' NO match
        '*' NO match
        '(' NO match
        ')' NO match

    Note that @' and $% do not appear in the dd dump of the regex – and where does the 0 come from?
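
To illustrate where the 0 comes from: $% holds the current page number of the selected output handle, which is 0 unless formats have been used, and it is interpolated into a regex like any other scalar. A minimal sketch (the patterns and variable names here are made up for the demonstration):

```perl
use strict;
use warnings;

# $% is the page number of the selected output handle; 0 by default.
my $unescaped = qr/\Aa$%b\z/;     # $% interpolates, giving /\Aa0b\z/
my $escaped   = qr/\Aa\$%b\z/;    # the backslash keeps a literal dollar sign

print "unescaped matches 'a0b':  ", ( 'a0b'  =~ $unescaped ? 1 : 0 ), "\n";
print "escaped matches 'a\$%b': ", ( 'a$%b' =~ $escaped   ? 1 : 0 ), "\n";
```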

    Fixing the character class termination and suppressing variable interpolation yields the results I think you want:

        Win8 Strawberry 5.8.9.5 (32) Sat 12/18/2021 23:36:37
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        use Data::Dump qw(dd);

        # my $regex = qr/[#@':;><,.{}[]=!"$%^&*()]/; # as posted pm#11139686
        my $regex = qr/[#\@':;><,.{}[\]=!"\$%^&*()]/; # corrected
        dd $regex;

        no warnings 'qw';
        for my $c (qw/ # @ ' : ; > < , . { } [ ] = ! " $ % ^ & * ( ) /) {
            print "'$c' ", $c =~ $regex ? ' ' : 'NO', " match \n";
        }
        ^Z
        qr/[#\@':;><,.{}[\]=!"\$%^&*()]/
        '#' match
        '@' match
        ''' match
        ':' match
        ';' match
        '>' match
        '<' match
        ',' match
        '.' match
        '{' match
        '}' match
        '[' match
        ']' match
        '=' match
        '!' match
        '"' match
        '' match
        '$' match
        '%' match
        '^' match
        '&' match
        '*' match
        '(' match
        ')' match

    Note that the @' and $% literal sequences are in their proper places in the dd dump of the regex.

    Same results running under Strawberry Perl 5.30.3.1 except the Possible unintended interpolation of @' in string ... warning is not emitted.

    Update: If you really need to do something like
        $string =~ m/[#\@':;><,.{}[\]=!"\$%^&*()]/;
    to test for the presence of certain characters, it might be better to use tr (see also in perlop):
        $string =~ tr/#@':;><,.{}[]=!"$%^&*()//;
    (in scalar or boolean context). I suggest tr// because it has fewer features (e.g., variable interpolation, character classes, etc.) than m// or s/// and so is less prone to mistakes and misinterpretation.

        Win8 Strawberry 5.8.9.5 (32) Mon 12/20/2021 17:08:37
        C:\@Work\Perl\monks
        >perl
        use strict;
        use warnings;

        no warnings 'qw';
        for my $c (qw/ # @ ' : ; > < , . { } [ ] = ! " $ % ^ & * ( )
                       1.2 0 1 2 - + -1 +1 12 +12 -12 /) {
            my $is_blacklisted = $c =~ tr/#@':;><,.{}[]=!"$%^&*()//;
            print "'$c'", $is_blacklisted ? '' : ' NOT', " blacklisted \n";
        }
        ^Z
        '#' blacklisted
        '@' blacklisted
        ''' blacklisted
        ':' blacklisted
        ';' blacklisted
        '>' blacklisted
        '<' blacklisted
        ',' blacklisted
        '.' blacklisted
        '{' blacklisted
        '}' blacklisted
        '[' blacklisted
        ']' blacklisted
        '=' blacklisted
        '!' blacklisted
        '"' blacklisted
        '' blacklisted
        '$' blacklisted
        '%' blacklisted
        '^' blacklisted
        '&' blacklisted
        '*' blacklisted
        '(' blacklisted
        ')' blacklisted
        '1.2' blacklisted
        '0' NOT blacklisted
        '1' NOT blacklisted
        '2' NOT blacklisted
        '-' NOT blacklisted
        '+' NOT blacklisted
        '-1' NOT blacklisted
        '+1' NOT blacklisted
        '12' NOT blacklisted
        '+12' NOT blacklisted
        '-12' NOT blacklisted

    Also same results under Strawberry Perl 5.30.3.1.


    Give a man a fish:  <%-{-{-{-<

Re: Is A Number
by BillKSmith (Monsignor) on Dec 18, 2021 at 23:25 UTC
    I agree with the other monks that the module is the best solution if you want to know if perl considers your string a number. But I will try to answer your original question. It is actually better to only test for the valid cases, and assume that it is invalid otherwise. The following code is a simplified version of yours.

        use strict;
        use warnings;
        use Test::Simple tests => 10;

        sub IsNumber {
            (local $_) = @_;
            return (m/^[+-]?\d+$/ or m/^[+-]?[0-9]+[.]?[0-9]+$/) ;
        }

        ok( !IsNumber("0777 891 777"));
        ok( IsNumber("1.5671") );
        ok( !IsNumber("121A3D") );
        ok( IsNumber("777") );
        ok( IsNumber("0") );
        ok( IsNumber("-4.567") );
        ok( !IsNumber("+9.8.97") );
        ok( IsNumber("+9.897") );
        ok( !IsNumber("+9.897") );
        ok( !IsNumber("9.8[97") );

    OUTPUT:

        1..10
        ok 1
        ok 2
        ok 3
        ok 4
        ok 5
        ok 6
        ok 7
        ok 8
        ok 9
        ok 10

    Bill

Node Type: perlquestion [id://11139686]
Approved by marto
Front-paged by kcott