Extract numbers in multiple bases

by jcwren (Prior)
on Oct 15, 2001 at 00:29 UTC ( #118776=perlquestion )
jcwren has asked for the wisdom of the Perl Monks concerning the following question:

Congratulations to the last person who voted this simple request down. I've had it with the trolls, votebots, and general decline of civilization of PM. I'm outta here.

Updated: Apparently, some people are dim enough to think this is a 'do my homework' type node. It's not. It's code for a future PM-related utility. I'm looking for a little help for something I'd rather not write if it already exists. I think it's called 'Not Reinventing The Wheel'.

I'm looking for some code that will do multi-base number verification. Here are the criteria:

  • A value passed in will have no leading, trailing or embedded spaces.
  • A value may carry a base qualifier: a leading '0x' to indicate hex, a trailing 'h' (upper or lower case) to indicate hex, a trailing 'o' (letter, upper or lower case) to indicate octal, or a trailing 't' (upper or lower case) to indicate decimal.
  • Numbers with no base qualifiers may be presumed to be decimal.
  • Scientific notation is a valid format, e.g. 14E+12 (the 'E' can be upper or lower case).
  • 0xffffh is not valid as a hex number, since it contains two qualifiers.
  • Hex and octal numbers may not contain decimal points.
  • The routine should return a flag indicating if the passed value was a syntactically valid number according to the rules, the base of the detected number, and the value of the number.
  • Must run under Perl 5.005 or higher. No 5.6.1+ specific solutions are acceptable.

Ideally, some large esoteric module should not be required. For instance, Parse::RecDescent with a grammar for number parsing is a sub-optimal solution.

I don't really care if it's an elaborate (fool-proof) regexp, a state machine, or pirated-and-ported VB code, as long as it's solid.

Anyone got one of these handy, or want to golf one?


Replies are listed 'Best First'.
Re: Extract numbers in multiple bases
by MrNobo1024 (Hermit) on Oct 15, 2001 at 02:38 UTC
    sub jcwren {
        my $num = shift;
        my($ok, $base, $value);
        if($num =~ m/^0x([\dA-Fa-f]+)$/s or $num =~ m/^([\dA-Fa-f]+)h$/s) {
            $ok = 1;
            $base = 16;
            $value = hex($1);
        }
        elsif($num =~ m/^([0-7]+)o$/s) {
            $ok = 1;
            $base = 8;
            $value = oct($1);
        }
        elsif($num =~ m/^([+-]?(?=\d|\.\d)\d*(?:\.\d*)?(?:[Ee](?:[+-]?\d+))?)t?$/s) {
            $ok = 1;
            $base = 10;
            $value = 0 + $1;
        }
        else {
            $ok = 0;
            $base = 0;
            $value = 0;
        }
        return($ok, $base, $value);
    }
(tye)Re: Extract numbers in multiple bases
by tye (Sage) on Oct 15, 2001 at 03:32 UTC
    sub isNumber {
        local( $_ )= @_;
        my( $ok, $val, $base );
        if(  s/h$//i && s/^/0x/  ||  m/^0x/  ) {
            $ok= m/^0x[\da-f]+$/i;
            $val= hex($_);
            $base= 16;
        } elsif(  s/o$//i  ) {
            $ok= m/^[0-7]+$/;
            $val= oct($_);
            $base= 8;
        } else {
            my $warn;
            local( $SIG{__WARN__} )= sub { $warn= $_[0]; };
            local( $^W )= 1;
            $val= 0 + $_;
            $ok= ! $warn;
            $base= 10;
        }
        $_[1]= $val;
        $_[2]= $base;
        return $ok;
    }

    my $num= <STDIN>;
    my $val;
    if(  isNumber( $num, $val, $base )  ) {
        print "Your number's value is $val in base $base.\n";
    } else {
        print "Your number is invalid but might be close to $val in base $base.\n";
    }

    Note that I didn't treat a leading zero as octal and I'm a bit inconsistent with how much whitespace I'll ignore.

    Updated: To return the base.

            - tye (but my friends call me "Tye")
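The decimal branch in tye's code leans on Perl's own numeric parser: with warnings switched on, numifying a malformed string raises an "isn't numeric" warning, and a local `__WARN__` handler records that it happened. A minimal standalone sketch of just that trick follows; the name `looks_numeric` is hypothetical and not from the post, and exactly which edge cases (whitespace, "Inf", "NaN") Perl accepts silently can differ between versions.

```perl
use strict;

# Hypothetical helper: true if Perl numifies $str without warning.
sub looks_numeric {
    my $str = shift;
    my $warn;
    # Capture any warning instead of printing it.
    local $SIG{__WARN__} = sub { $warn = $_[0] };
    # Turn runtime warnings on for the duration of this call only.
    local $^W = 1;
    my $val = 0 + $str;    # numify; malformed input triggers a warning
    return !$warn;
}

print looks_numeric("14E+12") ? "ok\n" : "invalid\n";
print looks_numeric("12abc")  ? "ok\n" : "invalid\n";
```

Because both `local` assignments are undone when the sub returns, the caller's warning settings and handlers are left untouched.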
(Ovid) Re: Extract numbers in multiple bases
by Ovid (Cardinal) on Oct 15, 2001 at 03:07 UTC

    My first pass at this seems to work okay, though I should probably add a few more tests. Though I have regexes to specify formats, I tried to make it fairly simple to maintain, in case specifications change. There's a test at the top of the code. Just add numbers you want to test to the array to see if they pass or fail.

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my @nums = qw/ 0xff 09h 123.25 14E+12 0xfffa 0xffffh 123.25t 777o 123.23.45 1.2e+13 /;

    foreach ( @nums ) {
        my $result = convert_num( $_ );
        if ( $result->{ valid } ) {
            print "$_ is a $result->{base} number with a $result->{value} decimal value.\n";
        }
        else {
            print "$_ is not a valid number.\n";
        }
    }

    sub convert_num {
        my $num = shift;
        my %bases = (
            hex => {
                description => [ qw/ ^0x[A-Fa-f0-9]+$ ^[A-Fa-f0-9]+h$ / ],
                function    => sub { ( $_[0] ) = ( $_[0] =~ /(?:0x)?([a-fA-F0-9]+)/ ); hex( $_[0] ) }
            },
            octal => {
                description => [ '^[0-7]+o$' ],
                function    => sub { ( $_[0] ) = ( $_[0] =~ /([0-7]+)/ ); oct( $_[0] ) }
            },
            decimal => {
                description => [ qw/ ^\d*\.\d+(?!\.)\d*t?$ / ],
                function    => sub { $_[0] =~ s/t$//; $_[0] }
            },
            scientific => {
                description => [ '^\d*\.?\d+(?!\.)\d*[Ee]\+\d+$' ],
                function    => sub { 1 * shift }
            }
        );
        my %result = (
            valid => 0,
            base  => '',
            value => $num
        );
        foreach my $base ( keys %bases ) {
            foreach my $regex ( @{ $bases{$base}{description} } ) {
                if ( $num =~ /$regex/ ) {
                    $result{ valid }++;
                    $result{ base } = $base;
                }
            }
        }
        if ( $result{ valid } != 1 ) {
            @result{ qw/ valid base value / } = ('','','');
        }
        else {
            my $function = $bases{ $result{ base } }{ function };
            $result{ value } = $function->( $num );
        }
        return \%result;
    }

    To be perfectly fair, the scientific notation regex is a bit of a hack and needs to be fixed.

    The code works by checking whether the argument to the sub is valid for one of the regexes specified in the %bases hash. If it's valid for more than one type, then the argument is considered ambiguous.
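The "exactly one format may match" rule described here can be isolated into a few lines. This is only a sketch under stated assumptions: the pattern table is deliberately simplified and hypothetical (it is not Ovid's, and it ignores decimal points and scientific notation), and `classify` is an invented name.

```perl
use strict;

# Hypothetical, simplified patterns; not the ones from the post.
my %patterns = (
    hex     => '^(?:0x[0-9A-Fa-f]+|[0-9A-Fa-f]+[Hh])$',
    octal   => '^[0-7]+[Oo]$',
    decimal => '^\d+[Tt]?$',
);

# Return the name of the single matching format,
# or undef if no format or more than one format matches.
sub classify {
    my $num  = shift;
    my @hits = grep { my $re = $patterns{$_}; $num =~ /$re/ } keys %patterns;
    return @hits == 1 ? $hits[0] : undef;
}

foreach my $n ( qw/ 0xff 777o 123t 0xffffh / ) {
    my $type = classify( $n );
    print "$n: ", ( defined $type ? $type : 'rejected' ), "\n";
}
```

Counting matches rather than stopping at the first one means that adding an overlapping pattern later shows up as a rejection instead of a silent misclassification.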


    Update: Looking at MrNobo1024's code, it seems like I overcoded the heck out of this.

Re: Extract numbers in multiple bases
by jbert (Priest) on Oct 15, 2001 at 17:25 UTC
    If you can change your rules a bit (you aren't 100% on all of them, so are they under your control?), can't you just use a string eval to let the perl interpreter apply its own rules?
    my $str = "123e4";
    print asNum( $str );

    sub asNum {
        my $str = shift;
        my $foo;
        eval "\$foo = $str"; # backslash for string not reference
        if( $@ ) {
            warn( "oops : $@" );
        }
        return $str;
    }
    Ugly, but probably 'rock-solid', no esoteric modules and should work with any perl.

      That should be return $foo. This won't deal with several of the specified cases, like the h and t suffixes, but it is clever nonetheless :-)
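For reference, a sketch of the eval idea with the return value corrected. The guard regex is an added assumption, not part of jbert's post: a string eval runs whatever it is handed, so this version only evaluates input that already looks like a plain Perl numeric literal. The name `as_num` is hypothetical. As noted, the h, o and t suffixes are still not handled.

```perl
use strict;

# Sketch: let perl parse the literal, but return the evaluated value.
sub as_num {
    my $str = shift;
    # Added safety assumption: accept only 0x-prefixed hex or a plain
    # decimal/scientific literal, so no arbitrary code reaches eval.
    return undef
        unless $str =~ /^(?:0[xX][0-9A-Fa-f]+|[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)$/;
    my $foo = eval $str;
    return undef if $@;    # e.g. a literal perl itself refuses to compile
    return $foo;
}

my $value = as_num( "123e4" );
print defined $value ? "$value\n" : "not a number\n";
```

One consequence of delegating to perl's own rules is that a leading zero is read as octal, which differs from the rules in the original question.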




Re: Extract numbers in multiple bases
by mischief (Hermit) on Oct 15, 2001 at 20:12 UTC

