PerlMonks  

Identifying and parsing numbers

by dragonchild (Archbishop)
on Jan 15, 2003 at 22:47 UTC ( #227260=perlquestion )

dragonchild has asked for the wisdom of the Perl Monks concerning the following question:

I need to be able to identify values like '9i' or '1.4cm'. Now, I started with a simple regex of /^\s*(\d+)\s*([a-z]{1,2})\s*$/i and that was fine. Except it doesn't do decimal places.

So, I went to /^\s*([+-]?(?:\d+)?(?:\.)?(?:\d+)?)\s*([a-z][a-z]?)\s*$/i, but I know that this will give me false positives. (For example, '.c' will match just fine.)
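To make the false positives concrete, here is a minimal sketch that runs the second pattern over a handful of strings. The sample inputs are assumptions chosen for illustration; only the pattern itself comes from the question. It sticks to qr// and plain loops so it should be usable on a 5.005-era perl.

```perl
use strict;

# The second pattern from the question, unchanged.
my $loose = qr/^\s*([+-]?(?:\d+)?(?:\.)?(?:\d+)?)\s*([a-z][a-z]?)\s*$/i;

# Sample inputs (assumed for illustration): two valid values followed by
# three strings that contain no digits at all.
my @samples = ('9i', '1.4cm', '.c', '+cm', 'cm');

my %loose_result;
foreach my $s (@samples) {
    $loose_result{$s} = ($s =~ $loose) ? 1 : 0;
    printf "%-8s %s\n", "'$s'", $loose_result{$s} ? 'matches' : 'no match';
}
```

Every sample matches, including the three digit-free ones, because each piece of the number is optional on its own.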

Thus, I put it out to the monastery.

(Oh, to make it more fun, I have to use 5.005_003. I can't use anything cool like :alpha.)

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

update (broquaint): added missing </code> tag

Replies are listed 'Best First'.
Re: Identifying and parsing numbers
by Abigail-II (Bishop) on Jan 15, 2003 at 23:16 UTC
    use Regexp::Common; /^\s*$RE{num}{real}\s*[a-z]{1,2}\s*$/;

    Abigail

Re: Identifying and parsing numbers
by mojotoad (Monsignor) on Jan 15, 2003 at 23:09 UTC
Re: Identifying and parsing numbers
by bart (Canon) on Jan 16, 2003 at 03:19 UTC
    You know you need at least one digit. So you can use lookahead, to test for it:
    /^([+-]?(?=\d|\.\d)\d*\.?\d*)([a-z]+)$/
    Or you can make a regex out of alternatives:
    /^([+-]?(?:\d+\.?\d*|\.\d+))([a-z]+)$/
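As a rough check, both patterns can be run side by side. This is only a sketch: the helper name parse_with and the sample inputs are made up for illustration, and the patterns are copied as posted (no /i, no surrounding whitespace, and [a-z]+ for the unit).

```perl
use strict;

# bart's two patterns, unchanged.
my $lookahead   = qr/^([+-]?(?=\d|\.\d)\d*\.?\d*)([a-z]+)$/;
my $alternation = qr/^([+-]?(?:\d+\.?\d*|\.\d+))([a-z]+)$/;

# Hypothetical helper: return "number|unit" on a match, '' otherwise.
sub parse_with {
    my ($re, $text) = @_;
    my ($number, $unit) = $text =~ $re;
    return defined $number ? "$number|$unit" : '';
}

# Sample inputs (assumed): four that should parse, three that should not.
foreach my $s ('9i', '1.4cm', '.5in', '-3.cm', '.c', '+cm', '1.4') {
    printf "%-8s lookahead=%-8s alternation=%s\n", "'$s'",
        parse_with($lookahead, $s)   || 'none',
        parse_with($alternation, $s) || 'none';
}
```

On these inputs the two patterns agree: both accept a trailing decimal point as in '-3.cm', and both reject '.c', '+cm' and a bare '1.4'.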
Re: Identifying and parsing numbers
by Gilimanjaro (Hermit) on Jan 16, 2003 at 12:46 UTC
    You want to match:
    * optional plus/minus
    * numeric value which can be either
       * digits optionally followed by a decimal point and more digits
       * a decimal point followed by digits
    * one or two characters

    I think the following regex should meet these conditions:
    /(\-?(?:\d+(?:\.\d+)?)|(?:\.\d+))([a-z]{1,2})/g

    The issue that makes this one non-trivial is that the decimal point and digits are either a required element or an optional element depending on the presence of a whole number.

    My regex should return a list containing value/unit pairs... If you want to match exactly and retrieve the parts, you should use:
    /^([+\-]?(?:\d+(?:\.\d+)?)|(?:\.\d+))([a-z]{1,2})$/

    That is either a two-element list with the value and the unit, or an empty list.
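A small sketch of how the two patterns behave in list context. One thing worth checking when trying them: as posted, the '|' sits at the top level of the first capture group, so the optional sign belongs to the "digits first" branch only. The regrouped variant below is a hypothetical adjustment, not something from the thread, and assumes the bullet list above describes the intended grammar. The sample inputs are also assumptions.

```perl
use strict;

# The anchored pattern, unchanged from the post.
my $posted = qr/^([+\-]?(?:\d+(?:\.\d+)?)|(?:\.\d+))([a-z]{1,2})$/;

# Hypothetical regrouping (not from the thread): the sign sits outside
# the alternation, so it applies to both number forms.
my $regrouped = qr/^([+\-]?(?:\d+(?:\.\d+)?|\.\d+))([a-z]{1,2})$/;

# The unanchored /g pattern returns a flat list of value/unit pairs.
my $text  = '9i 1.4cm .5in';    # assumed sample text
my @pairs = $text =~ /(\-?(?:\d+(?:\.\d+)?)|(?:\.\d+))([a-z]{1,2})/g;
print join(',', @pairs), "\n";

foreach my $s ('1.4cm', '.5cm', '-1.4cm', '-.5cm', '.c') {
    my @got_posted    = $s =~ $posted;
    my @got_regrouped = $s =~ $regrouped;
    printf "%-9s posted=%-8s regrouped=%s\n", "'$s'",
        (@got_posted    ? join('|', @got_posted)    : 'none'),
        (@got_regrouped ? join('|', @got_regrouped) : 'none');
}
```

With these inputs the only difference is '-.5cm', which the posted pattern rejects and the regrouped one accepts. Both reject '.c'.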

Re: Identifying and parsing numbers
by tekkie (Beadle) on Jan 16, 2003 at 17:39 UTC
    The regexp I used was:

    /^([+\-]?\d*(?:\.\d+)?)([a-z]{1,2})$/

    Which is basically just a condensed version of Gilimanjaro's version, except instead of using:

    ([+\-]?(?:\d+(?:\.\d+)?)|(?:\.\d+))

    to match the first portion of the input, I converted it to this:

    ([+\-]?\d*(?:\.\d+)?)

    And my breakdown:

    [+\-]?
    We want to match a single positive or negative symbol either once or not at all

    \d*
    This extracts the whole number portion of the input. If the whole number portion is non-existent, that's accounted for by using * instead of + (zero or more instead of one or more).

    (?:\.\d+)?
    This extracts the decimal portion of the input. The (?:)? grouping allows the regexp to match a decimal followed by digits either once or not at all.

    This regexp allows both the whole number portion and the decimal portion to be optional components, but prevents the need for alternative pattern matching.
      I think this regex will also give false positives, because it would now also match something like '+cm'.

      There is no way around the '|' I think, because the decimal part is what you might call optionally optional; it is only optional if there is a whole part of the number. Because of this, you can't always use the ? modifier to make the decimal part optional.

      The only other ways I could think of would use experimental regex features. Even zero-width look-behind or look-ahead assertions can't be applied here I think...

      But I've finally been referenced by name on Perlmonks! Made my day!

      :)

        Though I was dumb enough to actually post while not being logged in... There go my votes... ;)
        Ah, yes, you are absolutely correct, I overlooked the fact that doing:

        (\d*(?:\.\d+)?)

        Would allow both the whole number and the decimal portions to match 'nothing at all' as I so aptly described in my breakdown.

        One more case of testing too many things that should work and not enough things that shouldn't.
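In the spirit of testing things that shouldn't work, here is a minimal sketch that feeds the condensed pattern a few digit-free strings. The helper parse_checked is hypothetical and shows a different technique from anything posted here: keep the condensed pattern, then reject the result in a second step unless the captured number contains a digit. The sample inputs are assumptions.

```perl
use strict;

# tekkie's condensed pattern, unchanged.
my $condensed = qr/^([+\-]?\d*(?:\.\d+)?)([a-z]{1,2})$/;

# Hypothetical helper (not from the thread): match with the condensed
# pattern, then require at least one digit in the captured number.
sub parse_checked {
    my ($text) = @_;
    my ($number, $unit) = $text =~ $condensed;
    return () unless defined $number && $number =~ /\d/;
    return ($number, $unit);
}

foreach my $s ('1.4cm', '.5cm', '+cm', 'cm', '.c') {
    my $raw     = ($s =~ $condensed) ? 'match' : 'no match';
    my @checked = parse_checked($s);
    printf "%-8s pattern alone: %-9s with digit check: %s\n",
        "'$s'", $raw, (@checked ? join('|', @checked) : 'no match');
}
```

The pattern alone accepts '+cm' and 'cm'; the two-step version rejects them while still accepting '1.4cm' and '.5cm'. Whether a second check is preferable to an alternation inside the regex is a matter of taste.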
Re: Identifying and parsing numbers (KISS)
by tye (Sage) on Jan 16, 2003 at 23:10 UTC

    Long ago I came up with Extract numbers as a very simple method for pulling numbers out of text that works in lots of cases.

    This thread made me realize that I could make it even simpler while making it more effective. I've updated it to reflect this.

    The code now reads:

    my @numbers= $text =~ /([-+\d.eE]+\d)/g;

    so you could apply this principle to your situation with:

    my( $number, $units )= $text =~ /([-+\d.eE]+\d)(.*)/;

    It isn't a perfect solution but I think it is worth considering just for its extreme simplicity.

                    - tye
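A short sketch of tye's approach, using the patterns exactly as quoted above. The sample text and the helper name split_value are assumptions for illustration. One caveat worth knowing before relying on it: because the pattern is a character class with + followed by \d, it needs at least two characters, so a single-digit value such as '9i' is not picked up by the pattern as quoted.

```perl
use strict;

# tye's extraction pattern, as quoted in the post.
my $sample  = 'a 1.4cm b -2e3mm c 42i';    # assumed sample text
my @numbers = $sample =~ /([-+\d.eE]+\d)/g;
print join(',', @numbers), "\n";

# Hypothetical helper: split one value into number and units with the
# second pattern. Returns (undef, undef) when nothing matches.
sub split_value {
    my ($text) = @_;
    my ($number, $units) = $text =~ /([-+\d.eE]+\d)(.*)/;
    return ($number, $units);
}

foreach my $s ('1.4cm', '12i', '9i') {
    my ($number, $units) = split_value($s);
    print "'$s' => ",
        (defined $number ? "$number / $units" : 'no match'), "\n";
}
```

Note also that (.*) captures everything after the number, so any validation of the units would have to happen separately.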

Node Type: perlquestion [id://227260]
Approved by mojotoad
Front-paged by JaWi