PerlMonks  

Weird behavior of int()

by cLive ;-) (Prior)
on May 20, 2024 at 20:49 UTC ( [id://11159569]=perlquestion )

cLive ;-) has asked for the wisdom of the Perl Monks concerning the following question:

(Posted this in the FB Perl group, but I know a lot of you aren't on FB, so...)

Been writing perl since '95. Only found out today that this idiom for coercing a string into an int can fail:

my $x = int($str+0);

It fails when the string begins with nan or inf (case insensitive). So, running this:

my @test = qw(in inf info information INF INFO Na Nan Nanny NAN NANNY nan04);
foreach my $x (@test) {
    my $y = int($x);
    printf "%11s -> %s\n", $x, $y;
}

outputs:

         in -> 0
        inf -> Inf
       info -> Inf
information -> Inf
        INF -> Inf
       INFO -> Inf
         Na -> 0
        Nan -> NaN
      Nanny -> NaN
        NAN -> NaN
      NANNY -> NaN
      nan04 -> NaN

It feels like a bug that the regex matches on /^(?:nan|inf).*/s (or \w*; I haven't fully tested it) rather than on /^(?:nan|inf)$/s.

Just wondered if this actually is a bug. I've always used Math::BigFloat whenever I need accurate math, so I've never used NaN/Inf in code before.

What do you all think?

Replies are listed 'Best First'.
Re: Weird behavior of numification?
by hv (Prior) on May 20, 2024 at 21:44 UTC

    Your subject line is mildly misleading: this is behaviour of Perl's string-to-number conversion ("numification") independent of how it is triggered (which is by ... + 0 in your first code fragment, but by int(...) in the second one).

    It is long-standing behaviour when Perl converts a string to a number for it to parse as much of the string as possible as a number, but then give a warning if there is any unparsable garbage beyond that:

    % perl -wle 'print 0 + $_ for qw{ 123 123foo 12e3 12e3f4 inf inflight }'
    123
    Argument "123foo" isn't numeric in addition (+) at -e line 1.
    123
    12000
    Argument "12e3f4" isn't numeric in addition (+) at -e line 1.
    12000
    Inf
    Argument "inflight" isn't numeric in addition (+) at -e line 1.
    Inf
    %

    .. so this all seems quite consistent to me. (I guess you ran your test without warnings enabled, which is always a risky thing to do - warnings can help shed light, even if not Inf light. :)

    The special values are handled in Perl's source code by Perl_grok_infnan; reading through the comments there will give you a clue about the many platform-specific variants that Perl also supports.
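Since the only signal for trailing garbage is the "isn't numeric" warning, one way to act on it is to make that warning fatal in a small scope. A minimal sketch, assuming a hypothetical helper named strict_num (not a core function). Note that a bare "inf" still passes, because it parses cleanly and produces no warning:

```perl
use strict;
use warnings;

# strict_num() is a hypothetical helper, not part of Perl.
# It returns the numeric value of $str, or undef when Perl would have
# warned that the string "isn't numeric" (trailing garbage, empty string).
sub strict_num {
    my ($str) = @_;
    return unless defined $str;
    my $n = eval {
        use warnings FATAL => 'numeric';   # promote the warning to an exception
        $str + 0;
    };
    return $n;                             # undef if the eval died
}

for my $s (qw{ 123 123foo 12e3 inf inflight }) {
    my $n = strict_num($s);
    printf "%-8s -> %s\n", $s, defined $n ? $n : 'rejected';
}
```

The pragma is lexically scoped, so only the addition inside the eval block is affected.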

Re: Weird behavior of int()
by Fletch (Bishop) on May 21, 2024 at 03:01 UTC

    I want to say that the behavior is similar (and historical) to what atoi/atol do in that they read and convert as much of the string as possible. See for example atoi(3) which mentions the behavior is due to the C standard and should possibly be replaced with strtol or the like.

    The cake is a lie.
    The cake is a lie.
    The cake is a lie.
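For comparison, the POSIX module exposes strtol, which in list context also reports how many characters were left unparsed. This is only an illustration of the "convert as much as possible" behaviour, not a claim about how Perl's own numification is implemented:

```perl
use strict;
use warnings;
use POSIX ();

# POSIX::strtol in list context returns the parsed value and the
# number of characters that could not be parsed.
for my $s ('32hike', '123', 'information') {
    my ($num, $unparsed) = POSIX::strtol($s, 10);
    printf "%-12s -> value %d, %d chars left unparsed\n", $s, $num, $unparsed;
}
```

A non-zero unparsed count is the programmatic equivalent of the "isn't numeric" warning.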

Re: Weird behavior of int()
by GrandFather (Saint) on May 20, 2024 at 21:27 UTC

    What is the practical issue here? If you are expecting to coerce 'Nanny' to something sensible as a number you have already missed the boat. It's not clear that there are any regular expressions involved in parsing strings into numbers.

    As an aside, your regular expressions would be better written as /(?:nan|inf)/ and /\w(?:nan|inf)\w/ rather than trying to match an entire string.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
      I fully accept the part about "info"+0 = Inf, but the part I'm not liking with these examples is that the output of int is still a floating point value, without a warning.
      use warnings; use feature 'say'; say int(1e400); # "Inf\n"
      It seems like the contract of int() should be to always return an integer, or maybe undef. Inf and NaN ought to give warnings at least.

        Unfortunately that just isn't the way perl defines things. Perl has no distinction between float/int - not even when int() function is used. Perl scalars essentially have 3 basic flavours: number/string/reference.

        Inf and NaN are valid numbers in perl, and int() returns a number. Hence why they don't see anything wrong with int() returning Inf/NaN.
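If you want int() to refuse those two values, you have to check for them yourself. A minimal sketch using the usual comparison tricks; the helper names is_nan, is_inf and finite_int are made up for this example, and the 9**9**9 idiom assumes a build with IEEE-754 doubles:

```perl
use strict;
use warnings;

# Hypothetical helpers; the names are illustrative, not core functions.
my $INF = 9**9**9;    # overflows to Inf on IEEE-754 builds

sub is_nan { my ($n) = @_; return $n != $n }          # NaN never equals itself
sub is_inf { my ($n) = @_; return abs($n) == $INF }   # matches +Inf and -Inf

# Like int(), but returns undef instead of Inf or NaN.
sub finite_int {
    my ($x) = @_;
    my $n = int($x);
    return (is_nan($n) || is_inf($n)) ? undef : $n;
}

print defined finite_int(1e400) ? "finite\n" : "not finite\n";
print defined finite_int('42.9') ? "finite\n" : "not finite\n";
```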

        The perl doc for int() not even mentioning inf/nan is another example of how user-unfriendly perl is, and IMO is one of the reasons why perl is dying out.

      I echo this confusion ... in particular, what confuses me the most is: what is the point of the +0 ?

      In what situation do you (intentionally) want/get a different result from int($str+0) vs int($str) ???

      As an aside, your regular expressions would be better written as ...

      I don't believe the OP is saying they intend to replace the int(...) with a regex, I believe they are assuming int(...) is implemented via a regex?

      The key reason why int("information") returns Inf is because of the intended usage of the function, per the docs...

      int EXPR
      int     Returns the integer portion of EXPR. If EXPR is omitted, uses $_. ...

      ...the key word being "portion". If you give it a scalar, it's going to consume as much of that scalar as it can to produce an integer:

      $ perl -le 'print int("32hike")'
      32
        what is the point of the +0 ?

        That makes no sense for this usage because int() will do all that is needed. The +0 trick can be used to eliminate leading zeroes, i.e. 006 -> 6. I sometimes translate between formats with and without leading zeroes.
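The leading-zero round trip mentioned above can be sketched like this. The sample values and the width of 3 are assumptions picked for the example:

```perl
use strict;
use warnings;

my @padded = ('006', '0042', '10');

# Numifying drops the leading zeroes...
my @plain = map { $_ + 0 } @padded;

# ...and sprintf puts a fixed-width padding back (width 3 assumed here).
my @padded_again = map { sprintf '%03d', $_ } @plain;

print "@plain\n";
print "@padded_again\n";
```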

      There's no practical issue in normal code, but it behaves in a way that feels unintuitive. I have never used NaN or Inf in code, and never would. I would be using Math::BigFloat for anything that odd.

      I guess the part that I don't like is that NaN and Inf are valid return values, more than anything. Neither are integers - though I also realize we can't change how int works, it feels very unintuitive.

      The +0 was a quick hack I would use to coerce a string into a number, in lieu of $num =~ /^\d+$/ or $num=0 for untainting. I didn't know Inf and NaN were valid integers in Perl until last week (one of those things I thought I knew but had just never encountered; this is up there with when I first heard about using _ in code :D)

      I can live with NaN+0=NaN and Inf+0=Inf, but I really do think int('information') should be zero rather than Inf!
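For anyone who wants that behaviour today, the prefix can be pulled out by hand instead of relying on numification. A minimal sketch, assuming a hypothetical helper named plain_int. It deliberately recognises only an optional sign followed by decimal digits, so unlike int() it ignores exponents ("1e3" gives 1) as well as Inf and NaN:

```perl
use strict;
use warnings;

# plain_int() is a hypothetical helper, not part of Perl.
# Returns the leading decimal integer of $str, or 0 if there is none.
sub plain_int {
    my ($str) = @_;
    return 0 unless defined $str;
    return $str =~ /\A\s*([+-]?\d+)/ ? $1 + 0 : 0;
}

printf "%11s -> %s\n", $_, plain_int($_)
    for qw(information inf Nanny 32hike 006 -7x);
```

Very long digit strings will still end up as floating-point values once they exceed the native integer range.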

        I can live with NaN+0=NaN and Inf+0=Inf..

        It's just a matter of understanding the process.
        If you can accept that '42.31' + 0 is 42.31 and '42.31ormation' + 0 is 42.31 and 'inf' + 0 is Inf, then it makes perfect sense that 'information' + 0 is Inf.
        Then, once you understand that the int() function merely chops off the fractional part of its argument, there's not much left to puzzle over, AFAICS.

        This process that perl is using to numify strings is the same as that used by the strtod() function in the C programming language.
        I think it's a fairly standard process across many programming languages.
        It's not about to change any time soon.

        Cheers,
        Rob
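The two steps described above (numify the longest valid prefix, then truncate) can be made explicit in a short sketch. The helper name numify_then_truncate is made up for this example, and the numeric warning is silenced only to keep the demo output clean:

```perl
use strict;
use warnings;

# Hypothetical helper that separates the two steps.
sub numify_then_truncate {
    my ($str) = @_;
    no warnings 'numeric';     # silence "isn't numeric" for the demo
    my $num = $str + 0;        # step 1: parse the longest numeric prefix
    my $int = int($num);       # step 2: drop the fractional part
    return ($num, $int);
}

for my $s ('42.31', '42.31ormation', 'inf', 'information') {
    my ($num, $int) = numify_then_truncate($s);
    print "'$s' -> $num -> $int\n";
}
```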
yeah, facebook...
by NERDVANA (Curate) on May 21, 2024 at 02:43 UTC
    Offtopic, but I can't really understand why anyone wants to have a group forum on Facebook. I have an account, but the user interface is so laggy, claustrophobic, and clogged with ads it makes my skin crawl. Reddit is a fairly nice forum, but everyone there likes to downvote in disagreement, rather than supporting healthy debate and reserving downvotes for trolls. The moderators of r/perl are also rather picky. All the perl discussions should move to here!

      Simple answer is that the FB group is more active than here (though I am encouraged by the number of replies!). 25 years ago this place was buzzing - not so much now.

        Which facebook group are you referring to?

Re: Weird behavior of int()
by NERDVANA (Curate) on May 21, 2024 at 21:04 UTC
    Found a new idiom :-)
    my $y= do { use integer; $x+0 };

    I agree it would be nice if int() did that already.

      No, it wouldn't.

      It silently converts some values to -1. This is only useful if you want to check if the number is representable by an IV.

      It silently converts some positive values to negative values. This is only useful if you want to cast from unsigned to signed.

      In both cases, it would be better if you used an approach that made it clear what you wanted to do instead.

        Hm, good catch. Even more undesirable edge cases than the original.
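To see those edge cases on your own build, something like the following can be run. It wraps the idiom in a hypothetical helper named integer_numify. No expected output is given for the out-of-range inputs because the results depend on the platform's integer size:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the "use integer" idiom from the post above.
sub integer_numify {
    my ($x) = @_;
    no warnings 'numeric';
    use integer;               # forces integer arithmetic in this scope
    return $x + 0;
}

# Ordinary values truncate as expected; the rest are the risky cases.
for my $x ('42.9', 'inf', '-inf', 'nan', 2**63, 2**64) {
    printf "%-22s -> %s\n", $x, integer_numify($x);
}
```

Whatever comes back for 'inf' is an ordinary finite integer, which is exactly why the conversion is silent.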
Re: Weird behavior of int()
by dissident (Beadle) on May 20, 2024 at 21:42 UTC
    Personally I would consider that the integer part of a string beginning with letters should be undefined.
    Perldoc int doesn't specify anything for this case.

    So I would consider the results found by the OP as a surprise, but definitely not as a bug.

Node Type: perlquestion [id://11159569]
Approved by marto
Front-paged by Corion