PerlMonks  

Reason for this discrepancy with scalar?

by kikuchiyo (Hermit)
on Mar 13, 2024 at 20:31 UTC [id://11158229]

kikuchiyo has asked for the wisdom of the Perl Monks concerning the following question:

Consider the following program:

    perl -MDevel::Peek -le 'my @nonempty = (1); my @empty = (); print Dump(scalar @nonempty); print Dump(scalar @empty)'
    SV = IV(0x563dfe2d16f0) at 0x563dfe2d1700
      REFCNT = 1
      FLAGS = (TEMP,IOK,pIOK)
      IV = 1
    SV = PVNV(0x563dfe2d0200) at 0x563dfe2ce410
      REFCNT = 2147483647
      FLAGS = (PADTMP,IOK,NOK,POK,READONLY,PROTECT,pIOK,pNOK,pPOK)
      IV = 0
      NV = 0
      PV = 0x7fb0f957d3c2 "0"
      CUR = 1
      LEN = 0

Which is to say, scalar returns a PV if the array is empty, and an IV if it is not. This can be a problem, because most serializers, including JSON modules, will encode the PV as "0" and the IV as 1, which will cause a problem if the consumer of the resulting JSON expects strict type conformance.

Is there a logical reason for this discrepancy?

(Yes, I know that I could just do 0+scalar @array - my point is that that shouldn't be necessary.)
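A minimal sketch of the 0+ workaround mentioned above, assuming the core JSON::PP module as the serializer. Other JSON modules may use different rules to decide between a number and a string, so treat this as an illustration rather than a guarantee for every encoder.

```perl
use strict;
use warnings;
use JSON::PP ();

my @empty    = ();
my @nonempty = (1);

# canonical() sorts hash keys so the output order is stable
my $json = JSON::PP->new->canonical;

# Adding 0 produces a fresh integer value, so the encoder
# no longer sees the shared zero that also carries a string form.
my $out = $json->encode({
    empty    => 0 + @empty,
    nonempty => 0 + @nonempty,
});

print "$out\n";   # {"empty":0,"nonempty":1}
```

The array is evaluated in numeric (and therefore scalar) context by the addition, so an explicit scalar() is not needed here.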

Replies are listed 'Best First'.
Re: Reason for this discrepancy with scalar?
by choroba (Cardinal) on Mar 13, 2024 at 21:39 UTC
    If you want to produce JSON with a predefined structure where strings and numbers are distinguished, use Cpanel::JSON::XS::Type.

    In Perl, it doesn't matter.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      I could also imagine using Perl objects for this, which overload stringification and numification in a well-defined way.

      This has probably been done before.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery
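The overloading idea above could be sketched roughly as follows. TypedValue is a hypothetical class invented for this illustration (not an existing CPAN module), and the sketch assumes JSON::PP with convert_blessed enabled, which makes the encoder call TO_JSON on blessed values.

```perl
use strict;
use warnings;
use JSON::PP ();

# TypedValue is a hypothetical class made up for this sketch.
package TypedValue {
    use overload
        '""'     => sub { "" . $_[0]{value} },   # string context
        '0+'     => sub { 0 + $_[0]{value} },    # numeric context
        fallback => 1;

    sub number { my ($class, $v) = @_; return bless { type => 'number', value => $v }, $class }
    sub string { my ($class, $v) = @_; return bless { type => 'string', value => $v }, $class }

    # Called by JSON::PP when convert_blessed is on; returns a plain
    # scalar whose representation matches the declared type.
    sub TO_JSON {
        my ($self) = @_;
        return $self->{type} eq 'number'
            ? 0 + $self->{value}
            : "" . $self->{value};
    }
}

my @empty = ();
my $json  = JSON::PP->new->canonical->convert_blessed;

my $out = $json->encode({
    count => TypedValue->number(scalar @empty),
    label => TypedValue->string(0),
});

print "$out\n";   # {"count":0,"label":"0"}
```

The declared type travels with the value, so it no longer matters which internal flags the wrapped scalar happens to carry.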

Re: Reason for this discrepancy with scalar?
by syphilis (Archbishop) on Mar 14, 2024 at 08:43 UTC
    Is there a logical reason for this discrepancy?

    There's a pretty weird scoping issue going on here:
    D:\>perl -MDevel::Peek -le "my @empty = (); Dump(scalar @empty)"
    SV = PVNV(0x1a1928ddce0) at 0x1a1928dbb10
      REFCNT = 2147483647
      FLAGS = (PADTMP,IOK,NOK,POK,READONLY,PROTECT,pIOK,pNOK,pPOK)
      IV = 0
      NV = 0
      PV = 0x7ffe685399a1 "0"
      CUR = 1
      LEN = 0

    D:\>perl -MDevel::Peek -le "our @empty = (); Dump(scalar @empty)"
    SV = IV(0x178a86f0650) at 0x178a86f0660
      REFCNT = 1
      FLAGS = (PADTMP,IOK,pIOK)
      IV = 0
    Can anyone explain that behaviour ?
    (I'm thinking "bug".)

    Cheers,
    Rob
      This doesn't happen in 5.26.1 (I haven't tested any other old versions), where the "our" and "my" variants behave the same way.

      map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        After an hour of bisecting, I have the commit that changed the behaviour:

        https://github.com/Perl/perl5/commit/7be75ccf16313d987eb5a6e9ff6aec9fea4ef3d4

        optimise @array in boolean context
        It's quicker to return (and to test for) &PL_sv_zero or &PL_sv_yes, than setting a targ to an integer value or, in the case of padav, creating a mortal sv and setting it to an integer value.
        In fact for padav, even in the scalar but non-boolean case, return &PL_sv_zero if the value is zero rather than creating and setting a mortal.

        Update: Compare to

        perl -MDevel::Peek -e 'Dump(!!0)'
        SV = PVNV(0x1838140) at 0x18363f8
          REFCNT = 2147483647
          FLAGS = (IOK,NOK,POK,IsCOW,READONLY,PROTECT,pIOK,pNOK,pPOK)
          IV = 0
          NV = 0
          PV = 0x5e6583 "" [BOOL PL_No]
          CUR = 0
          LEN = 0
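To make the comparison concrete, here is a small sketch that contrasts the count of an empty array with the boolean false value at the Perl level, without looking at the internals. Both are numerically zero, but their string forms differ.

```perl
use strict;
use warnings;

my @empty = ();

my $count = scalar @empty;   # element count of an empty array
my $false = !!0;             # Perl's boolean false

printf "count: num=%d str='%s'\n", $count, $count;   # count: num=0 str='0'
printf "false: num=%d str='%s'\n", $false, $false;   # false: num=0 str=''
```

So the zero returned for an empty array is not the same thing as boolean false, even though both dumps show IOK, NOK and POK together.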

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      The ops used for looking up an array in scalar context are different for pad vars and for global vars (pp_padav vs pp_rv2av). One of those ops was optimised by me such that, when returning a zero value, it returned the special SV PL_sv_zero rather than setting a PADTMP SV to zero and then returning that. I don't remember offhand why the other op wasn't or couldn't be similarly optimised.

      Apart from esoteric uses (such as inspecting the internals with Peek()), the two different return values should generally have the same behaviour. Both evaluate to 0 in numeric context and to "0" in string context. Just with different overheads.

      Serializers tend to struggle with such things, but that's a general problem with perl's polymorphic internal representations of values. For example, for a hypothetical serializer function, you typically get this behaviour:
          my $x = 0;
          serialize($x);   # outputs an integer
          say "x=$x";      # $x now has both valid int and string representations
          serialize($x);   # outputs a string
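The side effect described above can be observed with the core B module. This is only a sketch: representation_flags is a hypothetical helper written for this illustration, and it checks the private flags (pIOK, pPOK) because, as far as I know, newer perls no longer set the public POK flag when an integer is merely stringified.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: reports which cached representations
# (integer and/or string) a scalar currently carries.
sub representation_flags {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    return {
        int    => ($flags & B::SVp_IOK()) ? 1 : 0,
        string => ($flags & B::SVp_POK()) ? 1 : 0,
    };
}

my $x = 0;
my $before = representation_flags(\$x);

my $text = "x=$x";   # interpolation stringifies $x and caches the result
my $after = representation_flags(\$x);

printf "before: int=%d string=%d\n", $before->{int}, $before->{string};
printf "after:  int=%d string=%d\n", $after->{int},  $after->{string};
```

A serializer that decides purely from these flags will therefore see a different scalar after the interpolation, even though the value of $x never changed.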

      Dave.

        Thanks!

        2 questions tho:

        1. > looking up an array in scalar context

          Yeah, but I was under the impression the patch was supposed to improve Boolean context. And several sources claim that Perl is internally subdividing scalar context into Boolean, string and (various) numeric contexts.

        2. > Serializers tend to struggle with such things

          What some programmers seem to expect is that the initial type (i.e. at time of assignment) is preserved.

          Was it ever discussed to add a flag "initial_type" to scalar vars, which ...

          • is updated with each assignment only
          • could be queried via a Scalar::Util::initial_type() function
          • serializers would be updated to use initial_type()

          ...???

          And if it was already discussed, what are the reasons against?

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

Re: Reason for this discrepancy with scalar?
by LanX (Saint) on Mar 13, 2024 at 21:33 UTC
    > encode the PV as "0"

    No, @empty returns three internal types

    IV = 0
    NV = 0
    PV = 0x7fb0f957d3c2 "0"

    The background is probably that the length of an array is very often used as Boolean in conditions.

    (Actually I also expected a dualvar flag to be set, but I'm no expert here.)

    Perl tends to pre-optimize type castings, i.e. the stringification is already "memorized" to speed up future conversions.

    Contrary to Python¹, in Perl it depends on context whether a var is seen as an integer, float or string.

    So yes, adding 0 is the way to go.

    FWIW: does it really happen with JSON that the string is preferred? And what happens if you use the first result as a string prior to serialization? A PV "1" should be cached after the first stringification, too!

    Hence same problem again.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

    ¹) I've seen many type errors in Python, where Perl was just silently DWIM. But this was back in the time I experimented with Python 2.

    ²) nah, nonsense. Boolean false stringifies as "" not "0".
