http://www.perlmonks.org?node_id=1037291


in reply to Re^5: Undefined vs empty string
in thread Undefined vs empty string

Hm. You seem to be implying that the speed of length varies with the length of the string?

It doesn't vary for strings in the UTF8=0 format, but it does vary for strings in the UTF8=1 format. The length is cached (in a magic annotation) once discovered, though.

>perl -MDevel::Peek -e"utf8::upgrade( $x = "abc" ); Dump($x); length($x); Dump($x);"
SV = PV(0x7b8d54) at 0x328554
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x7b9fac "abc"\0 [UTF8 "abc"]
  CUR = 3
  LEN = 12
SV = PVMG(0x31e8e4) at 0x328554
  REFCNT = 1
  FLAGS = (SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x7b9fac "abc"\0 [UTF8 "abc"]
  CUR = 3
  LEN = 12
  MAGIC = 0x31f17c
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 3

Re^7: Undefined vs empty string
by BrowserUk (Patriarch) on Jun 05, 2013 at 20:33 UTC

    How very strange.

    Upgrading to UTF-8 required inspecting the bytes and converting them where necessary:

    C:\test\perl -MDevel::Peek -e"utf8::upgrade( $x = qq[abc\xee] ); Dump( $x);"
    SV = PV(0xea240) at 0x275898
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0xef178 "abc\303\256"\0 [UTF8 "abc\x{ee}"]
      CUR = 5
      LEN = 6

    The original 4 bytes have been converted to 5 (CUR = 5), which implies (to me at least) that it could have recorded the character length at that point rather than having to rediscover it later.
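
To make the byte-versus-character distinction concrete, here is a minimal sketch that is not taken from either post. It assumes the core bytes module's bytes::length is an acceptable way to read the stored byte count (the value Devel::Peek reports as CUR); the variable names are illustrative only.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without turning on byte semantics

my $x = "abc\xee";                      # four characters, one byte each (UTF8=0)
my $chars_before = length $x;           # character count
my $bytes_before = bytes::length($x);   # stored byte count

utf8::upgrade($x);                      # same characters, now UTF8=1 internally

my $chars_after = length $x;            # character count is unchanged
my $bytes_after = bytes::length($x);    # \xee now occupies two bytes

printf "before: chars=%d bytes=%d\n", $chars_before, $bytes_before;
printf "after:  chars=%d bytes=%d utf8=%d\n",
    $chars_after, $bytes_after, utf8::is_utf8($x) ? 1 : 0;
```

After the upgrade the byte count and the character count differ, so for a UTF8=1 string length cannot simply return the stored byte count. Whether the upgrade itself could have cached the character count is a question about the perl internals that this sketch does not answer.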

    (Also, what shell are you using that allows double quotes embedded within double quotes unescaped?)

