Re^2: Alternative to bytes::length()

by creamygoodness (Curate)
on Dec 23, 2009 at 01:38 UTC (#814038)


in reply to Re: Alternative to bytes::length()
in thread Alternative to bytes::length()

Or maybe the character length should be stored in string variables?

I think the length in characters might be cached using MAGIC -- I know some UTF-8 stuff is.

What's the problem with the current solution?

There was just a post to p5p from someone who wanted to terminate the bytes pragma with extreme prejudice. I wanted to mention this use case.


Re^3: Alternative to bytes::length()
by ikegami (Pope) on Dec 23, 2009 at 02:02 UTC

    Do you have an example of this magic?

    That would be an argument for creating a new function, not for keeping bytes.

      Actually, simply calling length on a scalar with UTF8=1 adds the magic.
      >perl -MDevel::Peek -e"Dump $_=chr(0x2660)x100; length $_; Dump $_"
      SV = PV(0x2379ec) at 0x1845eec
        REFCNT = 1
        FLAGS = (POK,pPOK,UTF8)
        PV = 0x184c5ec "..."\0 [UTF8 "..."]
        CUR = 300
        LEN = 304
      SV = PVMG(0x18242cc) at 0x1845eec
        REFCNT = 1
        FLAGS = (SMG,POK,pPOK,UTF8)
        IV = 0
        NV = 0
        PV = 0x184c5ec "..."\0 [UTF8 "..."]
        CUR = 300
        LEN = 304
        MAGIC = 0x1824e64
          MG_VIRTUAL = &PL_vtbl_utf8
          MG_TYPE = PERL_MAGIC_utf8(w)
          MG_LEN = 100    <---------- char length

      It's a pity that actions such as chop, appending a UTF8=0 string, etc., void the cached count instead of updating it.

      Note that $_ eq '' doesn't add the magic, so not only is it faster, it uses less memory.
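
For readers following along, here is a minimal sketch of the three operations being compared in this subthread: the character count from length, the byte count from bytes::length, and an emptiness test that needs no count at all. The variable names are only illustrative, and the "3 bytes per character" figure assumes the string is held in Perl's internal UTF-8 form, as it is for U+2660.

```perl
use strict;
use warnings;
use bytes ();   # load bytes::length without turning the pragma on lexically

# 100 copies of U+2660; each one takes 3 bytes in Perl's internal UTF-8 form.
my $str = chr(0x2660) x 100;

my $chars = length($str);          # character count (the value the utf8 magic can cache)
my $bytes = bytes::length($str);   # size of the internal buffer in bytes

# Emptiness check that does not need a character count at all.
my $is_empty = $str eq '' ? 1 : 0;

print "chars=$chars bytes=$bytes empty=$is_empty\n";   # chars=100 bytes=300 empty=0
```

The 100 and 300 here match MG_LEN and CUR in the Devel::Peek dump above.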

      Looks like you found the caching mechanism. From perlguts:
      w PERL_MAGIC_utf8 vtbl_utf8 UTF-8 length+offset cache

      As for keeping bytes... meh, my attachment to the bytes pragma extended only to that use case, as the efficiency of CORE::length() with SVf_UTF8 scalars is a bummer. I'm not even going to bother posting to p5p now that my concern has been addressed another way.

Re^3: Alternative to bytes::length()
by assemble (Friar) on Dec 23, 2009 at 14:15 UTC
    Why would they eliminate the bytes pragma? What about those of us who aren't always manipulating character data and actually do care about the bytes themselves?

      Strings can contain bytes. You don't have to do anything special to work with bytes. use bytes; has nothing to do with manipulating bytes.

      If you need to manipulate the internal string format to optimize or to work with some buggy XS,
      you want utf8::upgrade or utf8::downgrade.
      If you need to encode to UTF-8 or decode from UTF-8,
      you want utf8::encode, Encode::encode, utf8::decode or Encode::decode.
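
A small sketch of that distinction, with made-up sample values: changing the internal storage format with utf8::upgrade and utf8::downgrade leaves the string's value alone, while Encode::encode and Encode::decode produce a different string.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A string of bytes: every character is in the range 0..255.
my $binary = "\xE9\x00\xFF";

# Changing the internal storage format does not change the string's value.
my $copy = $binary;
utf8::upgrade($copy);                     # now stored internally as UTF-8
my $same_after_upgrade = $copy eq $binary ? 1 : 0;
my $len_after_upgrade  = length($copy);   # still 3 characters
utf8::downgrade($copy);                   # back to one byte per character

# Encoding and decoding, by contrast, produce a different string.
my $text    = "\x{2660}";                 # one character: BLACK SPADE SUIT
my $encoded = encode('UTF-8', $text);     # three bytes: E2 99 A0
my $decoded = decode('UTF-8', $encoded);  # one character again

printf "upgrade kept value: %d, length: %d\n", $same_after_upgrade, $len_after_upgrade;
printf "encoded length: %d, decoded length: %d\n", length($encoded), length($decoded);
```

No use bytes; is needed anywhere in this: $binary is an ordinary string whose characters happen to be bytes.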

      The person probably wants to eliminate it because of that very misconception you expressed. But don't worry, if anything is ever done, it would still be available on CPAN.

        I'm talking more about situations where I'm manipulating binary data, and I don't want Perl even looking at the data and trying to guess what it is.

        Instead of going through every single record in the file and unpacking the whole thing, it is often more efficient to use substr to get the few bytes I actually care about, and work with those. It would be similar to working with an actual character array in C.
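
A rough illustration of that substr approach, assuming a hypothetical fixed-width record layout (4-byte id, 16-byte name, 2-byte flags) that is not from any real file format. The record is built in memory with pack so the sketch is self-contained.

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: 4-byte id, 16-byte name, 2-byte flags.
my $record = pack('N A16 n', 42, 'example', 0x0102);

# Pull out only the two flag bytes at offset 20 instead of unpacking everything.
my $flag_bytes = substr($record, 20, 2);
my $flags      = unpack('n', $flag_bytes);

printf "record is %d bytes, flags = 0x%04x\n", length($record), $flags;
# record is 22 bytes, flags = 0x0102
```

When the data comes from a file, opening it with the :raw layer (or calling binmode on the handle) gives a string of bytes that substr indexes byte by byte, with no use bytes; involved.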

        Reading through the bytes documentation gives me the impression that Perl will try to figure out what kind of string I've got, based on what's in it and where it came from, unless I tell it otherwise.
