Re^2: Alternative to bytes::length()

Re^3: Alternative to bytes::length() by ikegami (Patriarch) on Dec 23, 2009 at 02:02 UTC
Do you have an example of this magic? That would be an argument for creating a new function, not for keeping bytes.
Re^4: Alternative to bytes::length() by ikegami (Patriarch) on Dec 23, 2009 at 02:15 UTC
Actually, simply calling `length` on a scalar with `UTF8=1` adds the magic.

```
>perl -MDevel::Peek -e"Dump $_=chr(0x2660)x100; length $_; Dump $_"
SV = PV(0x2379ec) at 0x1845eec
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x184c5ec "..."\0 [UTF8 "..."]
  CUR = 300
  LEN = 304
SV = PVMG(0x18242cc) at 0x1845eec
  REFCNT = 1
  FLAGS = (SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x184c5ec "..."\0 [UTF8 "..."]
  CUR = 300
  LEN = 304
  MAGIC = 0x1824e64
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 100       <---------- char length
```

It's a pity that actions such as `chop`, appending a `UTF8=0` string, etc., void the count instead of updating it. Note that `$_ eq ''` doesn't add the magic, so not only is it faster, it uses less memory.
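To relate the two numbers in the dump (`CUR = 300` bytes, `MG_LEN = 100` characters) without `use bytes`, here is a minimal sketch. `utf8_byte_length` is a hypothetical helper name, not a core function; it assumes that what you want is the size of the string's UTF-8 encoding, which is not always the same thing as the size of the internal buffer that `bytes::length` reports.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of core Perl): returns the number of bytes
# in the UTF-8 encoding of a string, leaving the caller's string untouched.
sub utf8_byte_length {
    my ($copy) = @_;        # work on a copy of the argument
    utf8::encode($copy);    # in place: characters -> UTF-8 octets
    return length $copy;    # length now counts octets
}

my $s     = chr(0x2660) x 100;       # same string as in the dump
my $chars = length $s;               # character count (MG_LEN in the dump)
my $bytes = utf8_byte_length($s);    # UTF-8 byte count (CUR in the dump)

print "chars=$chars bytes=$bytes\n"; # prints "chars=100 bytes=300"
```

The copy costs memory and time proportional to the string, so this is a sketch for clarity rather than for speed.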
Re^4: Alternative to bytes::length() by creamygoodness (Curate) on Dec 23, 2009 at 02:25 UTC
Looks like you found the caching mechanism. From perlguts: `w PERL_MAGIC_utf8 vtbl_utf8 UTF-8 length+offset cache` As for keeping `bytes`... meh, my attachment to the `bytes` pragma extended only to that use case, as the efficiency of `CORE::length()` with `SVf_UTF8` scalars is a bummer. I'm not even going to bother posting to p5p now that my concern has been addressed another way.
Re^3: Alternative to bytes::length() by assemble (Friar) on Dec 23, 2009 at 14:15 UTC
Why would they eliminate the bytes pragma? What about those of us who aren't always manipulating character data and actually do care about the bytes themselves?
Re^4: Alternative to bytes::length() by ikegami (Patriarch) on Dec 23, 2009 at 15:02 UTC
Strings can contain bytes. You don't have to do anything special to work with bytes. `use bytes;` has nothing to do with manipulating bytes. If you need to manipulate the internal string format to optimize or to work with some buggy XS, you want `utf8::upgrade` or `utf8::downgrade`. If you need to encode to UTF-8 or decode from UTF-8, you want `utf8::encode`, `Encode::encode`, `utf8::decode` or `Encode::decode`. The person probably wants to eliminate it because of the very misconception you expressed. But don't worry, if anything is ever done, it would still be available on CPAN.
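A small sketch of the distinction drawn above, using only the always-available `utf8::` functions. The sample string and variable names are made up for illustration: `utf8::upgrade` changes how the string is stored but not its value, while `utf8::encode` produces a different string made of UTF-8 octets.

```perl
use strict;
use warnings;

my $s = "\xE9";                 # one character, U+00E9

my $upgraded = $s;
utf8::upgrade($upgraded);       # changes the internal storage format only
my $same_value = ($upgraded eq $s) ? 1 : 0;

my $encoded = $s;
utf8::encode($encoded);         # changes the value: now the octets C3 A9

my $decoded = $encoded;
utf8::decode($decoded);         # back to the one-character string

printf "upgraded: length=%d same_value=%d\n", length($upgraded), $same_value;
printf "encoded:  length=%d\n", length($encoded);
printf "decoded:  length=%d\n", length($decoded);
```

In this sketch the upgraded copy still compares equal to the original and has length 1, whereas the encoded copy has length 2.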
Re^5: Alternative to bytes::length() by assemble (Friar) on Dec 23, 2009 at 15:47 UTC
I'm talking more about situations where I'm manipulating binary data, and I don't want Perl even looking at the data and trying to guess what it is. Instead of going through every single record in the file and unpacking the whole thing, it is often more efficient to use `substr` to get the few bytes I actually care about, and work with those. It would be similar to working with an actual character array in C. Reading through the `bytes` docs gives me the impression that Perl will try to figure out what kind of string I've got, based on what's in it and where it came from, unless I tell it otherwise.
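For the `substr`-on-binary-records scenario described here, a minimal sketch without `use bytes`. The record layout (4-byte tag, 32-bit big-endian id, 16-bit big-endian flags, opaque payload) is invented for illustration and is built in memory with `pack` so the example is self-contained.

```perl
use strict;
use warnings;

# Made-up record layout for illustration: 4-byte tag, 32-bit big-endian id,
# 16-bit big-endian flags, then an opaque payload with arbitrary byte values.
my $record = pack('A4 N n', 'HDR1', 123456, 42) . "\xFF\xFE\x00payload";

# Pull out only the fields of interest by offset; the payload is never touched.
my $tag   = substr($record, 0, 4);
my $id    = unpack('N', substr($record, 4, 4));
my $flags = unpack('n', substr($record, 8, 2));

print "tag=$tag id=$id flags=$flags\n";   # prints "tag=HDR1 id=123456 flags=42"
```

When the data comes from a file rather than from `pack`, opening the handle with the `:raw` layer (or calling `binmode` on it) should give a string whose characters are the file's bytes, so the same offsets apply.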
Re^6: Alternative to bytes::length() by ikegami (Patriarch) on Dec 23, 2009 at 16:04 UTC
Re^7: Alternative to bytes::length() by WizardOfUz (Friar) on Dec 23, 2009 at 16:34 UTC