Re^3: Alternative to bytes::length()

Seems like ne "" is what I was looking for. :)

Be careful. Anonymonk's benchmark is conflating an awful lot of other stuff in with the actual code you are concerned about.

I believe (but I'm open to correction) that this is a far better benchmark, and it shows a radically different result. It might just set your mind at ease. (Or not!):

#!/usr/bin/perl --
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Make bytes:: functions available, but use character semantics.
use bytes ();

our $smileys = "\x{263a}" x 10_000;
our $empty = "\x{263a}"; chop $empty;

cmpthese -1, {   
    bytes => q{
        my $c=0;
        ( bytes::length($empty) or bytes::length($smileys) ) and ++$c for 1 .. 1000;
    },
    utf8 => q{
        my $c=0;
        ( length($empty) or length($smileys) ) and ++$c for 1 .. 1000;
    },
    ord => q{
        my $c=0;
        ( ord( $empty ) or ord( $smileys ) ) and ++$c for 1 .. 1000;
    },
    'ne""' => q{
        my $c=0;
        ( $empty ne '' or $smileys ne '' ) and ++$c for 1 .. 1000;
    },
};

__END__
C:\test>junk8
        Rate bytes   ord  ne""  utf8
bytes 1379/s    --  -72%  -75%  -76%
ord   4992/s  262%    --  -10%  -13%
ne""  5566/s  304%   12%    --   -3%
utf8  5757/s  317%   15%    3%    --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"I'd rather go naked than blow up my ass"


Re^4: Alternative to bytes::length() (7% solution) by tye (Sage) on Dec 23, 2009 at 11:39 UTC
I believe that your 'utf8' case is mostly benchmarking the pulling out of the character count cached in the magic, thus completely missing the original problem. Even if it weren't, I don't see how your benchmark provides any justification for not using `eq ''` (which perhaps you weren't trying to imply). - tye
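If the 'utf8' case mostly measures reading a cached character count, one way to take the cache out of the picture is to modify the string on every call. This is only a sketch under an assumption: appending and then chopping one character is taken to discard any cached count, so `length()` has to rescan the string. The sub names `via_length`, `via_bytes_length` and `via_ne` are made up for illustration. All three pay the same append/chop cost, so any difference between them should come from the emptiness test itself. No timings are claimed here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Make bytes:: functions available, but use character semantics.
use bytes ();

our $smileys = "\x{263a}" x 10_000;

# Assumption: the append/chop pair below is taken to invalidate any
# character count Perl has cached for $smileys. The string ends up
# unchanged (10_000 characters) after each call.
sub via_length {
    $smileys .= "\x{263a}"; chop $smileys;
    return length($smileys) ? 1 : 0;
}

sub via_bytes_length {
    $smileys .= "\x{263a}"; chop $smileys;
    return bytes::length($smileys) ? 1 : 0;
}

sub via_ne {
    $smileys .= "\x{263a}"; chop $smileys;
    return $smileys ne '' ? 1 : 0;
}

cmpthese -1, {
    'length'        => \&via_length,
    'bytes::length' => \&via_bytes_length,
    'ne ""'         => \&via_ne,
};
```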
Re^5: Alternative to bytes::length() (7% solution) by BrowserUk (Patriarch) on Dec 23, 2009 at 13:40 UTC
I believe that your 'utf8' case is mostly benchmarking the pulling out of the character count cached in the magic, thus completely missing the original problem.

I was mostly concerned with pointing out that it's important to consider exactly what you're benchmarking. I'm also not convinced that there is enough of a description of "the original problem" to say whether I missed it or not.

I'm finding it quite hard to think of an application where the time taken to obtain the length of a string would be very significant, except maybe when sorting strings by length, and there the caching would pretty much negate the first-time cost.

I don't see how your benchmark provides any justification for not using `eq ''`

Um... 'cos I modified the original benchmark and didn't think about it. I've just swapped the `ne`s for `eq`s, and it does make a surprising difference, though I haven't thought through whether that's down to the efficiency of the operator or to the change in the boolean logic.

It did surprise me greatly that ord worked out so much faster than bytes::length.
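One way to separate "the operator" from "the boolean logic" is to benchmark three variants side by side, reusing the strings from the benchmark above. This is a sketch with hypothetical sub names: `with_ne` is the original test, `with_not_eq` uses `eq` but keeps the same logic by negating each comparison, and `with_eq_swapped` is a literal swap of `ne` for `eq`. In the literal swap, `$empty eq ''` is already true, so `or` short-circuits and `$smileys` is never compared at all. That is at least one plausible reason for a surprising difference, but the sketch itself makes no claim about the timings you will see.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw( cmpthese );

# The same test strings as in the benchmark above.
our $smileys = "\x{263a}" x 10_000;
our $empty   = "\x{263a}"; chop $empty;

# Original test: true if either string is non-empty.
sub with_ne {
    my $c = 0;
    ( $empty ne '' or $smileys ne '' ) and ++$c for 1 .. 1000;
    return $c;
}

# Same logic, spelled with eq: negate each comparison.
sub with_not_eq {
    my $c = 0;
    ( !( $empty eq '' ) or !( $smileys eq '' ) ) and ++$c for 1 .. 1000;
    return $c;
}

# Literal swap of ne for eq. Because $empty eq '' is true,
# 'or' short-circuits and $smileys is never compared.
sub with_eq_swapped {
    my $c = 0;
    ( $empty eq '' or $smileys eq '' ) and ++$c for 1 .. 1000;
    return $c;
}

cmpthese -1, {
    'ne'         => \&with_ne,
    'not eq'     => \&with_not_eq,
    'eq swapped' => \&with_eq_swapped,
};
```

All three subs return 1000, so they agree on the answer; only the work done per iteration differs.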
Re^6: Alternative to bytes::length() (7% solution) by creamygoodness (Curate) on Dec 23, 2009 at 16:09 UTC
A parser which consumes the string would be an example application. I agree that in many cases the caching is going to hide or eliminate the problem; in fact, it was a little tricky to compose the original example. However, in code that doesn't need to know the actual length of the string in characters, it's poor practice to use `length()`, whose cost scales with the size of the string on `SVf_UTF8` scalars. The advantage of `bytes::length()` is that it's basically a lookup on a member of the SV struct. There's a little bit of extra math to deal with offsets, but it's still O(1) rather than O(n). The `ne ""` idiom is also O(1), because it just checks whether the total byte size of the string is 0 rather than counting characters and checking whether the count is 0, which makes it an effective replacement.
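A small sketch of the distinction. It shows that `length()` and `bytes::length()` report different numbers for the same UTF-8 string (U+263A takes three bytes in UTF-8), and it uses a hypothetical `tokenize` routine as an instance of the "parser which consumes the string" case: the loop only needs to know whether any input is left, so it tests `ne ''` instead of asking for a character count on every pass. The tokenizer's rules (skip whitespace, take runs of `\w`, otherwise take one character) are invented purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();

binmode STDOUT, ':encoding(UTF-8)';

# Character count vs byte count for the same UTF-8 string.
my $smileys = "\x{263a}" x 10;
print length($smileys), "\n";          # 10 (characters)
print bytes::length($smileys), "\n";   # 30 (bytes)

# Hypothetical tokenizer that consumes its input from the front.
# The loop condition only asks "is anything left?", so it uses
# ne '' rather than length() on the shrinking string.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    while ( $input ne '' ) {
        if ( $input =~ s/\A\s+// ) {
            # whitespace: discard it
        }
        elsif ( $input =~ s/\A(\w+)// ) {
            push @tokens, $1;          # a run of word characters
        }
        else {
            $input =~ s/\A(.)//s;      # any other single character
            push @tokens, $1;
        }
    }
    return @tokens;
}

my @t = tokenize("foo \x{263a} bar, baz");
print join( '|', @t ), "\n";
```

Every branch removes at least one character, so the loop is guaranteed to terminate once the input is exhausted.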
Re^7: Alternative to bytes::length() (7% solution) by BrowserUk (Patriarch) on Dec 23, 2009 at 16:58 UTC

