PerlMonks  

Re^3: Alternative to bytes::length()

by BrowserUk (Patriarch)
on Dec 23, 2009 at 06:16 UTC ( [id://814058]=note )


in reply to Re^2: Alternative to bytes::length()
in thread Alternative to bytes::length()

Seems like ne "" is what I was looking for. :)

Be careful. Anonymonk's benchmark is conflating an awful lot of other stuff with the actual code you are concerned about.

I believe (though I'm open to correction) that this is a far better benchmark, and it shows a radically different result. It might just set your mind at ease. (Or not!):

#!/usr/bin/perl --
use strict;
use warnings;
use Benchmark qw( cmpthese );

# Make bytes:: functions available, but use character semantics.
use bytes ();

our $smileys = "\x{263a}" x 10_000;
our $empty   = "\x{263a}";
chop $empty;

cmpthese -1, {
    bytes => q{
        my $c=0;
        ( bytes::length($empty) or bytes::length($smileys) ) and ++$c
            for 1 .. 1000;
    },
    utf8 => q{
        my $c=0;
        ( length($empty) or length($smileys) ) and ++$c
            for 1 .. 1000;
    },
    ord => q{
        my $c=0;
        ( ord( $empty ) or ord( $smileys ) ) and ++$c
            for 1 .. 1000;
    },
    'ne""' => q{
        my $c=0;
        ( $empty ne '' or $smileys ne '' ) and ++$c
            for 1 .. 1000;
    },
};
__END__
C:\test>junk8
        Rate bytes   ord  ne""  utf8
bytes 1379/s    --  -72%  -75%  -76%
ord   4992/s  262%    --  -10%  -13%
ne""  5566/s  304%   12%    --   -3%
utf8  5757/s  317%   15%    3%    --

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^4: Alternative to bytes::length() (7% solution)
by tye (Sage) on Dec 23, 2009 at 11:39 UTC

    I believe that your 'utf8' case is mostly benchmarking the pulling out of the character count cached in the magic, thus completely missing the original problem. Even if it weren't, I don't see how your benchmark provides any justification for not using eq '' (which perhaps you weren't trying to imply).

    - tye        

      I believe that your 'utf8' case is mostly benchmarking the pulling out of the character count cached in the magic, thus completely missing the original problem.

      I was mostly concerned with pointing out that it's important to consider exactly what you're benchmarking.

      I'm also not convinced that there is enough of a description of "the original problem" to say whether I missed it or not. I'm finding it quite hard to think of an application where the time taken to obtain the length of a string would be very significant.

      Except maybe when sorting strings by length, at which point the caching would pretty much negate the first-time cost.
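
A minimal sketch of the sort-by-length case, assuming a hypothetical list @words of character strings. Computing each length once up front (a Schwartzian transform) keeps length() out of the comparison block entirely, so the result does not depend on whether perl has cached the character count.

```perl
use strict;
use warnings;

# Hypothetical input: a mix of ASCII and wide-character strings.
my @words = ( "\x{263a}" x 3, "ab", "\x{263a}", "abcd" );

# Compute each character length once, sort on it, then discard it.
my @sorted =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ length($_), $_ ] } @words;

my @lengths = map { length } @sorted;
print "@lengths\n";    # prints "1 2 3 4"
```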

      I don't see how your benchmark provides any justification for not using eq ''

      Um...cos I modified the original benchmark and didn't think about it. Having just swapped the 'ne's for 'eq's, it does make a surprising difference, though I haven't thought through whether that's down to the efficiency of the operator or the change to the boolean logic.
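
One thing that may be worth checking here, offered as a sketch rather than a diagnosis: with the empty string tested first, ( $empty ne '' or $smileys ne '' ) has to evaluate both comparisons, whereas ( $empty eq '' or $smileys eq '' ) short-circuits after the first. The cmp_ne and cmp_eq helpers below are hypothetical, added only to count how often each comparison runs.

```perl
use strict;
use warnings;

my $empty   = '';
my $smileys = "\x{263a}" x 10;

my %calls = ( ne => 0, eq => 0 );

# Hypothetical helpers that count how often each comparison runs.
sub cmp_ne { $calls{ne}++; return $_[0] ne '' }
sub cmp_eq { $calls{eq}++; return $_[0] eq '' }

# Same shape as the benchmark condition, once with ne and once with eq.
my $ne_result = ( cmp_ne($empty) or cmp_ne($smileys) ) ? 1 : 0;
my $eq_result = ( cmp_eq($empty) or cmp_eq($smileys) ) ? 1 : 0;

print "ne: $calls{ne} comparisons, eq: $calls{eq} comparisons\n";
# prints "ne: 2 comparisons, eq: 1 comparisons"
```

If that is what happened in the modified benchmark, the two variants are doing different amounts of work per iteration, so the timing difference would not say much about the operators themselves.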

      It did surprise me greatly that ord worked out much faster than bytes::length.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        A parser which consumes the string would be an example application.

        I agree that in many cases the caching is going to hide or eliminate the problem, and in fact, it was a little tricky to compose the original example.

        However, in code that doesn't need to know the actual length of the string in characters, it's poor practice to use length(), whose cost scales with the size of the string on SVf_UTF8 scalars. The advantage of bytes::length() is that it's basically a lookup on a member of the SV struct. There's a little bit of extra math to deal with offsets, but it's still O(1) rather than O(n).
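
To make the distinction concrete, here is a small sketch of what the two calls report for the same scalar. It loads bytes the same way the benchmark above does; the variable names are only illustrative.

```perl
use strict;
use warnings;
use bytes ();    # load bytes:: functions without enabling byte semantics

my $smileys = "\x{263a}" x 3;    # three WHITE SMILING FACE characters

my $chars = length($smileys);           # character count
my $bytes = bytes::length($smileys);    # size of the internal UTF-8 buffer

print "chars=$chars bytes=$bytes\n";    # prints "chars=3 bytes=9"
```

U+263A encodes to three bytes in UTF-8, hence nine bytes for three characters. Either number being zero means the string is empty, which is why the byte count is enough when only emptiness matters.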

        The ne "" idiom is also O(1) because it's just checking to see whether the total byte size of the string is 0, not counting characters and seeing if the count is 0. Which makes it an effective replacement.
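
A rough sketch of the kind of consuming parser mentioned above, assuming a hypothetical comma-separated buffer. The loop condition uses ne '' so it never asks for a character count; whether that is measurably faster will depend on the perl version and on how much the length cache helps.

```perl
use strict;
use warnings;

# Hypothetical input: comma-separated tokens containing wide characters.
my $buf = join ',', "\x{263a}" x 2, 'abc', "\x{263b}";

my @tokens;

# Loop until the buffer is empty, testing emptiness with ne ''.
while ( $buf ne '' ) {
    # Remove one token and its trailing comma (if any) from the front.
    $buf =~ s/\A([^,]*),?//;
    push @tokens, $1;
}

print scalar(@tokens), " tokens\n";    # prints "3 tokens"
```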
