http://www.perlmonks.org?node_id=602181


in reply to Null-stripping performance

But neither regexps nor unpack feels like the fastest way to do it. Is there any other way?

Without benchmarking, I don't know. But one thing you might want to try is to use the index function to determine how many characters to grab in your substr. This may save on the cost of the regexp engine, which might just be doing a Boyer-Moore search anyway.

Here are a couple of snippets to try. If you can guarantee that there is always a null character in your four bytes, the problem is pretty trivial: just look for the next one starting at the current index point in the string:

$value = substr($string, $index, index($string, "\x00", $index) - $index)

Which literally means "take the substring of the string starting at the index point for n characters", where n is the distance from the index point to the next NUL.
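To make that concrete, here is a minimal sketch of the one-liner walking a string of four-byte fields. The sample data and the variable names other than $string, $index and $value are made up for illustration, and it assumes every field really does contain at least one NUL:

```perl
use strict;
use warnings;

# Hypothetical sample data: three 4-byte fields, NUL-padded by pack 'a4'.
my $string = join '', map { pack 'a4', $_ } qw(ab cde f);

my @values;
for (my $index = 0; $index < length $string; $index += 4) {
    # Distance from $index to the next NUL is the length of the payload.
    my $value = substr($string, $index, index($string, "\x00", $index) - $index);
    push @values, $value;
}

print join(',', @values), "\n";
```

With this input the three values come out as ab, cde and f.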

If there are values like 9876 (that is, no NULs), then this simple-minded approach won't work, since you may run arbitrarily far down the string before encountering the next NUL, or there may be no more at all, in which case index will return -1, and things will really go askew.

In this case, you'll have to save the result in a temporary length variable, and clamp it to 4 if it's not in range:

my $len = index($string, "\x00", $index) - $index;
$len = 4 if $len < 0 or $len > 4;
$value = substr($string, $index, $len);
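As a rough sketch of how the clamped version behaves on awkward input, the snippet below wraps it in a helper. The name field_at and the sample data are my own inventions for the example; the data deliberately includes a full field with a NUL further along, and a full final field with no NUL after it at all:

```perl
use strict;
use warnings;

# Hypothetical helper: return the NUL-stripped 4-byte field starting at $index.
sub field_at {
    my ($string, $index) = @_;
    my $len = index($string, "\x00", $index) - $index;
    # index() returned -1 (so $len is negative), or the next NUL
    # lies beyond this field: either way, take the whole field.
    $len = 4 if $len < 0 or $len > 4;
    return substr($string, $index, $len);
}

# Sample data: "9876" and "wxyz" fill their fields completely.
my $string = join '', map { pack 'a4', $_ } qw(ab 9876 f wxyz);

my @values = map { field_at($string, $_ * 4) } 0 .. 3;
print join(',', @values), "\n";
```

Here the second field hits the "$len > 4" branch and the last one hits the "$len < 0" branch, and both come back as the full four characters.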

That's probably about the best you can do in pure Perl. If this approach is worse than the regexp or unpack versions, or only marginally better, the only remaining card to play would be to code it in C and use Inline::C.

A bit later: regarding unpack, in my experience, when I'm really strapped for speed it rarely makes the grade with respect to other alternatives. I think the meme that says "unpack is the fastest" needs to be taken with a grain of salt, and my hunch is that it takes a certain amount of time to decode the format string argument. A single substr with offsets determined by index is usually faster (but at the cost of more make-work code). YMM undoubtedly V.
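Since all of this is hunch, a benchmark is the only way to settle it. Below is one possible harness using the core Benchmark module. The test data, the sub names and the exact regexp and unpack variants are assumptions on my part rather than the original poster's code, and the outcome will depend on your perl build and your real data:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 four-byte fields holding 1 to 4 characters each.
my @fields = map { substr('abcd', 0, 1 + $_ % 4) } 0 .. 999;
my $string = join '', map { pack 'a4', $_ } @fields;

sub via_index {
    my @out;
    for (my $i = 0; $i < length $string; $i += 4) {
        my $len = index($string, "\x00", $i) - $i;
        $len = 4 if $len < 0 or $len > 4;
        push @out, substr($string, $i, $len);
    }
    return \@out;
}

sub via_regex {
    my @out;
    for (my $i = 0; $i < length $string; $i += 4) {
        # Capture everything up to the first NUL in this field.
        substr($string, $i, 4) =~ /^([^\x00]*)/;
        push @out, $1;
    }
    return \@out;
}

sub via_unpack {
    my @out;
    for (my $i = 0; $i < length $string; $i += 4) {
        # Z4 keeps everything before the first NUL in the 4 bytes.
        push @out, unpack('Z4', substr($string, $i, 4));
    }
    return \@out;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    index  => \&via_index,
    regex  => \&via_regex,
    unpack => \&via_unpack,
});
```

All three subs walk the string one field at a time so the comparison stays like for like; unpacking the whole string in a single call with a repeated group is a different beast and worth timing separately.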

• another intruder with the mooring in the heart of the Perl

Re^2: Null-stripping performance
by qiau (Initiate) on Feb 26, 2007 at 18:09 UTC
    This looks interesting, I have to take a look at the data tomorrow when I get to work but I do like the ideas here.

    A few benchmarks too, this might actually be something!

    Thanks a lot for the help!