But neither regexps nor unpack feels like the fastest way to do it. Is there any other way?

Without benchmarking, I don't know. But one thing you might want to try is to use the index function to determine how many characters to grab in your substr. This may save on the cost of the regexp engine, which might just be doing a Boyer-Moore search anyway.

Here are a couple of snippets to try. If you can guarantee that there is always a null character in your four bytes, the problem is pretty trivial: just look for the next one starting at the current index point in the string:

$value = substr($string, $index, index($string, "\x00", $index) - $index);

Which literally means "take the substring of the string starting at the index point for n characters", where n is the difference between the index and the position of the next NUL.

If there are values like 9876 (that is, no NULs), then this simple-minded approach won't work, since you may run arbitrarily far down the string before encountering the next NUL, or there may be no more at all, in which case index will return -1, and things will really go askew.

In this case, you'll have to save the result in a temporary length variable, and clamp it to 4 if it's not in range:

my $len = index($string, "\x00", $index) - $index;
$len = 4 if $len < 0 or $len > 4;
$value = substr($string, $index, $len);
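To see the clamp at work, here is a minimal sketch that wraps the snippet in a helper and walks a string field by field. The name field_at and the sample data are made up for illustration, and it assumes every field is exactly four bytes wide with any NUL padding on the right:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return the 4-byte
# field starting at $index, cut off at its first NUL if it has one.
sub field_at {
    my ($string, $index) = @_;
    my $len = index($string, "\x00", $index) - $index;
    # No NUL left at all (negative length) or the next NUL belongs to a
    # later field (length > 4): take the whole field.
    $len = 4 if $len < 0 or $len > 4;
    return substr($string, $index, $len);
}

# Assumed sample data: four fields, two of them with no NUL at all.
my $string = "ab\x00\x009876x\x00\x00\x00wxyz";

for (my $i = 0; $i < length($string); $i += 4) {
    print field_at($string, $i), "\n";
}
```

The second field (9876) exercises the "$len > 4" branch, because the nearest NUL sits in the following field, and the last one (wxyz) exercises the "$len < 0" branch, because index returns -1.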

That's probably about the best you can do (in terms of other alternatives). If this approach is worse, or only marginally better, the only remaining card to play would be to code it in C and use Inline::C.

A bit later: regarding unpack, in my experience, when I'm really strapped for speed it rarely makes the grade with respect to other alternatives. I think the meme that says "unpack is the fastest" needs to be taken with a grain of salt, and my hunch is that it takes a certain amount of time to decode the format string argument. A single substr with offsets determined by index is usually faster (but at the cost of more make-work code). YMM undoubtedly V.
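Since all of this is hunch until it is measured, here is a rough sketch of how the three approaches could be compared with the core Benchmark module. The sub names, the test data, and the regexp and unpack variants are my assumptions, not the original poster's code, and it again assumes four-byte fields padded on the right with NULs:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed test data: 1000 four-byte fields, NUL-padded by pack 'a4'.
my $string = join '', map { pack('a4', $_) } (qw(ab 9876 x wxyz)) x 250;

# substr with the length worked out by index, clamped to the field width.
sub via_index {
    my @out;
    for (my $i = 0; $i < length($string); $i += 4) {
        my $len = index($string, "\x00", $i) - $i;
        $len = 4 if $len < 0 or $len > 4;
        push @out, substr($string, $i, $len);
    }
    return \@out;
}

# A regexp capturing everything up to the first NUL of each field.
sub via_regex {
    my @out;
    for (my $i = 0; $i < length($string); $i += 4) {
        my ($value) = substr($string, $i, 4) =~ /^([^\x00]*)/;
        push @out, $value;
    }
    return \@out;
}

# unpack with Z4, which stops each field at its first NUL.
sub via_unpack {
    return [ unpack '(Z4)*', $string ];
}

# Check that the three agree before timing anything.
my $expected = join ',', @{ via_index() };
for my $alt (\&via_regex, \&via_unpack) {
    die "results differ\n" unless join(',', @{ $alt->() }) eq $expected;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    index  => \&via_index,
    regex  => \&via_regex,
    unpack => \&via_unpack,
});
```

The numbers will depend on the perl version, the field width and how the loop is written, so treat the output as a starting point rather than a verdict.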

• another intruder with the mooring in the heart of the Perl

In reply to Re: Null-stripping performance by grinder
in thread Null-stripping performance by qiau
