A few points, if I may:
- On modern linux configured with transparent hugepages, you (likely) get hugepages for big segments without any special code or handling.
- Your 1st conservative unrolling is more accurately called loop peeling. The second form is complete loop peeling. To unroll a loop you merge N*loop bodies and iterate for (i=0; i<len; i += N) { BODY(i); BODY(i+1); ... }
- I expect very little benefits from unrolling the complex inner loop. (But certainly I'd test a 2x unroll. Sometimes weird effects may be at work.)
- Random memory access with hugepages is still on the order of 200 clocks or so. This time would be well spent on calculations. I'd therefore start with a few iterations of type a (=calculated), then switch over to type b (=lookup) once the prefetch has completed. Either loop might be unrolled.
- Better still: prefetch the data needed for next loop iteration, that is vecM[m7next]. It will then be ready at the right time. (*)
- Unrolling benefits may not carry over when hyperthreading is accounted for.
- The hashing step+modulus involves a chain of 3 muls, meaning latency > 3*4+some. Loop iterations may overlap, except quarter of the tests vs 500+-127 will mispredict. This suggests another option of calculating a temporary q8_to_d9[128] in a tight loop (that allows good overlap), followed by a loop with the tests.
- The q8_to_d9[] is a good candidate for vectorization. So is q9_to_c9[]. Inbetween, grep from d9 to q9. I think vectorized code will beat the lookup variant. Or not. Lookup+vectorized q9_to_c9 might be better. (*)
- Test with different compilers. gcc, clang.
All in all, this problem seems like a nice little assignment for a competent cryptanalyst.
Modulo cmps. Since (2**31)%1000003 == 477207, we know we may safely add small values to the hashval without fear of sign change or overflow. This allows one to rewrite tests such as
v %= 1001; if (v < 0) v += 1001; if (v != 100) continue;
in a simpler, speedier form:
v += 1001 - 100; if (v % 1001) continue;
(*) Updated.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|