Prefetching is tricky. I wouldn't issue so many prefetches at once, because the number of outstanding memory requests is limited. How about this:
// candidates for vectorization, let's break them apart
static inline void q7_to_m7(int m7[], int m6)
{
    int q, m6x = m6 ^ 1;
    for (q = 0; q < 128; q += 2) { // unroll by 2
        m7[q] = (m6 ^ q) * H_PRIME;
        m7[q+1] = (m6x ^ q) * H_PRIME; // q is even, so m6x ^ q == m6 ^ (q+1)
    }
}
static inline void prefetch_m(unsigned int i)
{
    _mm_prefetch(&bytevecM[i], _MM_HINT_T0);
    _mm_prefetch(&bytevecM[i^64], _MM_HINT_T0);
}
...
prefetch_m((m6^1) * H_PRIME); // same index as m7arr[1]
int m7arr[130];
q7_to_m7(m7arr, m6);
// fixup for prefetching two iterations ahead
m7arr[129] = m7arr[128] = m7arr[127]; // pad so q7+2 stays in bounds
// q7 == 10 and q7 == 13 are skipped below, so point their slots at
// the entries those iterations would otherwise have prefetched
m7arr[13] = m7arr[15]; m7arr[10] = m7arr[12];
prefetch_m(m7arr[2]);
for (q7 = 1; q7 < 128; ++q7) {
    if (q7 == 10 || q7 == 13) continue;
    prefetch_m(m7arr[q7+2]);
    m7 = m7arr[q7];
    ...
}
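If you want to convince yourself that the fixup really covers every entry the loop visits, here is a rough, self-contained sketch that replays the same schedule and counts entries used without an earlier prefetch. Everything named demo_* or DEMO_* is a hypothetical stand-in (the real bytevecM and H_PRIME are not shown above), the table is kept unsigned so the multiply wraps cleanly, and I have swapped in the GCC/Clang builtin __builtin_prefetch for _mm_prefetch so it also builds off x86.

```c
#include <assert.h>

#define N_Q7 128                 /* loop bound taken from the snippet above */
#define DEMO_H_PRIME 1000003u    /* hypothetical odd multiplier, NOT the real H_PRIME */
#define DEMO_MASK 0xFFFFu        /* keeps demo indexes inside demo_table */

static unsigned char demo_table[DEMO_MASK + 1u]; /* hypothetical stand-in for bytevecM */

/* Same table as q7_to_m7, in unsigned arithmetic so the multiply wraps cleanly. */
static void build_m7(unsigned int m7[], unsigned int m6)
{
    unsigned int q, m6x = m6 ^ 1u;
    for (q = 0; q < N_Q7; q += 2) {
        /* for even q, q + 1 == q ^ 1, so m6x ^ q is the index for q + 1 */
        assert((m6x ^ q) == (m6 ^ (q + 1u)));
        m7[q]     = (m6  ^ q) * DEMO_H_PRIME;
        m7[q + 1] = (m6x ^ q) * DEMO_H_PRIME;
    }
}

/* Issues the two prefetches and records which original slot they cover. */
static void prefetch_and_mark(unsigned int v, const unsigned int orig[],
                              unsigned char seen[])
{
    int k;
    __builtin_prefetch(&demo_table[v & DEMO_MASK], 0, 3);
    __builtin_prefetch(&demo_table[(v ^ 64u) & DEMO_MASK], 0, 3);
    for (k = 0; k < N_Q7; ++k)
        if (orig[k] == v)
            seen[k] = 1;
}

/* Replays the schedule; returns how many visited entries were used
   without an earlier prefetch. fix_skips toggles the 10/13 fixup. */
static int count_unprefetched(unsigned int m6, int fix_skips)
{
    unsigned int m7arr[N_Q7 + 2], orig[N_Q7];
    unsigned char seen[N_Q7] = {0};
    int q7, k, missed = 0;

    build_m7(m7arr, m6);
    for (k = 0; k < N_Q7; ++k)
        orig[k] = m7arr[k];              /* untouched copy for bookkeeping */

    m7arr[129] = m7arr[128] = m7arr[127]; /* pad so q7 + 2 stays in bounds */
    if (fix_skips) {
        m7arr[13] = m7arr[15];
        m7arr[10] = m7arr[12];
    }

    prefetch_and_mark(orig[1], orig, seen);   /* (m6 ^ 1) * H_PRIME */
    prefetch_and_mark(m7arr[2], orig, seen);
    for (q7 = 1; q7 < N_Q7; ++q7) {
        if (q7 == 10 || q7 == 13) continue;
        prefetch_and_mark(m7arr[q7 + 2], orig, seen);
        if (!seen[q7]) ++missed;              /* entry used with no prefetch */
    }
    return missed;
}
```

With the fixup enabled, count_unprefetched(m6, 1) should come back as 0. Passing 0 as the second argument leaves entries 12 and 15 uncovered, because the iterations that would have prefetched them (q7 == 10 and q7 == 13) are the ones being skipped. The bookkeeping relies on the multiplier being odd, so that distinct q give distinct index values.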