This version appears to give the optimiser even more hints: anchored `XX1' at 2 (checking anchored) anchored(BOL). Now if only we had some test data to try the Sys::Mmap optimisation and demerphq's++ nice inversion of the problem at 424532.
nph>perl -Mre=debug -e'/^(CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1/'
Freeing REx: `","'
Compiling REx `^(CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1'
size 62 Got 500 bytes for offset annotations.
first at 2
1: BOL(2)
2: OPEN1(4)
4: BRANCH(7)
5: EXACT <CP>(58)
7: BRANCH(21)
8: EXACT <K>(10)
10: ANYOF[LM](58)
21: BRANCH(24)
22: EXACT <ME>(58)
24: BRANCH(38)
25: EXACT <P>(27)
27: ANYOF[AM](58)
38: BRANCH(52)
39: EXACT <S>(41)
41: ANYOF[LZ](58)
52: BRANCH(55)
53: EXACT <WX>(58)
55: BRANCH(58)
56: EXACT <YZ>(58)
58: CLOSE1(60)
60: EXACT <XX1>(62)
62: END(0)
anchored `XX1' at 2 (checking anchored) anchored(BOL) minlen 5
Offsets: [62]
1[1] 2[1] 0[0] 2[1] 3[2] 0[0] 5[1] 6[1] 0[0] 7[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 11[1] 12[2] 0[0] 14[1] 15[1] 0[0] 16[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 20[1] 21[1] 0[0] 22[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 26[1] 27[2] 0[0] 29[1] 30[2] 0[0] 32[1] 0[0] 33[3] 0[0] 36[0]
Freeing REx: `"^(CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1"'
Update:
As the OP says that using egrep is a valid solution, he must not need to capture any part of the match, so let's make this non-capturing and revisit the debug output, this time with added interactivity.
nph>perl -Mre=debug -ne'/^(?:CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1/;print">"'
Freeing REx: `","'
Compiling REx `^(?:CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1'
size 59 Got 476 bytes for offset annotations.
first at 2
1: BOL(2)
2: BRANCH(5)
3: EXACT <CP>(57)
5: BRANCH(19)
6: EXACT <K>(8)
8: ANYOF[LM](57)
19: BRANCH(22)
20: EXACT <ME>(57)
22: BRANCH(36)
23: EXACT <P>(25)
25: ANYOF[AM](57)
36: BRANCH(50)
37: EXACT <S>(39)
39: ANYOF[LZ](57)
50: BRANCH(53)
51: EXACT <WX>(57)
53: BRANCH(56)
54: EXACT <YZ>(57)
56: TAIL(57)
57: EXACT <XX1>(59)
59: END(0)
anchored `XX1' at 2 (checking anchored) anchored(BOL) minlen 5
Offsets: [59]
1[1] 4[1] 5[2] 0[0] 7[1] 8[1] 0[0] 9[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 13[1] 14[2] 0[0] 16[1] 17[1] 0[0] 18[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 22[1] 23[1] 0[0] 24[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 28[1] 29[2] 0[0] 31[1] 32[2] 0[0] 33[0] 35[3] 0[0] 38[0]
>SLXC1
Guessing start of match, REx `^(?:CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1' against `SLXC1
'...
String not equal...
Match rejected by optimizer
>
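To make the optimiser hint in the dump concrete, here is a rough hand-written equivalent of "anchored `XX1' at 2 ... minlen 5": the line must be at least 5 characters long and carry the literal XX1 at offset 2 before the alternation is even tried. This is only a sketch of the idea, not what the regex engine does internally; the precheck helper and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# The pattern from the post (non-capturing form).
my $re = qr/^(?:CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1/;

# Hypothetical helper mimicking the optimiser's hint:
# minlen 5, with the literal 'XX1' at offset 2.
sub precheck {
    my ($line) = @_;
    return length($line) >= 5 && substr($line, 2, 3) eq 'XX1';
}

# Made-up sample lines; the thread has no real test data.
my @samples = ("SLXC1\n", "SLXX1 foo\n", "RWXX1\n", "KMXX1\n", "XX1\n");

for my $line (@samples) {
    my $pre   = precheck($line) ? 1 : 0;
    my $match = $line =~ $re    ? 1 : 0;
    (my $shown = $line) =~ s/\n\z//;    # strip the newline for display
    printf "%-10s precheck=%d match=%d\n", $shown, $pre, $match;
}
```

SLXC1 fails the cheap check, just as in the session above, while RWXX1 passes it and is only thrown out once the alternation is tried.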
If you knew the frequency of each of the prefixes, you could optimise by putting the most common alternative first. Or, if there were a vast number of lines starting with something you certainly did not want (e.g. RWXX1), you might even add a [^R] check right at the start (written as a zero-width lookahead, so that it does not consume the character).
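A minimal sketch of both suggestions, assuming you had prefix counts to hand. The numbers in %count are invented, since nobody in the thread has posted real data, and whether the reordering buys anything on your perl is something to benchmark rather than assume.

```perl
use strict;
use warnings;

# Hypothetical prefix counts; replace with figures from your own data.
my %count = (
    'S[LZ]' => 5000,
    'CP'    => 1200,
    'K[LM]' => 300,
    'ME'    => 40,
    'P[AM]' => 25,
    'WX'    => 10,
    'YZ'    => 2,
);

# Most frequent alternative first; ties broken alphabetically so the
# resulting pattern is deterministic.
my @ordered = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
my $alt     = join '|', @ordered;
my $re      = qr/^(?:$alt)XX1/;

# Variant with a zero-width lookahead that gives up on lines starting
# with 'R' before any alternative is tried. No alternative starts with
# 'R', so this cannot change which lines match.
my $re_no_r = qr/^(?!R)(?:$alt)XX1/;

print "$alt\n";
for my $line ("SLXX1\n", "RWXX1\n", "YZXX1\n") {
    (my $shown = $line) =~ s/\n\z//;
    printf "%s: %s / %s\n", $shown,
        ($line =~ $re      ? 'match' : 'no match'),
        ($line =~ $re_no_r ? 'match' : 'no match');
}
```

Running the result under -Mre=debug, as above, would show whether the optimiser still reports the same anchored `XX1' hint for the reordered pattern.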
Cheers,
R.
Pereant, qui ante nos nostra dixerunt!