This version seems to give the optimiser even more hints: note the line anchored `XX1' at 2 (checking anchored) anchored(BOL) near the end of the debug output. Now if only we had some test data to try the Sys::Mmap optimisation and demerphq's++ nice inversion of the problem at 424532.

nph>perl -Mre=debug -e'/^(CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1/'
Freeing REx: `","'
Compiling REx `^(CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1'
size 62 Got 500 bytes for offset annotations.
first at 2
   1: BOL(2)
   2: OPEN1(4)
   4:   BRANCH(7)
   5:     EXACT <CP>(58)
   7:   BRANCH(21)
   8:     EXACT <K>(10)
  10:     ANYOF[LM](58)
  21:   BRANCH(24)
  22:     EXACT <ME>(58)
  24:   BRANCH(38)
  25:     EXACT <P>(27)
  27:     ANYOF[AM](58)
  38:   BRANCH(52)
  39:     EXACT <S>(41)
  41:     ANYOF[LZ](58)
  52:   BRANCH(55)
  53:     EXACT <WX>(58)
  55:   BRANCH(58)
  56:     EXACT <YZ>(58)
  58: CLOSE1(60)
  60: EXACT <XX1>(62)
  62: END(0)
anchored `XX1' at 2 (checking anchored) anchored(BOL) minlen 5
Offsets: [62]
        1[1] 2[1] 0[0] 2[1] 3[2] 0[0] 5[1] 6[1] 0[0] 7[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 11[1] 12[2] 0[0] 14[1] 15[1] 0[0] 16[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 20[1] 21[1] 0[0] 22[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 26[1] 27[2] 0[0] 29[1] 30[2] 0[0] 32[1] 0[0] 33[3] 0[0] 36[0]
Freeing REx: `"^(CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1"'

update

As the OP says using egrep is a valid solution, he must not need to capture any parts of the match, so let's make this non-capturing and revisit the debug output, this time with added interactivity.

nph>perl -Mre=debug -ne'/^(?:CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1/;print">"'
Freeing REx: `","'
Compiling REx `^(?:CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1'
size 59 Got 476 bytes for offset annotations.
first at 2
   1: BOL(2)
   2: BRANCH(5)
   3:   EXACT <CP>(57)
   5: BRANCH(19)
   6:   EXACT <K>(8)
   8:   ANYOF[LM](57)
  19: BRANCH(22)
  20:   EXACT <ME>(57)
  22: BRANCH(36)
  23:   EXACT <P>(25)
  25:   ANYOF[AM](57)
  36: BRANCH(50)
  37:   EXACT <S>(39)
  39:   ANYOF[LZ](57)
  50: BRANCH(53)
  51:   EXACT <WX>(57)
  53: BRANCH(56)
  54:   EXACT <YZ>(57)
  56: TAIL(57)
  57: EXACT <XX1>(59)
  59: END(0)
anchored `XX1' at 2 (checking anchored) anchored(BOL) minlen 5
Offsets: [59]
        1[1] 4[1] 5[2] 0[0] 7[1] 8[1] 0[0] 9[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 13[1] 14[2] 0[0] 16[1] 17[1] 0[0] 18[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 22[1] 23[1] 0[0] 24[4] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 0[0] 28[1] 29[2] 0[0] 31[1] 32[2] 0[0] 33[0] 35[3] 0[0] 38[0]

>SLXC1
Guessing start of match, REx `^(?:CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1' against `SLXC1
'...
String not equal...
Match rejected by optimizer
>
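As a quick sanity check that dropping the capture changes nothing about which lines are accepted, here is a minimal sketch that runs both patterns over a handful of lines. The sample lines are made up for illustration, since we have no real test data from the OP.

```perl
use strict;
use warnings;

# The two patterns from the debug runs above.
my $capturing     = qr/^(CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1/;
my $non_capturing = qr/^(?:CP|K[LM]|ME|P[AM]|S[LZ]|WX|YZ)XX1/;

# Made-up sample lines (assumption: not the OP's data).
my @lines = ("CPXX1 foo", "KLXX1", "SLXC1", "RWXX1", "YZXX1 bar", "XX1");

my (@matched, @disagree);
for my $line (@lines) {
    my $with_capture    = $line =~ $capturing     ? 1 : 0;
    my $without_capture = $line =~ $non_capturing ? 1 : 0;

    # Record any line where the two patterns give different answers.
    push @disagree, $line if $with_capture != $without_capture;
    push @matched,  $line if $without_capture;
}

print "disagreements: ", scalar(@disagree), "\n";
print "matched: ", join(", ", @matched), "\n";
```

Whether the non-capturing form is measurably faster on the real input is something only a benchmark on that input can answer.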

If you knew the frequency of each of the prefixes, you could optimise by putting the most common first. Or, if there were a vast number of lines with something you certainly did not want (e.g. RWXX1), you might even add a [^R] check right at the start (as a lookahead such as (?=[^R]), so that it consumes nothing).
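A rough sketch of the frequency idea: build the alternation from a table of counts so the most common prefix is tried first. The counts in %freq are invented for illustration; real ones would have to come from sampling the actual data.

```perl
use strict;
use warnings;

# Hypothetical prefix counts (assumption: made-up numbers, not measured).
my %freq = (
    'CP'    => 120,
    'K[LM]' => 15,
    'ME'    => 900,
    'P[AM]' => 40,
    'S[LZ]' => 300,
    'WX'    => 5,
    'YZ'    => 60,
);

# Most frequent alternative first; ties broken alphabetically so the
# result is stable from run to run.
my @ordered = sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq;

# The keys are already regex fragments, so they are interpolated as-is.
my $alt = join '|', @ordered;
my $re  = qr/^(?:$alt)XX1/;

print "alternation: $alt\n";
print "SZXX1 ", ("SZXX1" =~ $re ? "matches" : "does not match"), "\n";
print "RWXX1 ", ("RWXX1" =~ $re ? "matches" : "does not match"), "\n";
```

The regex engine may already do some reordering or prefix analysis of its own, depending on the perl version, so it is worth comparing the -Mre=debug output and timings before and after rather than assuming a win.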

Cheers,
R.

Pereant, qui ante nos nostra dixerunt!

In reply to Re^2: perl performance vs egrep by Random_Walk
in thread perl performance vs egrep by dba
