In reply to: Matching behavior with (?^u)
This was fixed in 5.17.3 by the following commit:
Author: Karl Williamson <email@example.com>
Date: Thu Aug 9 14:38:03 2012 -0600
regcomp.c: Set flags when optimizing a [char class]
A bracketed character class containing a single Latin1-range character
has long been optimized into an EXACT node. Also, flags are set to
include SIMPLE. However, EXACT nodes containing code points that are
different when encoded under UTF-8 versus not UTF-8 should not be SIMPLE.
To fix this, the address of the flags parameter is now passed to
regclass(), the function that parses bracketed character classes, which
now sets it appropriately. The unconditional setting of SIMPLE that was
always done in the code after calling regclass() has been removed.
In addition, the setting of the flags for EXACT nodes has been pushed
into the common function that populates them.
regclass() will also now increment the naughtiness count if optimized to
a node that normally does that. I do not understand this heuristic
behavior very well, and could not come up with a test case for it;
experimentation revealed that there are no test cases in our test suite
for which naughtiness makes any difference at all.
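
To make the flag-passing change in the commit message more concrete, here is a minimal C sketch of the pattern it describes: the parser receives the address of the caller's flags and decides for itself whether to set SIMPLE, so the caller no longer sets it unconditionally. This is not the code from regcomp.c. The names FLAG_SIMPLE, is_utf8_invariant() and parse_single_char_class() are hypothetical, the flag value is made up, and the "below 128" test assumes an ASCII platform.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit named after the SIMPLE flag in the commit message.
 * The value is illustrative and is not taken from regcomp.c. */
#define FLAG_SIMPLE 0x01

/* Hypothetical helper: returns 1 if a Latin1-range code point has the same
 * byte representation whether or not the string is UTF-8 encoded.
 * Assumption: ASCII platform, where that holds only for code points < 128. */
static int is_utf8_invariant(unsigned int cp)
{
    return cp < 0x80;
}

/* Hypothetical stand-in for a class parser that has optimized a
 * single-character class such as [a] into an exact-match node.
 * The flags are reported through an out parameter, so the decision
 * about SIMPLE is made here and not by the caller afterwards. */
static void parse_single_char_class(unsigned int cp, int *flagp)
{
    assert(flagp != NULL);

    *flagp = 0;
    if (is_utf8_invariant(cp)) {
        /* Same bytes under both encodings: safe to mark as SIMPLE. */
        *flagp |= FLAG_SIMPLE;
    }
    /* Otherwise (128..255) the encodings differ, so SIMPLE stays clear. */
}
```

With this sketch, a caller that passes `&flags` for the class [a] gets SIMPLE set, while the same call for a class containing only code point 0xE9 leaves it clear, because that code point takes one byte without UTF-8 and two bytes with it.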