I've written a number of these regex optimizers and I never saw a satisfactory way of handling the tail. I'm really interested in how you are doing it, and also how it handles the following:
Also, in experiments I've done, building a trie and then using it for these tasks outperforms such precompiled regexes as soon as the number of words involved grows beyond a small number. IIRC I saw pure-Perl trie solutions outperform regex solutions when the dictionary size was more than a few hundred words. Backtracking over all of those alternations is expensive, whereas a trie solution is entirely backtracking-free.
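
To make the trie point concrete, here is a minimal sketch of the idea, written in Python rather than Perl to keep it short. The names (`build_trie`, `longest_match`, `find_all`) and the nested-dict layout are hypothetical, chosen for illustration; this is not the code from the experiments mentioned above. Within one match attempt the walk only moves forward through the text, so no alternative is ever retried.

```python
def build_trie(words):
    """Build a nested-dict trie; the empty-string key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True  # end-of-word marker (cannot collide with a 1-char key)
    return root


def longest_match(trie, text, start=0):
    """Return the longest dictionary word beginning at text[start], or None."""
    node = trie
    best = None
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break  # no word continues with this character; stop, never back up
        if "" in node:
            best = text[start:i + 1]  # remember the longest complete word so far
    return best


def find_all(trie, text):
    """Scan left to right, collecting (offset, word) for non-overlapping longest matches."""
    hits = []
    pos = 0
    while pos < len(text):
        word = longest_match(trie, text, pos)
        if word:
            hits.append((pos, word))
            pos += len(word)
        else:
            pos += 1
    return hits
```

With a dictionary of `foo`, `foobar` and `bar`, `longest_match` on `"foobarbaz"` picks `foobar` in a single forward pass, where a naive `foo|foobar|bar` alternation would try each branch in turn. The cost of one attempt is bounded by the length of the longest word, not by the number of words in the dictionary.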