|Perl: the Markov chain saw|
how it handles the following
Well, plugging in that list, I get:fo(?:o(?:l(?:er)?|-?bar)|a(?:l(?:er)?|bar)|il(?:(?:i?a|e)r)?)
Regexp::Optimizer arrives at the same result, more or less:
In other words, there were no common tails in any of the alternations. Remember, it's the forward-looking tree that drives things. Tail folding is a secondary consideration. Pretty freaky that the alternation orders came out the same, though.
The main problem for me is that a pure Perl solution is useless, since I need to write the patterns for use in another program that uses the PCRE library (otherwise there is the excellent Text::Trie module).
If the first character of an alternation was an index into a jump table, the cost becomes O(1), at the expense of a slight increase in memory, but as if perl ever cared about that :)
Im really interested in how you are doing your thing
My approach is to take the words, insert them into a tree, and then in a second pass do a depth-first search, reversing the leaves and inserting the reversed leaves into another tree. If I get a tree back (as opposed to a forest), I percolate the common trunk upwards towards the root. When there's a subtree has nothing in common I just unreverse the subtree and unwind the results.
- another intruder with the mooring of the heart of the Perl