Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?
 
PerlMonks  

Re^2: Why machine-generated solutions will never cease to amaze me

by grinder (Bishop)
on Nov 10, 2004 at 15:33 UTC ( #406692=note: print w/ replies, xml ) Need Help??


in reply to Re: Why machine-generated solutions will never cease to amaze me
in thread Why machine-generated solutions will never cease to amaze me

how it handles the following

Well, plugging in that list, I get:

fo(?:o(?:l(?:er)?|-?bar)|a(?:l(?:er)?|bar)|il(?:(?:i?a|e)r)?)

Regexp::Optimizer arrives at the same result, more or less:

fo(?:o(?:l(?:er)?|\-?bar)|a(?:l(?:er)?|bar)|il(?:(?:i?a|e)r)?)

In other words, there were no common tails in any of the alternations. Remember, it's the forward-looking tree that drives things. Tail folding is a secondary consideration. Pretty freaky that the alternation orders came out the same, though.

The main problem for me is that a pure Perl solution is useless, since I need to write the patterns for use in another program that uses the PCRE library (otherwise there is the excellent Text::Trie module).

If the first character of an alternation was an index into a jump table, the cost becomes O(1), at the expense of a slight increase in memory, but as if perl ever cared about that :)

Im really interested in how you are doing your thing

My approach is to take the words, insert them into a tree, and then in a second pass do a depth-first search, reversing the leaves and inserting the reversed leaves into another tree. If I get a tree back (as opposed to a forest), I percolate the common trunk upwards towards the root. When there's a subtree has nothing in common I just unreverse the subtree and unwind the results.

- another intruder with the mooring of the heart of the Perl


Comment on Re^2: Why machine-generated solutions will never cease to amaze me
Select or Download Code
Re^3: Why machine-generated solutions will never cease to amaze me
by demerphq (Chancellor) on Nov 10, 2004 at 16:14 UTC

    Ah ok, your implementation appears to do pretty much the same thing as mine, disabling common suffix handling when there are multiple suffixes. (I strived _hard_ with little luck to overcome that problem). Just in case you havent thought of it already do you collapse things like /fo(?:o|a)l/ to /fo[oa]l/?

    I have to admit ive never used a module for this stuff. Ive always handrolled. :-) I have a specialized Inline::C Trie matcher for digits that is thousands of times faster than the regex engine (when match times alone are relevent).

    ---
    demerphq

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://406692]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (7)
As of 2014-07-29 21:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (228 votes), past polls