Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re^2: Why machine-generated solutions will never cease to amaze me

by grinder (Bishop)
on Nov 10, 2004 at 15:33 UTC ( [id://406692]=note: print w/replies, xml ) Need Help??


in reply to Re: Why machine-generated solutions will never cease to amaze me
in thread Why machine-generated solutions will never cease to amaze me

how it handles the following

Well, plugging in that list, I get:

fo(?:o(?:l(?:er)?|-?bar)|a(?:l(?:er)?|bar)|il(?:(?:i?a|e)r)?)

Regexp::Optimizer arrives at the same result, more or less:

fo(?:o(?:l(?:er)?|\-?bar)|a(?:l(?:er)?|bar)|il(?:(?:i?a|e)r)?)

In other words, there were no common tails in any of the alternations. Remember, it's the forward-looking tree that drives things. Tail folding is a secondary consideration. Pretty freaky that the alternation orders came out the same, though.

The main problem for me is that a pure Perl solution is useless, since I need to write the patterns for use in another program that uses the PCRE library (otherwise there is the excellent Text::Trie module).

If the first character of an alternation was an index into a jump table, the cost becomes O(1), at the expense of a slight increase in memory, but as if perl ever cared about that :)

Im really interested in how you are doing your thing

My approach is to take the words, insert them into a tree, and then in a second pass do a depth-first search, reversing the leaves and inserting the reversed leaves into another tree. If I get a tree back (as opposed to a forest), I percolate the common trunk upwards towards the root. When there's a subtree has nothing in common I just unreverse the subtree and unwind the results.

- another intruder with the mooring of the heart of the Perl

Replies are listed 'Best First'.
Re^3: Why machine-generated solutions will never cease to amaze me
by demerphq (Chancellor) on Nov 10, 2004 at 16:14 UTC

    Ah ok, your implementation appears to do pretty much the same thing as mine, disabling common suffix handling when there are multiple suffixes. (I strived _hard_ with little luck to overcome that problem). Just in case you havent thought of it already do you collapse things like /fo(?:o|a)l/ to /fo[oa]l/?

    I have to admit ive never used a module for this stuff. Ive always handrolled. :-) I have a specialized Inline::C Trie matcher for digits that is thousands of times faster than the regex engine (when match times alone are relevent).

    ---
    demerphq

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://406692]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (3)
As of 2024-03-19 07:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found