Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re^2: Why machine-generated solutions will never cease to amaze me

by grinder (Bishop)
on Nov 10, 2004 at 15:33 UTC ( #406692=note: print w/replies, xml ) Need Help??


in reply to Re: Why machine-generated solutions will never cease to amaze me
in thread Why machine-generated solutions will never cease to amaze me

how it handles the following

Well, plugging in that list, I get:

fo(?:o(?:l(?:er)?|-?bar)|a(?:l(?:er)?|bar)|il(?:(?:i?a|e)r)?)

Regexp::Optimizer arrives at the same result, more or less:

fo(?:o(?:l(?:er)?|\-?bar)|a(?:l(?:er)?|bar)|il(?:(?:i?a|e)r)?)

In other words, there were no common tails in any of the alternations. Remember, it's the forward-looking tree that drives things. Tail folding is a secondary consideration. Pretty freaky that the alternation orders came out the same, though.

The main problem for me is that a pure Perl solution is useless, since I need to write the patterns for use in another program that uses the PCRE library (otherwise there is the excellent Text::Trie module).

If the first character of an alternation was an index into a jump table, the cost becomes O(1), at the expense of a slight increase in memory, but as if perl ever cared about that :)

Im really interested in how you are doing your thing

My approach is to take the words, insert them into a tree, and then in a second pass do a depth-first search, reversing the leaves and inserting the reversed leaves into another tree. If I get a tree back (as opposed to a forest), I percolate the common trunk upwards towards the root. When there's a subtree has nothing in common I just unreverse the subtree and unwind the results.

- another intruder with the mooring of the heart of the Perl

Replies are listed 'Best First'.
Re^3: Why machine-generated solutions will never cease to amaze me
by demerphq (Chancellor) on Nov 10, 2004 at 16:14 UTC

    Ah ok, your implementation appears to do pretty much the same thing as mine, disabling common suffix handling when there are multiple suffixes. (I strived _hard_ with little luck to overcome that problem). Just in case you havent thought of it already do you collapse things like /fo(?:o|a)l/ to /fo[oa]l/?

    I have to admit ive never used a module for this stuff. Ive always handrolled. :-) I have a specialized Inline::C Trie matcher for digits that is thousands of times faster than the regex engine (when match times alone are relevent).

    ---
    demerphq

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://406692]
help
Chatterbox?
[marto]: "Eugenie Scott, executive director of the National Center for Science Education, dubbed this approach the Gish gallop, describing it as "where the creationist is allowed to run on for 45 minutes or an hour, spewing forth torrents of error that the
[marto]: evolutionist hasn't a prayer of refuting in the format of a debate." She also criticized Gish for failing to answer objections raised by his opponents"
[erix]: one would hope evolutionists haven't any prayers anyway
[marto]: obviously someone could be religious, but not creationist
[erix]: "Nothing in Intelligent Design makes sense except in the light of Creationism" <-- I made that one up myself (free after Dobzhansky )
[erix]: yes. Deplorable marto, deplorable.
[marto]: the situation seemed similar to this one, majority of the contributrions are nonsense, doesn't address any questions ...
[marto]: meh, I've been called worse :P

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (9)
As of 2017-07-28 15:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    I came, I saw, I ...
























    Results (431 votes). Check out past polls.