Evolution, v2.0

It's me again..

As some of you may remember, I posted a series of notes about evolving an OO design a while back. Well, one of the late replies was so interesting that I thought it was worth starting a new thread for my response.

The short summary is that Aristotle followed my logic about halfway, then applied a couple ideas of his own, and posted a completely different result for comparison and contrast.

For one thing, I think it's incredibly cool that he chose to devote head-time to the idea. It's always fun to have someone demonstrate that they understand what you're saying by handing it back to you in a shape you didn't expect. (One might even consider it worthwhile to upvote his node -- hint, hint ;-)

On a less personal level, his post is a great follow-up to the intent of the original message.. namely, to show new OO programmers how objects end up looking the way they do.

You should probably read both nodes if you want to grok the following discussion in fullness, but the general idea is that I built a state machine using distinct objects to represent each state, and he built a state machine as a single object that contains a lookup table for all the states.

And to cut to the chase: his version works, too. That's what makes it such a great follow-up.

My design is decentralized, and his is centralized. Both options work, and each has its good points and bad points. It's impossible to declare one approach better than another on its own merits, and for this particular problem, the choice is pretty much a matter of personal taste.

I happen to like decentralized systems, because they force me to think about loose coupling.

Since this is a semi-tutorial, the term coupling more or less means "How much other code do I have to read before I can understand this piece right here?" Tight coupling twists code from all over the program into a single, migrane-inducing snarl. Loose coupling means you can pretty much ignore everything but the piece you're looking at right now. Global variables tend to create tight coupling. So does looking into a data structure and twiddling its values directly. Function calls and data formatting protocols, OTOH, tend to promote loose coupling.

For OO programmers, things like encapsulation (the Has-A relationship) and public data attributes tend to create tight coupling, while 'protocol classes' (base classes that define a hierarchy's entire interface) and standalone classes promote loose coupling.

Another way of looking at the problem is to ask, "how many 'use' statements will I have to execute before this code runs, if all the classes live in separate files?" And "how many files will I have to read before I know the actual count?"

</digression>

Submerging one object in another usually makes it easier to see the high-level logic of a given operation, but it also increases the coupling between classes. The outer object can end up wanting all sorts of information from the inner object, until it becomes easier to demote the inner object to an ordinary data structure.

Sometimes that's useful, but sometimes it just buries the messiness one layer deeper in the design. As a general rule, I try to make my designs as decentralized as possible, and only embed objects when the alternatives are just too baroque to be reasonable.

For this design, I chose to make the states objects in their own right because i plan to make them pretty complex later on. The state machine will be nondeterministic, meaning that zero, one, or many transitions might be valid for any given input character. That makes finding the 'right' transition more complicated than it would be for a deterministic state machine, where you get exactly one transition (or an error) for every input character.

Had I been building a determinstic state machine, it's very possible that I would have gone with a centralized design. The transition table would be easy to work with, and there wouldn't be much point in promoting the states to objects in their own right.

As it stands, the process of finding the 'right' transition for a nondeterministic state machine (called subset construction) is complicated enough that it makes sense to let the individual states do some of the work.

Those requirements go beyond what I talked about in my original post, though. Nor am I absolutely positive that using a set of independent states is the best way of solving the problem. A single, centralized state machine may end up being a better design. The whole point of this exercise is to find out.

So kudos to brother Aritotle for coming up with a really good alternate solution based on the information he was given. The one tiny quibble I have is that his final version of pseudocode puts the 'rewind and restart' operations before the 'token' operation, and if you think about the actual implementation, it's easier to do them the other way around (use the last-start and last-accept pointers to find the string, then reset them)

Discussions like this are why I like this place so much. ;-)

Comment on Evolution, v2.0

Replies are listed 'Best First'.
Re: Evolution, v2.0 by princepawn (Parson) on Dec 03, 2002 at 14:22 UTC
As it stands, the process of finding the 'right' transition for a nondeterministic state machine (called subset construction) is complicated enough that it makes sense to let the individual states do some of the work. But where does the complexity lie? Usually you can express the state transition calculation in a `sub`, with the parameters of the `sub`being the variables which are used to determine the next state. Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality	[reply] [d/l] [select]
Re2: Evolution, v2.0 by mstone (Deacon) on Dec 04, 2002 at 02:13 UTC
For a complete answer, find a good book about compilers and look up subset construction. The Dragon Book (aka: Compilers: Principles, Techniques and Tools) by Aho, Sethi, & Ullman is the canonical reference. In less formal terms, you track the states of an NFA by combining all the possible transitions each step of the way. If input character 'a' can take you to states B, C, and D, you then check B, C, and D to see if they have transitions for the next input character. In effect, the set {B,C,D} of nondeterministic states is itself a state in a deterministic finite automaton (DFA) that recognizes the same string. Processing that ever-growing list of states that might have transitions is nontrivial work. The traditional solution uses two stacks: stack one contains all the states you want to check for transitions, and whenever you find a valid transition you push the result state onto stack two. When stack one is finally empty, you advance to the next input character, swap stack one with stack two, and start over again. The process becomes even more fun when your NFA contains what are called epsilon transitions.. 'automatic' transitions that link two states without requiring any input. Epsilon transitions make it easier to model things like Kleene closures (the splat operator ('') in a regular expression) in ways that are (comparatively) easy to read: `a DFA that recognizes the string 'a': (s0)- a ->(s1) an NFA with epsilon transitions that recognizes the string 'a': +-->(s1)- a ->(s2)- e -+ \| \| +-->(s0)- e -+----------------------+-->(s3)- e -+ \| \| +------------------------------------------------+` [download] To find the set of states that might possibly accept the next input character, you have to follow all the epsilon transitions to states you can reach without having to accept any input. Then you have to follow any e-transitions those states might have, etcetera. In the diagram above, for instance, you have to follow three e-transitions before you can recognize the second 'a': one from s2 to s3, a second from s3 to s0, and a third from s0 to s1. The result is known as the e-closure of the original state. You also have to watch out for e-transitions that will take you in a loop.. from s0 to s3, and back to s0, for instance. During subset construction, you have to calculate the e-closure of a whole set of NFA states. You can do that as soon as you find a valid transition (if the transition is valid, push the e-closure of the result state onto stack two), or you can do it before looking for transitions (pop the next NFA state off stack one, find the e-closure, and check all those states for transitions). Either way, it takes some fiddling around. Both the process of building an e-closure and the process of searching for transitions break down nicely for a divide-and-conquer strategy, though. If a state is smart enough to find its own e-transitions (fairly trivial), it can then ask all its e-neighbors to report their own e-transitions. The same is true for checking transitions. Therefore, it makes sense to turn the states into objects that are smart enough to handle their own chunk of the calculation.	[reply] [d/l]
Loose coupling good... by pdcawley (Hermit) on Dec 04, 2002 at 11:07 UTC
Lots of small .pm files, a pain. When I'm playing around in Smalltalk of course this isn't really a problem. Need to subclass? Just tell the browser that that's what you want to do (or the refactoring browser if you've got it) and away you go. The image just handles it. In Perl it's a tad trickier. If you stick with the 'one file, one package' rule, you can end up with loads of tiny files and a dependency issue because, somewhere you need to add an appropriate set of use statements (hopefully you can just do that in your parent class). And if you go with multiple classes per file, what happens when one of your subclasses gets really big and you have to move it out into its own file; you end up introducing an inconsistency of handling that can catch you out later. I wonder if there's a case for adding an `import` method into either UNIVERSAL or your application's base class that would allow you to do tricks like: `# Load SomeClass, SomeClass::Subclass, and SomeClass::AnotherSubClass use SomeClass qw/::SubClass ::AnotherSubClass/; # Load SomeOtherClass and SomeFullyQualifiedSubClass::Name use SomeOtherClass qw/SomeFullyQualifiedSubClass::Name/; # Load all classes 'below' StateClass in the namespace use StateClass '*';` [download] So, is that insane? Or should I go away and write it?	[reply] [d/l] [select]
Re: Loose coupling good... by Aristotle (Chancellor) on Dec 08, 2002 at 13:09 UTC
You will probably be interested in ModuleBundle and Module::Require. Makeshifts last the longest.	[reply]