I have made many attempts at building a character-by-character state machine for it, all of which failed miserably and broke the functionality in one way or another.

I do believe it is possible to construct such a thing, and that with the time and inclination I would eventually solve the problem. However, now that I have adopted Plack as the foundation for the system, I'm already getting close to a thousand hits a second out of it on a quad-core Phenom II as it stands (16,000 hits in just over 18 seconds the last time I ran my stress-testing script).
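As a quick sanity check on the "close to a thousand hits a second" figure, here is the arithmetic as a small Python sketch. Taking the run time as exactly 18 s slightly overstates the rate, since the run took just over 18 s. The per-core figure assumes the load was spread evenly across all four cores, which is an assumption rather than a measured number.

```python
def hits_per_second(hits: int, seconds: float) -> float:
    """Average request rate over a whole stress-test run."""
    return hits / seconds

# Figures from the stress test above; 18 s slightly underestimates the
# run time ("just over 18 seconds"), so these rates are upper bounds.
total = hits_per_second(16_000, 18.0)
# Assumption: load spread evenly over the Phenom II's four cores.
per_core = total / 4

print(f"{total:.0f} hits/s overall, {per_core:.0f} hits/s per core")
# prints: 889 hits/s overall, 222 hits/s per core
```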

Given that 24- and 32-processor server systems are readily available, and chips with 100+ processor cores are already in production from companies such as Tilera, I don't feel there is much point in further optimisation at this stage: all the "low-hanging fruit" has already been picked by optimising the regexes and removing backtracking, etc., where possible.
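For anyone wondering what "removing backtracking" looks like in practice, here is a generic illustration. It is not one of the actual regexes from this system (those aren't shown here), and the pattern names and input line are hypothetical. It's sketched in Python, but Perl's regex engine treats these two patterns the same way: swapping a greedy `.*` for a negated character class means the engine never has to give characters back, and the match can't overshoot the closing quote either.

```python
import re

# Hypothetical example: pull the value out of a quoted attribute.
# ".*" first runs to the end of the line, then backtracks one character
# at a time until it finds a quote, which ends up being the LAST quote.
GREEDY = re.compile(r'"(.*)"')

# "[^"]*" can only consume non-quote characters, so it stops exactly at
# the closing quote and never has to backtrack.
NEGATED_CLASS = re.compile(r'"([^"]*)"')

line = 'title="Hello" lang="en"'

print(GREEDY.search(line).group(1))         # prints: Hello" lang="en  (overshoots)
print(NEGATED_CLASS.search(line).group(1))  # prints: Hello
```

The tighter pattern is both faster on long lines and more correct, which is typical of this kind of low-hanging-fruit fix.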