In search of an efficient query abstractor
by xaprb (Scribe)
on Dec 07, 2008 at 14:58 UTC
xaprb has asked for the
wisdom of the Perl Monks concerning the following question:
I'm building a log analysis tool for MySQL logs (mk-log-parser in Maatkit, http://www.maatkit.org). One of its more important functions is to take a SQL query from the log and "abstract" it into a "fingerprint" of the query. You can think of it as deriving the query's prototype.
Here's an example:
This query should be lowercased, have its whitespace collapsed, and have its parameters removed and replaced by placeholders, thusly:
select * from foo where bar > N
This becomes quite a task to do with regexes. Witness the code in QueryRewriter.pm in the fingerprint() subroutine:
The biggest problem is that it's not very efficient. The regexes do a lot of backtracking. I have profiled the resulting program and found that the regex that converts float/real numbers into "N" is particularly heinously slow.
I'd like to do this with a state machine, character-by-character, for a one-pass solution. I could do this in C. But that's what regexes are for, right? State machines that run char-by-char in C.
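To make the one-pass idea concrete, here is a minimal sketch of what I have in mind, staying in Perl. It is not the QueryRewriter.pm code: the name fingerprint_one_pass is hypothetical, the "S" placeholder for quoted strings is an assumption (only "N" appears in the example above), and the token rules are deliberately simplified (no comments, no hex literals, no IN-list collapsing). Each pattern is anchored with \G and run with /gc, so the scan moves left to right and never revisits text it has already consumed. Whether it actually beats the current regexes would need to be benchmarked.

```perl
use strict;
use warnings;

# Hypothetical one-pass fingerprinter (a sketch, not the Maatkit code).
# Every pattern is anchored at the current position with \G; the /c flag
# keeps pos() in place when a pattern fails, so the next one is tried
# at the same spot.
sub fingerprint_one_pass {
    my ($query) = @_;
    my $out = '';
    my $len = length $query;
    pos($query) = 0;

    while (pos($query) < $len) {
        if ($query =~ /\G\s+/gc) {
            $out .= ' ';                      # collapse any whitespace run
        }
        elsif ($query =~ /\G'(?:[^'\\]|\\.|'')*'/gcs) {
            $out .= 'S';                      # single-quoted string (assumed placeholder)
        }
        elsif ($query =~ /\G"(?:[^"\\]|\\.|"")*"/gcs) {
            $out .= 'S';                      # double-quoted string (assumed placeholder)
        }
        elsif ($query =~ /\G(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?/gc) {
            $out .= 'N';                      # integer or float literal
        }
        elsif ($query =~ /\G([A-Za-z_]\w*)/gc) {
            $out .= lc $1;                    # keyword or identifier; digits inside stay
        }
        elsif ($query =~ /\G(.)/gcs) {
            $out .= $1;                       # operators, punctuation, stray quotes
        }
    }

    # Whitespace is already collapsed, so at most one space at each end.
    $out =~ s/\A //;
    $out =~ s/ \z//;
    return $out;
}

print fingerprint_one_pass("SELECT * FROM Foo   WHERE bar > 3.14e2"), "\n";
```

Because identifiers are matched as whole tokens, a name like t1 keeps its digit instead of becoming tN. A leading minus sign is left as an operator, so -5 comes out as -N in this sketch.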
I need to make this as fast as possible, even at the expense of memory. This is really critical for the log analysis. What thoughts do you have on this problem?