Speed of Perl Regex Engine
by Clovis_Sangrail (Beadle) on Nov 28, 2012 at 16:04 UTC
Clovis_Sangrail has asked for the wisdom of the Perl Monks concerning the following question:
Hello Perl Monks,
I use Perl to generate daily audit reports from sets of Journal Log Files produced by GT.M, an implementation of the MUMPS database/language. Each Journal line of interest includes a Username, a Global Variable and a description of the transaction on it. The report just presents a listing and count of the Global Variable modifications, broken out by Username.
The customer wanted the capability to ignore some Globals that were not of interest. They can edit a file of such Globals, and I read that file and build an Inclusive-Or type of Regex (one alternation of all the names) that I pass to the Perl program as a command-line parameter. The program matches the Global Variable name from each Journal line against that Regex, and skips the line if it matches.
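To make the setup concrete, here is a minimal sketch of that kind of skip filter. It is not the production code: the Global names are made up, the list is hard-coded instead of being read from the customer's file, and it assumes the whole Global name must match exactly (hence the `\A` and `\z` anchors).

```perl
use strict;
use warnings;

# Hypothetical skip list; in the real setup these names come from
# the customer-edited file.
my @skip_globals = ('^TMP', '^XUTL', '^ZZAUDIT');

# Build one anchored alternation, e.g. \A(?:\^TMP|\^XUTL|\^ZZAUDIT)\z
# quotemeta escapes the leading caret and any other metacharacters.
my $alternation = join '|', map { quotemeta($_) } @skip_globals;
my $skip_re     = qr/\A(?:$alternation)\z/;

# Returns 1 when the Global name should be left out of the report.
sub should_skip {
    my ($global) = @_;
    return $global =~ $skip_re ? 1 : 0;
}

# Hypothetical Global names, as they might be pulled from Journal lines.
for my $global ('^TMP', '^ACCT', '^XUTL') {
    printf "%-8s %s\n", $global, should_skip($global) ? 'skip' : 'keep';
}
```

Compiling the pattern once with `qr//` keeps the per-line work down to a single match against the already compiled Regex.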
But I did not realize just how popular this capability would be! I figured there would only ever be a few such Globals to skip, but the Customer has entered 54 of them so far, and they say there will be more! The Regex that I give to the Perl program is now about 750 characters long, and some of the bigger banks being audited produce over a million lines of Journal each day.
The reports for those banks do take noticeably longer to produce than when the system first went online, and I don't have much knowledge of or feel for the performance of the Perl Regex engine. Is it linear? That is, will it take ten times as long to match against a 600-character Regex as against a 60-character one?
I realize that this is just the sort of thing that enterprising Perl students study via test programs, and I may do that sort of thing. But I also do want to be able to tell the folks who sign my check that I am asking around, too.
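The sort of test program I have in mind would look roughly like the sketch below. Everything in it is an assumption made for illustration: the Global names are generated, the 10,000-name input stands in for a real Journal, and the alternation counts (6, 54, 540) are just convenient sizes around the current 54. It uses the core Benchmark module to time the same matching loop against Regexes of different sizes.

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);

# Build an anchored alternation of $n made-up Global names
# (^G0001, ^G0002, ...).
sub build_skip_re {
    my ($n) = @_;
    my $alternation = join '|',
        map { quotemeta(sprintf('^G%04d', $_)) } 1 .. $n;
    return qr/\A(?:$alternation)\z/;
}

# Count how many names in the list match the skip Regex.
sub count_skipped {
    my ($re, $names) = @_;
    my $count = 0;
    for my $name (@$names) {
        $count++ if $name =~ $re;
    }
    return $count;
}

# Made-up input: 10,000 names, most of which match none of the alternatives.
my @names = map { sprintf('^G%04d', $_) } 1 .. 10_000;

# Time 20 passes over the input for each Regex size.
for my $n (6, 54, 540) {
    my $re = build_skip_re($n);
    my $t  = timeit(20, sub { count_skipped($re, \@names) });
    printf "%4d alternatives: %s\n", $n, timestr($t);
}
```

The timings will depend on the perl version, the machine, and how closely the generated names resemble the real Globals, so the numbers are only meaningful relative to each other.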