<?xml version="1.0" encoding="windows-1252"?>
<node id="1006062" title="Speed of Perl Regex Engine" created="2012-11-28 11:04:41" updated="2012-11-28 11:04:41">
<type id="115">
perlquestion</type>
<author id="970457">
Clovis_Sangrail</author>
<data>
<field name="doctext">
&lt;p&gt;Hello Perl Monks,&lt;/p&gt;
&lt;p&gt;I use Perl to generate daily audit reports from sets of Journal Log Files produced by GT.M, an implementation of the MUMPS database/language. Each Journal line of interest includes a Username, a Global Variable and a description of the transaction on it. The report presents a listing and count of the Global Variable modifications, broken out by Username.&lt;/p&gt;
&lt;p&gt;The customer wanted the capability to ignore some Globals that were not of interest. They can edit a file of such Globals, and I read that file and build an alternation (inclusive-OR) Regex that I pass to the Perl program as a command-line parameter. The program matches the Global Variable name from each Journal line against that Regex, and skips the line if it matches.&lt;/p&gt;
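&lt;p&gt;A minimal sketch of this kind of filter, not the production code. The Global names and the should_skip helper are made up for illustration, the list is hard-coded here rather than read from the customer's file and passed on the command line, and anchoring the pattern to whole names is an assumption:&lt;/p&gt;

```perl
use strict;
use warnings;

# Hypothetical Global names; the real list comes from the customer's file.
my @skip_globals = ('^TMP', '^XTMP', '^ZAUDIT');

# quotemeta escapes regex metacharacters such as the leading caret.
my $alternation = join '|', map { quotemeta($_) } @skip_globals;

# Compile once; anchoring at both ends (an assumption) matches whole names only.
my $skip_re = qr/\A(?:$alternation)\z/;

# Return 1 when a Global name is on the skip list, 0 otherwise.
sub should_skip {
    my ($global) = @_;
    return $global =~ $skip_re ? 1 : 0;
}

print should_skip('^XTMP') ? "skip\n" : "keep\n";
print should_skip('^ACCT') ? "skip\n" : "keep\n";
```

&lt;p&gt;Compiling the pattern once with qr// keeps the per-line work down to a single match against the Global name.&lt;/p&gt;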
&lt;p&gt;But I did not realize just how popular this capability would be! I figured there would only ever be a few such Globals to skip, but the Customer has entered 54 of them so far, and they say there will be more! The Regex that I give to the Perl program is now about 750 characters long, and some of the bigger banks being audited produce over a million Journal lines each day.&lt;/p&gt;
&lt;p&gt;The reports for those banks do take noticeably longer to produce than when the system first went online, and I don't have much knowledge of, or feel for, the performance of the Perl Regex engine. Is matching time linear in the length of the Regex? That is, will it take ten times as long to match against a 600-character Regex as against a 60-character one?&lt;/p&gt;
&lt;p&gt;I realize that this is just the sort of thing that enterprising Perl students study via test programs, and I may do that myself. But I also want to be able to tell the folks who sign my check that I am asking around, too.&lt;/p&gt;
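&lt;p&gt;The kind of test program I have in mind would look something like the sketch below, which uses the core Benchmark module to compare a short alternation against a long one. The build_re helper, the ^GBL names and the sizes (5 versus 50 alternatives, 1000 input names) are all invented for the test, not taken from the real system:&lt;/p&gt;

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build an anchored alternation of $count made-up Global names.
sub build_re {
    my ($count) = @_;
    my $alternation = join '|',
        map { quotemeta(sprintf('^GBL%04d', $_)) } 1 .. $count;
    return qr/\A(?:$alternation)\z/;
}

my $short_re = build_re(5);    # roughly the original, small skip list
my $long_re  = build_re(50);   # roughly the current, larger skip list

# Synthetic input names; most of them match neither pattern.
my @names = map { sprintf('^GBL%04d', $_ * 7) } 1 .. 1000;

# Run each case for at least one CPU second and print a comparison table.
cmpthese(-1, {
    short_regex => sub { my $n = grep { $_ =~ $short_re } @names },
    long_regex  => sub { my $n = grep { $_ =~ $long_re  } @names },
});
```

&lt;p&gt;The numbers will depend on the Perl version and on how closely the synthetic names resemble real Journal data, so I would treat them as a rough guide only.&lt;/p&gt;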
</field>
</data>
</node>
