http://www.perlmonks.org?node_id=771629


in reply to Suggestion for regular expression speed improvement.

Corion's suggestions are right. If you're still interested in how to speed up the regex, here goes:

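The first .+ initially matches every character to the end of the line. It then gives up characters one at a time until the \t can match a tab. Because it backtracks from the end, that is the last tab on the line, not the first. The second .+ then matches the rest of the line, the next \t finds no tab left, and the first .+ has to give up characters again, and so on. Every one of those retries is a backtracking step.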

To avoid all that backtracking, replace each .+ with something that matches everything except tab characters: [^\t]+. That way each group stops at the next tab and never has to back up across one.
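A minimal sketch of the difference, assuming a three-field, tab-separated line. The original poster's regex isn't quoted in this thread, so the pattern and the sample line are made up. It is written in Python for illustration, but Perl's regex engine backtracks the same way for these patterns.

```python
import re

# Hypothetical three-field pattern using greedy .+ for every field.
GREEDY = re.compile(r"^(.+)\t(.+)\t(.+)$")
# Same pattern with each .+ replaced by [^\t]+, so no field can
# swallow a tab and the engine never backs up across one.
NEGATED = re.compile(r"^([^\t]+)\t([^\t]+)\t([^\t]+)$")

line = "2009-06-15 12:35\tINFO\tservice started"  # made-up sample line

# On a well-formed line both patterns capture the same fields.
print(GREEDY.match(line).groups())
print(NEGATED.match(line).groups())

# With one tab too many, greedy .+ lets the first group absorb a tab,
# while the negated class simply refuses to match.
extra = "a\tb\tc\td"
print(GREEDY.match(extra).groups())   # ('a\tb', 'c', 'd')
print(NEGATED.match(extra))           # None
```

Besides being faster, the [^\t]+ version is also stricter: it can't silently put a tab inside a captured field.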

Re^2: Suggestion for regular expression speed improvement.
by bala.linux (Novice) on Jun 15, 2009 at 12:35 UTC
    This sounds good. I will adopt this change and compare the performance. Thanks.
      No, don't. Go with the tips Corion gave you above; it's much more sensible to use split or a module. My explanation was mostly meant to satisfy academic curiosity, not as a suggestion for how to solve your problem.
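For comparison with the regex route, here is a minimal sketch of the split approach (in Perl this would be `split /\t/, $line`). It is written in Python for illustration. The `csv` module with a tab delimiter stands in for the CSV module mentioned in the thread, and the sample line is made up.

```python
import csv
import io

line = "2009-06-15 12:35\tINFO\tservice started"  # made-up sample line

# Plain split on the tab character: no regex, no backtracking.
fields = line.split("\t")
print(fields)

# Python's csv module with a tab delimiter also copes with quoted fields;
# here the line has no quotes, so the result matches the plain split.
reader = csv.reader(io.StringIO(line), delimiter="\t")
row = next(reader)
print(row)
```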
        As I mentioned above, I can't take that approach, because I want to support matching log lines of any format with grouping. If I go with CSV, I won't be able to parse logs in other formats, such as syslog and other proprietary logs.
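If the regexes have to stay configurable per log format, the negated-class advice still applies inside each pattern. The following is a minimal sketch, assuming a hypothetical registry of format names mapped to patterns with named groups. The format names, the rough syslog-like shape, and the `parse` helper are all illustrative assumptions, not anything from the thread.

```python
import re

# Hypothetical registry of per-format patterns (names and shapes assumed).
# Fields use negated classes ([^\t]+, \S+, [^:\s]+) so each one stops at its
# delimiter; only the final free-text field uses .*.
LOG_FORMATS = {
    "tab3": re.compile(r"^(?P<time>[^\t]+)\t(?P<level>[^\t]+)\t(?P<msg>[^\t]+)$"),
    # Rough BSD-syslog-like shape: "Jun 15 12:35:00 host prog[pid]: message"
    "syslog": re.compile(
        r"^(?P<time>\w{3} [ \d]\d \d\d:\d\d:\d\d) (?P<host>\S+) "
        r"(?P<prog>[^:\s]+): (?P<msg>.*)$"
    ),
}


def parse(line, fmt):
    """Return a dict of named groups, or None if the line does not match."""
    m = LOG_FORMATS[fmt].match(line)
    return m.groupdict() if m else None


print(parse("Jun 15 12:35:00 myhost sshd[123]: Accepted password", "syslog"))
print(parse("a\tb\tc", "tab3"))
```

Named groups keep the calling code independent of field order, so adding a new format only means adding a pattern.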