Re: Help with speeding up regex

by davido (Cardinal)
on Aug 10, 2012 at 23:53 UTC ( [id://986842] )


in reply to Help with speeding up regex

Have you already profiled with Devel::NYTProf so that you're certain the pattern matching is where you're spending too much time? This is a really worthwhile thing to do; it would be unfortunate to spend time fixing the regular expression only to find you are I/O bound. Your hunch may be correct, but it's best to know for sure before diving into an optimization effort.
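As a rough first check before reaching for the profiler, one could time a read-only pass over an input file against a pass that also applies the regex. The sketch below is only that, a sketch: the pattern and the timed_pass helper are hypothetical stand-ins, and OS file caching can skew whichever pass runs first, so it is no substitute for a Devel::NYTProf run.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical pattern; substitute the regex under suspicion.
my $pattern = qr/\bfoo\b/;

# Read the file once, applying the regex only if $do_match is true.
# Returns the elapsed wall-clock seconds and the number of matching lines.
sub timed_pass {
    my ($file, $do_match) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $hits  = 0;
    my $start = time;
    while ( my $line = <$fh> ) {
        $hits++ if $do_match && $line =~ $pattern;
    }
    my $elapsed = time - $start;
    close $fh;
    return ($elapsed, $hits);
}

if (@ARGV) {
    my $file = shift @ARGV;
    my ($read_only)          = timed_pass($file, 0);
    my ($with_match, $hits)  = timed_pass($file, 1);
    printf "read only: %.3fs, read + match: %.3fs (%d hits)\n",
        $read_only, $with_match, $hits;
}
```

If the two timings are close, the regex is unlikely to be where the time goes, and tuning it will not buy much.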

If you need to install Devel::NYTProf, install version 4.06 (not 4.07) from CPAN, since v4.07 has a minor bug in the nytprofhtml utility that would prevent you from using that utility to see the results in your browser.

(There's a simple fix, patch submitted, and a future version will certainly have made the repair.)

Update: Tim Bunce has now released v4.08, which fixes the problem from v4.07. Assuming it has found its way to your local CPAN mirror, you should be able to install Devel::NYTProf with the simple "cpanm Devel::NYTProf" command, and a few minutes of patience.


Dave

Re^2: Help with speeding up regex
by BrowserUk (Patriarch) on Aug 11, 2012 at 11:13 UTC

    How does NYTProf speed up regex?


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      He asked whether you had profiled "so that you're certain the pattern matching is where you're spending too much time?" In other words: is the regex the bottleneck, yes or no? It's a good question.

        Did you look at the regex? That's pretty much a given.



Re^2: Help with speeding up regex
by eversuhoshin (Sexton) on Aug 14, 2012 at 03:31 UTC

    Hello Davido, thank you for your kind reply. Sorry, but I am using ActivePerl with Komodo. Can you help me use Devel::NYTProf? What do I have to do to assess my script? I am still learning and I didn't fully understand the CPAN explanation. Thank you again.

      ActiveState has Devel::NYTProf in its ppm4 repository here: http://code.activestate.com/ppm/Devel-NYTProf/. Once installed, you would cd into the target script's directory and execute a one-liner that invokes your script:

          perl -d:NYTProf some_perl.pl input_file.txt

      After it completes, you can review the results by executing the following command (while still in the same directory):

          nytprofhtml --open

      You should get a browser window with more useful information than you can shake a stick at.

      My optimized regex will help, but only as an optimization of the exact regex you provided, and it's tricky to implement and to maintain as your needs continue to evolve. A better solution would be to use threads, or to fork processes. BrowserUk already had some suggestions on how you might implement such a strategy. The beauty of that sort of approach is that you don't have to concern yourself quite as much with how efficient the regular expressions themselves are, because you're processing several files in parallel.
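To make the forking idea concrete, here is a minimal sketch using only core Perl. It is not BrowserUk's suggestion and not the original poster's code: the pattern, the worker limit, and the process_file and run_parallel helpers are all hypothetical names chosen for illustration. On Windows builds such as ActivePerl, fork is emulated with threads, so the speedup may differ from what a Unix system would show.

```perl
use strict;
use warnings;

# Hypothetical pattern and worker limit; substitute your own.
my $pattern     = qr/\bfoo\b/;
my $max_workers = 4;

# Stand-in for the real per-file work: count the lines that match.
sub process_file {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++ if $line =~ $pattern;
    }
    close $fh;
    return $count;
}

# Wait for one child to finish; returns 1 if it exited with an error.
sub reap_one {
    my ($running) = @_;
    my $pid = wait();
    return 0 if $pid == -1;    # no children left to wait for
    my $bad = $? != 0 ? 1 : 0;
    delete $running->{$pid};
    return $bad;
}

# Process each file in its own child, at most $max_workers at a time.
# Returns the number of children that failed.
sub run_parallel {
    my (@files) = @_;
    my %running;
    my $failed = 0;
    for my $file (@files) {
        $failed += reap_one( \%running ) if keys %running >= $max_workers;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: do the work, report, and exit without returning.
            my $count = process_file($file);
            print "$file: $count matching lines\n";
            exit 0;
        }
        $running{$pid} = $file;    # parent: remember the child
    }
    $failed += reap_one( \%running ) while keys %running;
    return $failed;
}

run_parallel(@ARGV) if @ARGV;
```

Saved under a name of your choosing, it would be run with the input files as arguments. Each child prints its own result here; gathering results back into the parent would need pipes or temporary files, which this sketch leaves out.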

      If you end up with a ton of data every day that has to get chewed through before tomorrow, you might look into a Map-Reduce strategy, such as with Hadoop.


      Dave
