PerlMonks  

Re^2: In search of an efficient query abstractor

by xaprb (Scribe)
on Dec 07, 2008 at 20:36 UTC (#728777)


in reply to Re: In search of an efficient query abstractor
in thread In search of an efficient query abstractor

Hmmm. Good catch, although I thought I had a test case to ensure that was handled correctly. I'll check that :) You seem to know more about regexes than I do. Why would you guess this particular regex is slow? Line-by-line profiling shows it consuming the vast majority of the time: almost 300 CPU seconds on an 8GB file, against 89 seconds for the next most expensive line.

The next-worst offender is

$query =~ s/(?<=\w_)\d+(_\d+)?\b/$1 ? "N_N" : "N"/eg;

followed by

$query =~ s/\s{2,}/ /g;
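To make the numeric-suffix substitution concrete, here is a small self-contained sketch that runs it on sample queries, next to a hypothetical /e-free variant that does the same job in two plain substitutions (N_N first, then N). The sample table names and the sub names abstract_orig and abstract_two_pass are invented for illustration; the two-pass version has not been checked against the real tool's test suite, and whether it is actually faster is an assumption to benchmark, not a result.

```perl
use strict;
use warnings;

# The substitution from the post: replace a number that follows "word_"
# with N, or a pair of underscore-separated numbers with N_N.
sub abstract_orig {
    my ($query) = @_;
    $query =~ s/(?<=\w_)\d+(_\d+)?\b/$1 ? "N_N" : "N"/eg;
    return $query;
}

# Hypothetical alternative without /e: handle the two-number form first,
# then the single-number form, so no Perl code runs per match.
sub abstract_two_pass {
    my ($query) = @_;
    $query =~ s/(?<=\w_)\d+_\d+\b/N_N/g;
    $query =~ s/(?<=\w_)\d+\b/N/g;
    return $query;
}

# Invented sample queries.
my @samples = (
    'select * from log_2008_12 join user_42 using (id)',
    'insert into stats_7 values (1, 2)',
);

for my $q (@samples) {
    printf "%s\n  -> %s\n", $q, abstract_orig($q);
    die "variants disagree on: $q\n"
        unless abstract_orig($q) eq abstract_two_pass($q);
}
```

On the first sample both subs produce "select * from log_N_N join user_N using (id)"; the literal values in "(1, 2)" are left alone because they are not preceded by "word_".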


Re^3: In search of an efficient query abstractor
by ikegami (Pope) on Dec 07, 2008 at 20:46 UTC

    Are you doing the whole 8GB file at once? If your string starts with "a  b  c  d", $query =~ s/\s{2,}/ /g; needs to copy 32GB of text, and that's just for the first 10 characters.

      No, it's done one entry at a time. Each entry is a header with some commented lines, followed by a query. There are special cases, but it generally looks like:

      # Time: 071015 21:43:52
      # User@Host: root[root] @ localhost []
      # Query_time: 2 Lock_time: 0 Rows_sent: 1 Rows_examined: 0
      use test;
      select sleep(2) from n;
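ikegami's copying concern only bites when the substitutions run over the whole file as one string. A minimal sketch of the entry-at-a-time approach described above, assuming (as in the sample) that each entry is a run of "#" comment lines followed by one or more query lines, so a comment line that appears after query text starts a new entry. The sub name read_entries and its callback interface are hypothetical, and the special cases mentioned above are not handled.

```perl
use strict;
use warnings;

# Hypothetical reader: calls $callback once per slow-log entry, passing the
# header (comment lines) and the query text separately. Only one entry is
# held in memory at a time, so each s/// or tr/// works on a short string.
sub read_entries {
    my ($fh, $callback) = @_;
    my ($header, $query) = ('', '');
    while (my $line = <$fh>) {
        if ($line =~ /^#/) {
            # A comment line after query text means the previous entry is done.
            if (length $query) {
                $callback->($header, $query);
                ($header, $query) = ('', '');
            }
            $header .= $line;
        }
        else {
            $query .= $line;
        }
    }
    # Flush the last entry, if any.
    $callback->($header, $query) if length $query;
    return;
}

# Usage: read from an in-memory string instead of a real log file.
my $log = <<'END_LOG';
# Time: 071015 21:43:52
# User@Host: root[root] @ localhost []
# Query_time: 2 Lock_time: 0 Rows_sent: 1 Rows_examined: 0
use test;
select sleep(2) from n;
# User@Host: root[root] @ localhost []
# Query_time: 0 Lock_time: 0 Rows_sent: 1 Rows_examined: 0
select 1;
END_LOG

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my @queries;
read_entries($fh, sub {
    my ($header, $query) = @_;
    push @queries, $query;
});
close $fh;
print scalar(@queries), " entries\n";
```

The second sample entry has no "# Time:" line on purpose, which is why the sketch splits on "comment after query" rather than on "# Time:".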
Re^3: In search of an efficient query abstractor
by Corion (Pope) on Dec 07, 2008 at 20:50 UTC

    At least for the last case, Perl has an optimized version:

    $query =~ tr[ ][]s;

    and it should be at least as fast as the s/// version. Another variant to try is s/\s+/ /g: there is no need for the counting quantifier {2,}, and skipping over single spaces might be slower than just writing out the replacement.
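One rough way to compare these variants is the core Benchmark module. This sketch uses an invented sample query, so the relative timings it prints depend on the Perl version and on how much whitespace real queries contain; treat it as a starting point for measuring, not as a result. Note that tr[ ][]s squeezes only runs of literal spaces, while the s/// forms also match tabs and newlines, so the three agree only on space-only input.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Invented sample: a query with runs of spaces, repeated to make it longer.
my $sample = 'select  a,   b  from  t   where  id =  1   and  x  = 2 ' x 20;

my %variants = (
    # The original form: only touch runs of two or more whitespace characters.
    'counted' => sub { my $q = shift; $q =~ s/\s{2,}/ /g; $q },
    # Replace every whitespace run, including single characters.
    'plus'    => sub { my $q = shift; $q =~ s/\s+/ /g;    $q },
    # Transliteration with /s: squeeze runs of the space character only.
    'tr'      => sub { my $q = shift; $q =~ tr[ ][]s;     $q },
);

# On space-only input all three should produce the same string.
my %seen;
$seen{ $variants{$_}->($sample) } = 1 for keys %variants;
die "variants disagree\n" unless scalar(keys %seen) == 1;

# Run each variant for at least 1 CPU second and print a comparison table.
my %subs = map {
    my $code = $variants{$_};
    ($_ => sub { $code->($sample) })
} keys %variants;
cmpthese(-1, \%subs);
```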

      $query =~ tr[ \n\t\r\f][ ]s; turns out to be a lot faster than any s/// variant. That change moves this line from the 4th-worst to the 28th-worst line in the profile. Still having trouble with the floats, though.
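The tr/// version works because, when the replacement list is shorter than the search list, tr repeats its last character, so every listed whitespace character becomes a space, and /s then squeezes each run of characters translated to that space into one. A small check of the equivalence with s/\s+/ /g follows, with one caveat: \s also matches characters the tr list omits, such as vertical tab (matched by \s since Perl 5.18) and non-ASCII whitespace like U+00A0 when Perl applies Unicode rules, so the two are only interchangeable on input without those. The sub names are hypothetical.

```perl
use strict;
use warnings;

# Map space, newline, tab, CR and form feed to a space, then squeeze runs.
sub squeeze_tr {
    my ($q) = @_;
    $q =~ tr[ \n\t\r\f][ ]s;
    return $q;
}

# Regex version for comparison.
sub squeeze_re {
    my ($q) = @_;
    $q =~ s/\s+/ /g;
    return $q;
}

my $query = "select a,\n\t b\r\n  from t\f where id = 1";
print squeeze_tr($query), "\n";    # prints "select a, b from t where id = 1"
die "mismatch\n" unless squeeze_tr($query) eq squeeze_re($query);

# Vertical tab is not in the tr list, so tr leaves it alone; on Perl 5.18
# or later, where \s matches \x0B, the regex version replaces it.
my $vt = "a\x0Bb";
printf "tr: %s  re: %s\n",
    squeeze_tr($vt) eq $vt ? 'unchanged' : 'changed',
    squeeze_re($vt) eq $vt ? 'unchanged' : 'changed';
```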
