Re: Is Using Threads Slower Than Not Using Threads?
by patcat88 (Deacon) on Nov 01, 2010 at 06:37 UTC
Threads, in a text-mode, non-GUI environment, only make sense when you have a "blocking" operation that is resource or speed limited while you have other free speed or resources to do work with. Asynchronous or interleaved file access to a single mechanical hard drive is much slower than synchronous or sequential access because of the drive's seek time. Unless you have an SSD, asynchronous HD I/O will always be slower than synchronous HD I/O, unless there is something very wrong with your OS's kernel design (not likely). NCQ wouldn't help in the first place; you're still seeking. With network I/O, or I/O to multiple separate drives (not RAID), each with its own file system (NO RAID!), asynchronous I/O starts to make sense. I am not discussing the benefits of caching or OS caching design. Just remember the severe speed penalty of seeking on mechanical HDs.
Also, perl takes about 1-3 seconds to start each thread, judging by eye (I did not measure it). Starting threads is also slow because perl's RAM usage grows rapidly (which means thrashing/paging/tons of memory allocation requests to the OS) due to the design of ithreads. Your code will run much faster non-threaded, but let me give you another optimization in case you want to use ithreads in the future.
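Rather than going by eye, the startup cost can be measured. The following is a minimal sketch, assuming a perl built with ithreads (it reports that and skips the measurement otherwise). The helper name `time_thread_start` and the 500,000-element ballast array are made up for illustration; the second measurement should include the cost of cloning that array.

```perl
use strict;
use warnings;
use Config;
use Time::HiRes qw(time);

# Time create() plus join() for a thread that does nothing.
# Returns undef on a perl built without ithreads.
sub time_thread_start {
    return unless $Config{useithreads};
    require threads;
    my $t0 = time;
    threads->create(sub { 1 })->join;
    return time - $t0;
}

my $small = time_thread_start();

# Arbitrary ballast: every thread created from here on has to clone it.
my @big = (1 .. 500_000);
my $large = time_thread_start();

if (defined $small) {
    printf "empty program : %.4f s per thread\n", $small;
    printf "with big array: %.4f s per thread\n", $large;
}
else {
    print "this perl was built without ithreads\n";
}
```

The absolute numbers depend heavily on the machine and the perl build, so treat them as a rough guide only.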
Think about your variable @allips. ithreads COPIES all variables/data in perl when you create() a thread. So this huge 3500-IP array was duplicated in RAM once for every thread you made. Don't build @allips before you create() a thread. Give each thread a numeric range of @allips to be responsible for, then have each thread read off the HD (or wherever you're getting your IP list from) and build its own private @allips that ONLY has its range. This way you have no duplicate data between threads. You could also create a shared variable using threads::shared. Only 1 copy of a shared variable sits in RAM. Under the hood the variable is a tied variable that does its own synchronization/locking on each access; for updates that span more than one access you still need lock() to prevent race conditions.
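A minimal sketch of the range-per-thread idea follows. Everything named here is hypothetical: `chunk_ranges`, `load_range`, and `worker` are illustration names, the count of 4 threads is arbitrary, and `load_range` fabricates addresses from the index as a stand-in for reading the real IP list from disk. It falls back to a plain loop on a perl built without ithreads.

```perl
use strict;
use warnings;
use Config;

# Split indexes 0 .. $total-1 into at most $n contiguous [lo, hi] ranges.
sub chunk_ranges {
    my ($total, $n) = @_;
    my @ranges;
    my $size = int(($total + $n - 1) / $n);    # ceiling division
    for (my $lo = 0; $lo < $total; $lo += $size) {
        my $hi = $lo + $size - 1;
        $hi = $total - 1 if $hi > $total - 1;
        push @ranges, [$lo, $hi];
    }
    return @ranges;
}

# Stand-in for "read IPs $lo .. $hi from disk" (assumption: the real
# source can be read by position). Addresses are made up from the index.
sub load_range {
    my ($lo, $hi) = @_;
    return map { sprintf '10.0.%d.%d', int($_ / 256), $_ % 256 } $lo .. $hi;
}

# Each worker builds its own private list, so no big array exists in the
# parent at create() time and nothing large gets cloned.
sub worker {
    my ($lo, $hi) = @_;
    my @my_ips = load_range($lo, $hi);
    return scalar @my_ips;    # real code would scan the log here
}

my @ranges  = chunk_ranges(3500, 4);
my $handled = 0;
if ($Config{useithreads}) {
    require threads;
    my @thr = map { threads->create(\&worker, @$_) } @ranges;
    for my $t (@thr) {
        my ($count) = $t->join;
        $handled += $count;
    }
}
else {
    $handled += worker(@$_) for @ranges;    # no ithreads in this perl
}
print "ranges: ", scalar(@ranges), ", IPs handled: $handled\n";
```

The point of the design is only that the parent passes two integers to each thread instead of holding the full list when the threads are created.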
Another thing to think about: watch your CPU usage as a graph in real time when you run your script single-threaded. If it hits 80% or higher (of one core), you're CPU or RAM bandwidth limited. If it's lower, you're I/O limited. Explore making a RAM disk or using an SSD, or even a USB flash stick; load your 500 MB log into the RAM disk and benchmark it then, and remember to watch your CPU usage.
Also, for the love of god get rid of your regexp. Use substr and index. In my experience they are up to 10 times faster than a regexp when it's simple string matching. Also, use a "window" to search inside the string/log file if the format of your log file isn't line based (I see you're doing it line based). If you can predict exactly how many characters to jump ahead based on the file format, that's another optimization. Here is an algorithm I use to cut up a log or blog or forum type HTML page at a very high speed in Perl. I can't think of anything more optimized other than going to assembly. Perl's index command uses the Boyer-Moore string search algorithm, but rindex uses an old-fashioned character-by-character loop. Core i series CPUs added dedicated string comparison instructions with SSE4.2 (not SSE5) that work on 16 bytes at a time, if you're into that (you must write your own XS/C code, and possibly assembly if your compiler doesn't expose the instructions as plain functions, AKA compiler intrinsics). I'm not sure whether any other optimized string search instructions that enable faster search algorithms have been added to x86 over the last 15 years. You might want to benchmark Perl's Boyer-Moore algorithm against a character-by-character, assembly-implemented strstr (C standard library). Boyer-Moore can be up to 3 times slower on certain data than an old-fashioned character-by-character search, but it is usually faster.
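To check the regexp versus index claim on your own data, a minimal sketch using the core Benchmark module follows. The log line and the needle are made up, and `by_index` and `by_regex` are illustration names; the ratio you get will depend on your perl version, the pattern, and the data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up log line and needle; swap in real data before trusting numbers.
my $line   = 'Nov  1 06:37:01 host sshd[123]: Failed password for root from 10.1.2.3 port 22';
my $needle = '10.1.2.3';

# Both subs answer the same question: does $line contain $needle literally?
sub by_index { return index($line, $needle) != -1 ? 1 : 0 }
sub by_regex { return $line =~ /\Q$needle\E/      ? 1 : 0 }

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    index => \&by_index,
    regex => \&by_regex,
});
```

The \Q...\E quoting keeps the dots in the needle literal, so the two tests really are equivalent.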
For my use, an XML parser reports the page as malformed, so I can't use any XML parser, XS or PurePerl, and the only Perl HTML parser I know of on CPAN, which is PurePerl, is horribly slow.
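The listing itself is not included here, so what follows is only a minimal sketch of an index/substr cutting loop of the kind described, not the original code. The page content, the markers, and every variable name except $end are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical page: three records wrapped in made-up markers.
my $page = join '', map { "<div class=\"post\">entry $_</div>\n" } 1 .. 3;
my ($open, $close) = ('<div class="post">', '</div>');

my @records;
my $pos = 0;
my ($start, $end);

# An assignment evaluates to the assigned scalar, so the result of index
# can be stored and tested in the same expression.
while (($start = index($page, $open, $pos)) != -1) {
    $start += length $open;
    last if ($end = index($page, $close, $start)) == -1;
    push @records, substr($page, $start, $end - $start);    # first use of $end
    $pos = $end + length $close;                            # second use of $end
}

print "$_\n" for @records;
```

Each search resumes from $pos, so no part of the page is scanned twice.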
Note, don't think of optimizing the above by replacing both "$end"s with the index call again. You want to cache the result. Assignment is faster than another index, BUT a right-hand expression plus assignment is slower than just using the right-hand expression anonymously inside a bigger expression, so if a value is used only once, just stick it in the expression where it's used. Many layers of parentheses might look unclean or be a nightmare to debug, but it's faster than assigning to a scalar and then using that scalar just once later. Also remember that the result of the assignment operator is the left-hand value after the assignment is done, as you see in my while loop.
Since you're doing line-by-line record processing, I wonder if it would be faster to slurp up a much bigger (MBs or more) fixed-size block and work on that, rather than have perl constantly make very small file I/O requests to the OS, which means more seeking. I'm not sure what the buffer size inside perl is for line-based record processing.
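A minimal sketch of the block-reading idea follows, so it can be benchmarked against a plain line-by-line loop. The helper names `count_in` and `count_hits`, the 4 MB block size, the throwaway test log, and the needle are all assumptions for illustration. The partial line at the end of each block is carried over to the next read, which is why the needle must not contain a newline.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count non-overlapping literal occurrences of $needle in $text.
sub count_in {
    my ($text, $needle) = @_;
    my ($n, $pos) = (0, 0);
    while (($pos = index($text, $needle, $pos)) != -1) {
        $n++;
        $pos += length $needle;
    }
    return $n;
}

# Scan a file in $block-byte reads. $needle must not contain "\n".
sub count_hits {
    my ($path, $needle, $block) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my ($hits, $tail, $buf) = (0, '');
    while (read($fh, $buf, $block)) {
        $buf = $tail . $buf;
        # Cut after the last newline; the partial line waits for the next read.
        my $cut = rindex($buf, "\n") + 1;
        $tail = substr($buf, $cut, length($buf) - $cut, '');
        $hits += count_in($buf, $needle);
    }
    close $fh;
    return $hits + count_in($tail, $needle);    # last line may lack a newline
}

# Throwaway log so the sketch runs as-is; a real run would use the real log.
my ($tmp, $path) = tempfile(UNLINK => 1);
printf {$tmp} "req from 10.0.%d.%d\n", int($_ / 256), $_ % 256 for 0 .. 999;
close $tmp or die "close: $!";

my $needle = '10.0.0.77';
for my $block (64, 4 * 1024 * 1024) {
    printf "%d hit(s) with %d-byte blocks\n", count_hits($path, $needle, $block), $block;
}
```

The tiny 64-byte block is there only to exercise the carry-over across block boundaries; for real measurements, try block sizes in the MB range and compare timings on the same file.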