
Is Using Threads Slower Than Not Using Threads?

by Dru (Hermit)
on Nov 01, 2010 at 03:49 UTC ( #868687=perlquestion )
Dru has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks,

I have a large array of IP addresses (3500) that I am using to parse a large log file (500MB). I wanted to take this opportunity to try something new and to optimize my code, so I thought threads would give me a hefty performance boost. Much to my disappointment, or more likely due to my lack of knowledge of threads, the code actually took longer to run with threads. I did a small test with half the IPs and 1/10 the file size, and according to Benchmark it took 10 seconds longer to run with threads than without. Am I doing something wrong? Should I put threads on the shelf and explore other options?

Below is the part of the code that I am using threads on. Also I can't get @ReturnData to return anything:
my $thr = threads->new(\&check_fw_logs);
my @ReturnData = $thr->join;
print "Thread returned @ReturnData";

sub check_fw_logs {
    my @ips;
    open (FILE, $fw_file) or croak("Can't open $fw_file: $!\n");
    while (<FILE>) {
        for my $i (@allips) {
            push(@ips, $_) if /$i/;
        }
    }
    return @ips;
}

Perl, the Leatherman of Programming languages. - qazwart

Replies are listed 'Best First'.
Re: Is Using Threads Slower Than Not Using Threads?
by GrandFather (Sage) on Nov 01, 2010 at 05:57 UTC

    Absolutely, unless there is some advantage to be gained by running computationally intensive tasks in parallel. Running I/O bound tasks in parallel almost never offers any time advantage (although sometimes there can be an advantage in terms of code structure).

    However, it is very often the case that a smarter algorithm can provide a substantial performance improvement. For example, in the case of the code you show it may be that you could use a hash to test for an IP match. Consider:

    use strict;
    use warnings;

    my @allIpsList = qw( ... );    # IP list elided
    my %ipsMatch = map {$_ => 1} @allIpsList;

    my $testFile = <<TESTDATA;
    ...
    TESTDATA

    open my $fh, '<', \$testFile;

    my @found = check_fw_logs (\%ipsMatch, $fh);
    print "@found\n";

    sub check_fw_logs {
        my ($ipsMatch, $fh) = @_;
        my @matched;

        while (defined (my $line = <$fh>)) {
            next if $line !~ /(\d+\.\d+\.\d+\.\d+)/;
            push @matched, $1 if exists $ipsMatch->{$1};
        }
        return @matched;
    }


    For real code you would probably want to normalise the IP numbers wherever they are used, and of course you'd use external files etc. rather than the tricks I've used to make this a self contained example.

    True laziness is hard work
      Running I/O bound tasks in parallel almost never offers any time advantage
      If it's a single file, on a single disk, with a single controller, yes. But if you have multiple controllers, or are reading from multiple files, I/O bound tasks can speed up when done in parallel. In fact, if you have just a single core CPU, the only time threads will speed things up is when you're I/O bound (disk and/or network).

      And then there's the obvious two-way split: one thread doing I/O while the other does the calculations. That means your program can still make progress while it's waiting for data. This may give you a speedup even on a single core, single CPU, single controller, single disk setup.

      Now, if I were the OP, and if I were to go the divide-and-conquer route, I'd try various methods and see which ones are faster on my particular setup:

      1. Two threads/processes: one reading the file, the other checking the IPs.
      2. Use threads/forks and have each thread/child test part of the file.
      3. Have an outer thread/process reading chunks of data, and a bunch of other threads/processes each checking part of the IP addresses.
      Obviously, the latter two allow for lots of tweaking by varying the number of threads/processes.
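      Option 1 above can be sketched with Thread::Queue: a producer hands lines to a consumer thread. (A minimal sketch; the two log lines below are made-up stand-ins for the real file.)

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new();

# Consumer thread: counts lines containing an IP-like token
my $worker = threads->create(sub {
    my $count = 0;
    while (defined(my $line = $q->dequeue())) {
        $count++ if $line =~ /\d+\.\d+\.\d+\.\d+/;
    }
    return $count;
});

# Producer: would normally read the log file; here, two made-up lines
$q->enqueue("deny 10.0.0.1 port 22\n", "hello world\n");
$q->enqueue(undef);    # end-of-input marker for the consumer

my $matches = $worker->join();
print "$matches\n";
```

      The `undef` sentinel is the classic way to tell the consumer the producer is done; varying the number of consumer threads is then a one-line change.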

      But first, I'd try something totally different. Instead of doing 3500 matches for each line, use a single regexp to extract all the IP addresses from the line (it doesn't have to be perfect; it's likely even /([0-9][0-9.]+[0-9])/ will do). The 3500 IP addresses I would store in a hash. Then it's simply a matter of doing a hash lookup. This should dramatically reduce the number of matches performed. It may also fix a bug: since /$i/ is a substring match, an address in the file that merely contains one of the 3500 IP addresses is reported as a match. This may be intended, but I would find that surprising.
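      A sketch of that approach (the addresses and log lines here are made up; the real data would come from @allips and the log file):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the OP's 3500 addresses and the log
my %want = map { $_ => 1 } qw(10.0.0.1 192.168.1.9);
my @log  = ("deny 10.0.0.1 port 22\n", "allow 172.16.0.5\n");

my @hits;
for my $line (@log) {
    # One global match extracts every IP-like token from the line...
    for my $ip ($line =~ /(\d+(?:\.\d+){3})/g) {
        # ...then a single hash lookup replaces 3500 regex matches
        push @hits, $line if $want{$ip};
    }
}
print scalar(@hits), "\n";
```

      The hash lookup is also an exact-string match, which removes the substring-matching bug noted above.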

      Thank you GrandFather. What do you mean by normalizing the IP numbers?


      Perl, the Leatherman of Programming languages. - qazwart

        For the hash lookup to work the key string must match exactly. IP numbers may optionally have leading 0 digits to form a three-digit number, but the strings '' and '' are not eq, so you should normalise the IP number to one version or the other - always three-digit numbers, or always remove leading 0 digits.

        True laziness is hard work

        The easiest way to normalise IPs is to pack each one into a fixed 4-byte string:

        $hash{ pack 'C4', split /\./, $ip } = 1;
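        With that, the two spellings of the same address collapse to one hash key (a small demonstration with made-up addresses):

```perl
use strict;
use warnings;

my %hash;
# '010.001.002.003' and '10.1.2.3' are the same address spelled two ways
for my $ip ('010.001.002.003', '10.1.2.3') {
    # split needs /\./ (an escaped dot); a bare '.' pattern matches any character
    $hash{ pack 'C4', split /\./, $ip } = 1;
}
print scalar(keys %hash), "\n";    # both spellings pack to "\x0A\x01\x02\x03"
```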
Re: Is Using Threads Slower Than Not Using Threads?
by patcat88 (Deacon) on Nov 01, 2010 at 06:37 UTC
    Threads, in a text mode non-GUI environment, only make sense when you have a "blocking" operation that is resource or speed limited, but you have other free and available speed or resources you can do work with. Asynchronous or interleaved file access to a single mechanical hard drive is much slower than synchronous or sequential access due to the seek time of the hard drive. Unless you have an SSD, asynchronous HD I/O will always be slower than synchronous HD I/O unless there is something very wrong with your OS's kernel design (not likely). NCQ wouldn't help in the first place: you're still seeking. For network I/O, or I/O to multiple separate drives (not RAID), each with their own file system, asynchronous I/O does make sense. I am not discussing the benefits of caching or OS caching design. Just remember the severe speed penalty of seeking on mechanical HDs.

    Also, perl takes about 1-2 seconds to start each thread, going by my informal observation. Starting threads is also slow because perl's RAM usage grows rapidly (which means thrashing/paging/tons of memory allocation requests to the OS) due to the design of ithreads. Your code will run much faster non-threaded, but let me give you another optimization if you want to use ithreads in the future.

    Think about your variable @allips. ithreads COPIES all variables/data in perl when you create() a thread. So this huge 3500 IP array now sits in RAM X times over once you've made your threads. Don't create @allips before you create() a thread. Give each thread a numeric range of @allips to be responsible for, then have each thread read off the HD (or wherever you're getting your IP list from) and build its own private @allips that ONLY has its range. This way you have no duplicate data between threads. You could also create a shared variable using threads::shared. Only 1 copy of a shared variable sits in RAM. Under the hood the variable is a tied variable that does proper synchronization/locking to prevent race conditions before accessing the variable.
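    A minimal sketch of the threads::shared route (the addresses are made up; real code would load the list after declaring the share):

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# One copy of the array lives in shared storage instead of being
# duplicated into every thread at create() time
my @allips : shared;
@allips = ('10.0.0.1', '10.0.0.2', '10.0.0.3');    # hypothetical list

my $thr  = threads->create(sub { return scalar @allips });
my $seen = $thr->join();
print "$seen\n";    # the child thread sees the same shared array
```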

    Another thing to think about: watch your CPU usage as a graph in real time when you run your script single threaded. If it hits 80% or higher, you're CPU or RAM bandwidth limited. If it's lower, you're I/O limited. Explore making a RAM disk, or using an SSD or even a USB flash stick: load your 500 MB log into the RAM disk, benchmark it then, and remember to watch your CPU usage.

    Also, for the love of god get rid of your regexp. Use substr and index. 10 times faster than any regexp if it's simple string matching. Also, use a "window" to search inside the string/log file if the format of your log file isn't line based (I see you're doing it line based). If you can predict exactly how many characters to jump ahead based on the file format, that's another optimization. Here is an algorithm I use to cut up a log or blog or forum type HTML page at very high speed in Perl. I can't think of anything more optimized short of going to assembly. Perl uses the Boyer-Moore string search algorithm for index, but an old fashioned character by character loop for rindex. Core i series CPUs with SSE4.2 added dedicated string search instructions that work on 16 bytes at a time, I think, if you're into that (you must write your own XS/C code, and possibly assembly if your compiler doesn't expose the instruction as a compiler intrinsic). I'm not sure if any other optimized string search instructions, enabling faster string search algorithms, have been added to x86 over the last 15 years. You might want to benchmark Perl's Boyer-Moore algorithm against a character by character strstr (C standard library). Boyer-Moore can be up to 3 times slower than character by character search on certain data, but is usually faster.

    For my use case, the XML parser says the input is malformed, so I can't use any XML parser, XS or PurePerl; and the only PurePerl HTML parser I know of on CPAN is horribly slow.

    $start = 0;
    while (($beg = index($htmlpage, '<table', $start)) > -1) {
        $end = index($htmlpage, '</table>', $beg) + 8;    # search from $beg, past the open tag
        push(@somearray, substr($htmlpage, $beg, $end - $beg));
        $start = $end;
    }
    Note: don't think of optimizing the above by replacing both "$end"s with the index call again. You want to cache the result. Assignment is faster than another index, BUT a right-hand expression plus assignment is slower than just using the right-hand expression anonymously inside a bigger expression; so if a value is used only once, just stick it in the expression where it's used later. Many layers of parentheses might look unclean or be a nightmare to debug, but it's faster than assigning and then using the scalar just once later. Also remember that the result of the assignment operator is the left-hand value after the assignment is done, as you see in my while loop.
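    Applied to the OP's problem, the same index() idea might look like this (hypothetical data standing in for @allips and the log):

```perl
use strict;
use warnings;

# Stand-ins for the OP's address list and log lines
my @allips = ('10.0.0.5', '192.168.7.1');
my @log    = ("Oct 31 fw1 drop src=192.168.7.1 dst=8.8.8.8\n",
              "Oct 31 fw1 accept src=172.16.0.9\n");

my @hits;
for my $line (@log) {
    for my $ip (@allips) {
        # index() is a plain substring search -- no regex engine involved
        if (index($line, $ip) > -1) {
            push @hits, $line;
            last;    # one hit per line is enough
        }
    }
}
print scalar(@hits), "\n";
```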

    Since you're doing line by line record processing, I wonder if it would be faster to slurp up a much bigger (MBs or more) fixed size block and work on that, rather than perl making very small file I/O requests to the OS constantly, which means more seeking. I'm not sure what perl's internal buffer size is for line based record processing.
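    A sketch of block-wise reading (the 4 MB block size and temp file name are arbitrary choices for illustration, not recommendations):

```perl
use strict;
use warnings;

# Write a small throwaway file so the example is self-contained
my $file = 'chunk_demo.tmp';
open my $out, '>', $file or die "open: $!";
print $out "line $_\n" for 1 .. 1000;
close $out;

# Read it back in large fixed-size blocks instead of line by line
open my $in, '<', $file or die "open: $!";
my ($chunk, $bytes) = ('', 0);
while (my $got = read($in, $chunk, 4 * 1024 * 1024)) {    # 4 MB per request
    $bytes += $got;
    # ... scan $chunk for IPs here; beware of records split across chunk boundaries ...
}
close $in;
unlink $file;

print "$bytes\n";    # total bytes read equals the file size
```

    The caveat in the comment is the real cost of this approach: a record can straddle two chunks, so you have to carry the tail of each chunk over to the next.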
      Wow, thank you for recommending the use of substr. My best time went from 40 wallclock seconds to 0!
      the code took: 0 wallclock secs ( 0.05 usr + 0.00 sys = 0.05 CPU)

      Perl, the Leatherman of Programming languages. - qazwart
Re: Is Using Threads Slower Than Not Using Threads?
by BrowserUk (Pope) on Nov 01, 2010 at 09:16 UTC

    Effectively, you've done the equivalent of loading your car onto the back of a transporter and driving the transporter to all your appointments. Needless to say, using two vehicles in this way does not speed up your deliveries, and costs the time it takes getting the car on and off the transporter.

    What your code does is simply move the code that would be in the main (initial/startup) thread into a new thread.

    As such, you're only making use of one thread--the main thread just sits blocked, doing nothing until the new thread completes--so it won't speed anything up.

    But, you've added the costs of

    1. starting a new thread--a few milliseconds at most.
    2. Unless you've shared @allips, then that array will be copied into the new thread.

      That's pure overhead.

    3. copying the results set from the new thread back to the main thread.

      More overhead.

    So yes. The way you are going about it, using a thread that way will be slower than not using them.

    As others have said, the first way to speed up your program will be to use a better algorithm.

    However, if your machine has multiple cores--once you've avoided searching every line 3500 times, and made sure that you're using the fastest search mechanism Perl has to offer--then there might be some further gains to be had by using threading effectively.

    Of course, once you've made the algorithm changes, you might be running quickly enough and not need to use threading. But if you'd like to investigate using threads properly, speak up.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Unless you've shared @allips, then that array will be copied into the new thread.

      That's pure overhead.

      That isn't very accurate.

      As can be easily seen, the overhead eliminated by using threads::shared isn't actually the lion's share of overhead in copying a large array to a new thread:

      $ alias tperl="time perl -w -Mthreads -Mthreads::shared"

      # No data to copy:
      $ tperl -e'threads->create(sub{})->join() for 1..100'
      CPU 0.680 secs (CPU 0.256 secs)

      # Load the data late so also no data is copied:
      $ tperl -e'BEGIN{threads->create(sub{})->join() for 1..100}my @x=(1..35_000)'
      CPU 0.728 secs (CPU 0.268 secs)

      $ tperl -e'BEGIN{threads->create(sub{})->join() for 1..100}my @x;share(@x);@x=(1..35_000)'
      CPU 0.840 secs (CPU 0.364 secs)

      # Overhead of copying, whether shared() or not:
      $ tperl -e'my @x;share(@x);@x=(1..35_000);threads->create(sub{})->join() for 1..100'
      CPU 2.272 secs (CPU 1.316 secs)

      $ tperl -e'my @x=(1..35_000);threads->create(sub{})->join() for 1..100'
      CPU 3.304 secs (CPU 3.384 secs)

      What threads::shared prevents from being copied to each new thread at thread creation time is the particular data for each element of the array. But you can see that it still includes much of the overhead (between 1/3 and 1/2 in the examples above) because it must copy the essentially-tied array (that uses C-based 'get' and 'set' accessors rather than the Perl accessors of actually-tied variables).

      But if you actually expect the threads to make use of the shared data, then you can see the dramatic overhead of copying the data to each thread one-element-at-a-time with locking that threads::shared does:

      $ tperl -e'my @x=(1..35_000);threads->create(sub{for(@x){my$y=$_}})->join() for 1..100'
      CPU 4.324 secs

      $ tperl -e'my @x;share(@x);@x=(1..35_000);threads->create(sub{for(@x){my$y=$_}})->join() for 1..100'
      CPU 17.281 secs
      starting a new thread--a few milliseconds at most

      If I didn't know better, I might suspect that this is propaganda. It could certainly mislead somebody. As can be seen above, creating a new iThreads instance can trivially take ten times that long.

      Just for comparison, here is how extra data impacts the performance of fork():

      $ tperl -e'fork && exit for 1..100'
      CPU 0.044 secs (CPU 0.020 secs)

      $ tperl -e'my @x=(1..35_000);share(@x);fork && exit for 1..100'
      CPU 0.072 secs (CPU 0.072 secs)

      $ tperl -e'my @x=(1..35_000);for(1..100){if(fork){for(@x){my$y=$_};exit}}'
      CPU 0.084 secs

      And, as always happens every time I touch iThreads, here are some examples of how easy it is to run into stupid things:

      # Try to load the data late but mostly fail:
      $ tperl -e'threads->create(sub{})->join() for 1..100;my @x=(1..35_000)'
      CPU 2.048 secs (CPU 1.204 secs)

      # Note how easy it is to do it wrong and not share data but get the overhead:
      $ tperl -e'my @x=(1..35_000);share(@x);print "($x[0])\n";threads->create(sub{})->join() for 1..100'
      Use of uninitialized value in concatenation (.) or string at -e line 1.
      ()
      CPU 3.380 secs (CPU 2.128 secs)

      Update: There are a lot of numbers there. A nice summing-up is that sharing the data between multiple instances and actually using it in each instance (in the above example) is 200x (20,000%) slower using iThreads and threads::shared than when using native fork.

      - tye        

        Dumb is as dumb does.

      Unless you've shared @allips, then that array will be copied into the new thread.
      You still have no clue what it means that sharing uses perl's magic mechanism. Please educate yourself before spreading further FUD about how shared variables don't use significant memory for each thread.
      A math joke: r = | |csc(θ)|+|sec(θ)|-||csc(θ)|-|sec(θ)|| |
      Online Fortune Cookie Search
      Office Space merchandise

        Oh dear. Think context!

        To be more explicit. Think of the context in which I said what I said. Think of the purpose of that.

        Just as newbies don't need to be apprised up front of the details of the inner workings of the regex engine, they don't need to be apprised of the inner workings of threads. The point was simply that without having been shared, each thread will get an independent, unshared copy of the array.

        Your interjection is:

        1. unhelpful;

          to the OP;

        2. unwarranted;

          My understanding of the internals of the iThreads is:

          1. perfectly sound thanks;
          2. irrelevant in the context of the assistance I was trying to give the OP.

          You don't need to be Jeff Friedl to help people with their regex problems. And if you were him, it wouldn't be useful to go into all the DFA/NFA blah blah details of the internals when responding.

          And it seems to me that if you aren't aware of the semantic differences between the copying of non-shared arrays and the sharing of shared arrays, you're the one lacking understanding. But I know you almost certainly are aware of those differences, which just confirms that your interjection is ...

        3. therefore, politically motivated.

          Not intended to help the OP, but simply to come out in support of an untenable position.

        Your time might be better spent where it would be most valuable. Eg. correcting some of the unnecessary extravagances of the current implementation.

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Is Using Threads Slower Than Not Using Threads?
by aquarium (Curate) on Nov 01, 2010 at 05:42 UTC
    Unless you can write code that divides the problem into separate threads such that, by design of this subdivision, they end up solving it faster, multi-threading is just slicing the same cake (CPU time) a slightly different way, with overheads for maintaining the threading.
    Haven't checked your code; I assume it's as efficient as can be.
    the hardest line to type correctly is: stty erase ^H
Re: Is Using Threads Slower Than Not Using Threads?
by raybies (Chaplain) on Nov 01, 2010 at 16:23 UTC
    Lots of really great comments already in this thread, just wanted to add that often running the same script multiple times in an OS that's equipped to handle multiple tasks can actually be faster than threads as well. (Read that in the threads documentation back when I was playing with it.) Often people assume that threads means fast (I always assumed that), when really that's not always the case.
Re: Is Using Threads Slower Than Not Using Threads?
by sundialsvc4 (Abbot) on Nov 02, 2010 at 00:06 UTC

    As others have said ... “it entirely depends upon what you are doing.”

    The task that you are performing is I/O-bound. The speed at which the procedure will complete is ruled mostly by the efficiency with which it is able to perform disk I/O. No matter how bad-and-hairy "a log file record" might be, you can be fairly certain that any decent CPU on this planet will be able to dispose of it in a handful of microseconds. But even the fastest disk-drives are still ruled by milliseconds. Therefore, if you subdivide such a procedure into n parallel tasks, each of which is given the same job to do, then you can safely predict that n-1 of them will be spinning their wheels. There is a price to be paid for those threads, and you're paying it.

    Now, in contrast to this scenario, let’s briefly contemplate one that might be a truly useful application of threads.   In this scenario, once a disk record has been read in, it might be a command for the computer to perform some unit-of-work that consists of a varying (and unpredictable) amount of CPU-plus-I/O activity.   Now it might well be productive to assign one thread to the task of “disk I/O,” and a pool of n threads to the task of doing the work.   The number of threads in the pool (n) is rigidly controlled (and adjustable).   The number of threads devoted to disk I/O is limited to 1 because, well, there’s only 1 device to be waited on.

Re: Is Using Threads Slower Than Not Using Threads?
by aquarium (Curate) on Nov 02, 2010 at 23:09 UTC
    If this were a batch process that gets run often and needs a performance boost, what I would do is set up a script or two to grab, normalize, and sort the IP address list and the IPs from the large log file, then use the join command.
    I'm also wondering how any code would compare to pre-loading the IPs and log file into a DB and doing a SELECT * FROM log_file_table INNER JOIN ip_list_table ON log_file_table.ip = ip_list_table.ip;
    Of course there would be earlier overhead in normalizing the IPs and loading the DB, but I think a DB would come out on top doing such a join, compared to any hand-written code doing the same.
    the hardest line to type correctly is: stty erase ^H

Node Type: perlquestion [id://868687]
Approved by planetscape
Front-paged by planetscape
