http://www.perlmonks.org?node_id=966939

So I was working on multi-threading a little tool the last couple of days, but I didn't need any sharing or anything of the sort. I just needed to assign one thread to each file that needed to be parsed. The reason I am writing this is that the documentation can be a little misleading. The one consistent mistake I saw people make while googling involves detach(). From the documentation, "Once a thread is detached, it'll run until it's finished; then Perl will clean up after it automatically." When I first started this little tool I thought this was just what I needed. However, what that passage doesn't say is that if the Main Perl Process closes before a detached thread does, all detached threads are closed as well, whether they have finished or not. Looking back now, it makes sense, but the documentation at the time gave me the impression that detached threads lived on independently and were not reaped by the Main Thread. Not only that, but I read somewhere that the resources of the detached threads might also not be freed. So my issue was that all my threads were being created properly, but the Main Perl Process was closing everything before my Worker threads could do their job. So here is my solution.

    use threads;

    $num_threads;
    $thread_limit = xxx;    # maximum number of threads to run at once

    foreach $file_to_execute (@array_of_files) {
        # start one worker thread for this file
        $thread = threads->create(\&do_stuff, $param1, $param2);

        # throttle: wait while too many threads are still running
        @threadlist = threads->list(threads::running);
        $num_threads = $#threadlist;
        while ($num_threads >= $thread_limit) {
            sleep(30);
            @threadlist = threads->list(threads::running);
            $num_threads = $#threadlist;
        }
    }

    # keep the main thread alive until no worker is still running
    while ($num_threads != -1) {
        sleep(1);
        @threadlist = threads->list(threads::running);
        $num_threads = $#threadlist;
    }

    sub do_stuff{ ... }

So as you can see, I create a thread in a foreach loop for each item to be processed. The first while loop limits how many threads are running at a time, which is useful for someone on a weaker computer. I ran it without that while loop on my work computer, which has two quad-core CPUs and 16 GB of RAM, and it didn't break a sweat, but the limit might be useful for some people. The second while loop keeps the Main Perl Process from closing while the array of running threads is not empty. Once it is empty, the program quits... and yes, I should have used strict. I was being lazy. I just remembered another reason I wanted to write this up. I was having an issue where

 threads->list(threads::running)

would increment to the max number of threads, but never decremented as threads finished. But now I can't think of how I solved it, or if there was ever truly something wrong in the first place besides my own fingers.
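For anyone who hits the same thing, here is a minimal sketch of the same throttling idea that also joins the workers, so the main thread cannot exit before they are done. As I understand the threads documentation, threads->list(threads::running) only counts non-detached, non-joined threads that are still running, and threads::joinable gives the ones that have finished but have not been joined yet. The names process_files and reap_finished are made up for this example, and do_stuff is just a placeholder for the real parser.

```perl
use strict;
use warnings;
use threads;

# Placeholder for the real per-file parser (assumption: it returns one scalar).
sub do_stuff {
    my ($file) = @_;
    return length $file;
}

# Join every thread that has already finished and record its return value.
sub reap_finished {
    my ($file_for, $result) = @_;
    for my $thr (threads->list(threads::joinable)) {
        $result->{ $file_for->{ $thr->tid } } = $thr->join;
    }
}

# Run one thread per file, with at most $thread_limit running at a time.
sub process_files {
    my ($thread_limit, @files) = @_;
    my (%file_for, %result);

    for my $file (@files) {
        # Throttle: in scalar context list() returns a count.
        while (threads->list(threads::running) >= $thread_limit) {
            sleep 1;
        }
        reap_finished(\%file_for, \%result);

        my $thr = threads->create(\&do_stuff, $file);
        $file_for{ $thr->tid } = $file;    # remember which file this thread has
    }

    # join() blocks until the thread is done, so nothing is cut short at exit.
    for my $thr (threads->list) {
        $result{ $file_for{ $thr->tid } } = $thr->join;
    }
    return \%result;
}

my $result = process_files(2, qw(a.log bb.log ccc.log));
print "$_ => $result->{$_}\n" for sort keys %$result;
```

Because every thread is joined rather than detached, there is no need for a final polling loop, and Perl should not complain about unjoined threads when the program exits.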

Either way, this was a fun little exercise. I went from processing 200+ files (22.5 GB total, with 200+ threads!) in 1 1/2 hours to 23 minutes. The only reason it is as high as 23 minutes is that I have three 4 GB files that I have to read from top to bottom. So near the end I have four threads: the Main process and the three threads reading and splitting those files. I don't think I can do anything about that in terms of threading. I am going to try 80+ gigs and see if I can break my computer.