Thanks RichardK for your kind reply.
This is my first time using multiple threads in Perl, so I'm not sure whether I'm using them properly. Even if the job is likely to be I/O bound, the multithreaded version shouldn't run slower than the single-threaded one, should it? Please correct me if I'm wrong. Thanks again for your help.
Re^2: problem of my multithreading perl script
No, there are multiple ways that a threaded program can be slower than a single threaded program. In this case, I don't know exactly which one(s) you're hitting, but there are two candidates I can think of:
Disk thrashing: When you're reading a file sequentially, the disk drive can normally read sector after sector without (many) intermediate seeks. If you're reading multiple files at the same time, your drive needs to seek frequently between file locations. In other words, reading files one at a time generally has fewer disk seeks than reading them in parallel, and seeks are slow operations.
Data structure locking: In a single-threaded program, you don't need to worry about locking your data structures. But in a multithreaded program, you need to lock a data structure while modifying it if another thread could also be updating it. Even when there's no contention for the data structure, you're still spending time acquiring and releasing locks.
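To make the locking cost concrete, here's a minimal sketch (the hash name and word-counting loop are just illustrative) of the pattern that incurs it: every compound update to a shared hash, such as an increment, needs a lock to be atomic, so the lock is taken and released once per write.

<code>
use strict;
use warnings;
use threads;
use threads::shared;

my %counts :shared;          # all threads update this one shared hash

sub tally {
    my @words = @_;
    for my $w (@words) {
        # An increment is read-modify-write, so it must be locked to be
        # atomic -- which means one lock acquire/release per word counted.
        lock(%counts);
        $counts{$w}++;
    }
}
</code>

Perl also serializes individual accesses to shared variables internally, so even without the explicit lock() there is per-access synchronization overhead.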
Having said that, you can probably speed things up without going to a single-threaded program.
If your program is experiencing disk thrashing, you might be able to avoid it by placing your files on different drives. That way you won't have as many disk seeks.
If your program is spending too much time managing data structure locks, you might be able to rearrange your code a bit to reduce the number of locks. For example, you might have each thread read and process a file into a local (nonshared) data structure, and then, once it has finished a file, merge the data into the shared structure.
There are other possible causes and solutions; these are just the ones that came to mind. (I've run into both of them frequently enough... unfortunately.)
Actually it will run slower. All of the threads are writing to the same hash. For each write, the hash is locked, and any other threads have to wait until the current thread finishes its write before they can start.
Try having each thread write to a separate hash, then merge them in the main thread once the reading is complete.
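That pattern might look something like the following sketch (the file-reading and word-counting details are placeholders for whatever processing your script does): each thread builds a private, unshared hash with no locking at all, returns it, and the main thread merges the results after join().

<code>
use strict;
use warnings;
use threads;

# Process one file into a private hash -- no sharing, so no locks.
sub count_words {
    my ($file) = @_;
    my %local;
    open my $fh, '<', $file or die "Can't open $file: $!";
    while (my $line = <$fh>) {
        $local{$_}++ for split ' ', $line;
    }
    close $fh;
    return \%local;    # join() hands a copy of this back to the caller
}

my @files   = @ARGV;
my @workers = map { threads->create( \&count_words, $_ ) } @files;

# Merge in the main thread once each reader finishes.
my %total;
for my $t (@workers) {
    my $part = $t->join;
    $total{$_} += $part->{$_} for keys %$part;
}
</code>

The only synchronization left is the implicit one in join(), so the per-write locking cost disappears entirely.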