Re: Splitting large array for threads.

by BrowserUk (Pope)
on Jun 15, 2014 at 07:24 UTC

in reply to Splitting large array for threads.

Basically, i'm trying to literate over an array of over 4million entrys. At around 800 entrys the script just stops.

Neither your post nor your code makes any sense.

There is no "array" (of "over 4million entrys" or otherwise) anywhere in your code. There is a file that you are reading into a queue.

But, the way you are reading that file and populating the queue makes no sense.

You wait until the queue is empty and then populate it with one line for each worker thread. You also place the current read position of the file into a shared variable after reading each line.

However, as you only have one shared variable $pos, by the time the worker threads use that value, you will have overwritten it several times, so the same position will be attributed to several lines. I.e. NTHREADS lines will be reported with the same position, but only one of them will be correct. Nonsense.
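If each line really does need its own position, one way around the single shared variable is to send the position through the queue together with the line, so nothing can be overwritten between the read and the use. What follows is only a minimal sketch under stated assumptions: the original code is not shown here, so $Q, $results, worker(), the thread count of 4 and the in-memory stand-in file are all hypothetical names and values chosen for illustration.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $NTHREADS = 4;                       # hypothetical worker count
my $Q        = Thread::Queue->new;      # work items: "position<TAB>line"
my $results  = Thread::Queue->new;      # formatted output, collected by main

# Each worker gets the position as part of the item it dequeues,
# so there is no shared $pos that could change underneath it.
sub worker {
    while ( defined( my $item = $Q->dequeue ) ) {
        my ( $pos, $line ) = split /\t/, $item, 2;
        $results->enqueue(
            sprintf "%d: (%10d) : %s", threads->tid, $pos, $line );
    }
}

my @workers = map { threads->create( \&worker ) } 1 .. $NTHREADS;

# Assumption: an in-memory file stands in for the real input file.
my $data = join '', map { "line $_\n" } 1 .. 20;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $pos = tell $fh;                     # start of the line about to be read
while ( my $line = <$fh> ) {
    $Q->enqueue("$pos\t$line");         # position and line travel together
    $pos = tell $fh;                    # start of the next line
}
close $fh;

$Q->enqueue( (undef) x $NTHREADS );     # one terminator per worker
$_->join for @workers;

my @out;
while ( defined( my $r = $results->dequeue_nb ) ) {
    push @out, $r;
}
print @out;                             # order still depends on scheduling
```

The output order remains whatever the scheduler produces, but every line now carries the position it was read from.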

Based upon what your posted code is actually doing, there is no logic in using threads to process this file, because the overheads of locking and queuing far exceed the cost of the per-line processing -- which consists entirely of printing each line to the console. What's more, those lines will be printed in some semi-random order.

You'd be better off with a simple:

while( <FILE> ) { printf "0: (%10d,%10d) : %s", tell( FILE ), $size, $_; }

At least the lines would be printed in the same order they are read and with a different position -- albeit the position of the end of the line +1 rather than the start of it. And it will run much, much more quickly for the absence of threads.
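If the start-of-line position is what you want, one option is to call tell() before each read rather than after it. A minimal sketch, assuming an in-memory file in place of the real one and taking $size to be the total file size (the original code does not show how $size is set, so that is a guess):

```perl
use strict;
use warnings;

# Assumption: a small in-memory file stands in for the real input.
my $data = "alpha\nbeta\ngamma\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $size = length $data;                # assumed meaning of $size

my @report;
my $start = tell $fh;                   # offset of the first line
while ( my $line = <$fh> ) {
    push @report, sprintf "0: (%10d,%10d) : %s", $start, $size, $line;
    $start = tell $fh;                  # offset where the next line begins
}
close $fh;

print @report;
```

With a real file, replace the in-memory open with the file's path and use -s on the handle for the size.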

Processing the lines of a single file -- that must be read from disk serially -- using multiple threads makes no sense, unless the processing involved for each line takes longer than it takes to read that line from disk. Disks are slow; so you have to be doing a considerable amount of processing per line for that to be true.
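One rough way to check which side of that trade-off you are on is to time the reads and the per-line work separately. The sketch below is only illustrative: process_line() is a hypothetical placeholder for whatever the real per-line work is, and the in-memory file means the read times shown here say nothing about a real disk (OS caching will also skew repeated runs on a real file).

```perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Hypothetical placeholder for the real per-line processing.
sub process_line {
    my ($line) = @_;
    return length $line;
}

# Assumption: in-memory data; substitute a real file to get useful numbers.
my $data = join '', map { "record $_\n" } 1 .. 1000;
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my ( $read_time, $work_time, $lines ) = ( 0, 0, 0 );
while (1) {
    my $t0   = time;
    my $line = <$fh>;                   # time spent reading
    my $t1   = time;
    last unless defined $line;
    process_line($line);                # time spent processing
    my $t2 = time;

    $read_time += $t1 - $t0;
    $work_time += $t2 - $t1;
    ++$lines;
}
close $fh;

printf "lines: %d  read: %.6fs  work: %.6fs\n",
    $lines, $read_time, $work_time;
```

If the work total does not clearly dominate the read total, threads are unlikely to help.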

As for why it hangs after 800 lines: it isn't immediately obvious from reading the code, but I'm not going to expend effort either to verify that or to debug it, because it is nonsensical, do-nothing-useful code.

I appreciate that when we start using something new, we often write do-nothing programs to get a feel for stuff, but expecting others to debug that nonsensical code is asking a lot when there is no clear indication of what you are hoping to achieve.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
