
Re: multi threading

by sundialsvc4 (Abbot)
on Jun 01, 2013 at 13:43 UTC (#1036439)

in reply to multi threading

This is strictly an I/O-bound process, so threading is unlikely to help you much here. The ruling constraint is how fast the (one?) disk drive can spin and move its read/write head assembly around ... and, particularly, how often the head is obliged to move from one place to another. This is why you sometimes encounter the at-first counter-intuitive finding that “adding threads makes it slower”: several threads reading at once essentially randomize the pattern of back-and-forth movements the heads must make, and greatly increase the “churn” of the operating system’s buffers.

What I think you really want to do here is to put the data into, say, a SQLite database file (or files), indexed so that you do not have to read every line to find what might be a match. Indexes do not have to be perfect in order to be useful. Any strategy that reduces the number of records that must actually be examined, by any means whatever, is going to be worthwhile: in this case, you simply want to separate the data into meaningful clumps so that you only need to iterate through one of them.
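To make the indexing idea concrete, here is a minimal sketch in Python using the standard sqlite3 module (the same approach works from Perl through DBI and DBD::SQLite). The table name `records`, the tab-separated "key, value" line layout, and the helper names `build_index` and `lookup` are all assumptions made for illustration, since the original question's file format is not shown here.

```python
import sqlite3

def build_index(conn, lines):
    """Load 'key<TAB>value' lines into a table and index it by key.

    The line layout and the table/column names are hypothetical.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS records (key TEXT, value TEXT)")
    rows = (line.rstrip("\n").split("\t", 1) for line in lines)
    conn.executemany("INSERT INTO records (key, value) VALUES (?, ?)", rows)
    # The index is what lets a lookup skip the non-matching rows.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_records_key ON records (key)")
    conn.commit()

def lookup(conn, key):
    """Return every value stored under key, in insertion order."""
    cur = conn.execute(
        "SELECT value FROM records WHERE key = ? ORDER BY rowid", (key,)
    )
    return [row[0] for row in cur]

# Demo with an in-memory database; a real run would pass a file name
# to sqlite3.connect() and feed build_index() an open file handle.
conn = sqlite3.connect(":memory:")
build_index(conn, ["apple\tred\n", "banana\tyellow\n", "apple\tgreen\n"])
matches = lookup(conn, "apple")
```

The load is a one-time sequential pass over the data; after that, each lookup touches only the rows the index points at instead of every line.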

Another useful strategy, especially if you are “clumping,” is to grab a big handful of the records to be looked for into a memory structure such as a list, sized to fit in real memory (i.e. without paging ...), so that you can make each I/O against the other file do more work for you. Having spent the I/O time to retrieve a record (and its neighbors), you can compare it against the entire handful without incurring more I/O cost. (But if the structure is so large, counting all the processes that may exist, that it does cause paging, then you have just incurred a hidden I/O cost that can be quite debilitating: a group of pages that is too large to fit yet frequently accessed causes thrashing.)
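The “handful” strategy might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the function names `batches` and `find_matches`, the tab-separated layout with the key in the first field, and the default batch size are all hypothetical, and a set is used in place of the list mentioned above so that each membership test is cheap.

```python
import io
from itertools import islice

def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def find_matches(wanted_keys, open_big_file, batch_size=100_000):
    """Scan the big file once per handful of wanted keys.

    `open_big_file` is a callable returning a fresh file-like object;
    pick `batch_size` small enough that the set stays in real memory.
    """
    found = []
    for handful in batches(wanted_keys, batch_size):
        wanted = set(handful)          # the in-memory handful
        with open_big_file() as fh:    # one sequential pass per handful
            for line in fh:
                key = line.split("\t", 1)[0]
                if key in wanted:      # no extra I/O for this comparison
                    found.append(line.rstrip("\n"))
    return found

# Demo with an in-memory "file"; a real run would use
# lambda: open("big.txt") or similar.
big = "a\t1\nb\t2\nc\t3\na\t4\n"
result = find_matches(["a", "c", "z"], lambda: io.StringIO(big), batch_size=2)
```

With N wanted keys and a batch size of B, the big file is read roughly N/B times rather than N times, and each pass is sequential, which is the access pattern a spinning disk handles best.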
