Re^5: Out of Memory when generating large matrix (space complexity)

by Anonymous Monk
on Mar 07, 2018 at 13:25 UTC (#1210450)

in reply to Re^4: Out of Memory when generating large matrix (space complexity)
in thread Out of Memory when generating large matrix

Well, I spotted "(space complexity)" in the title and thought there was a glimmer of hope for you, yet.

You have identified the problem area, but again deftly avoid seeing the light. Sorting (or deduplicating) is a problem with O(n log n) time complexity. If you have a hash function that distributes the keys well, you can cut down the problem and move some of the complexity into the space domain.

Hashes are O(n) in both time and space complexity (list insertion). A streaming merge is O(1) in space complexity. Partial hashing is possible: using a hash table of size k, you can modify the algorithm to achieve O(n log(n/k)) time complexity and O(k) space complexity. The k scales well until you break out of the CPU caches, after which it scales rather poorly. I referenced another thread where someone ran into a brick wall trying to hash just 36M elements. Sort|uniq proved to be greatly superior in that case.
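
To make the partial hashing idea concrete, here is a rough sketch of one way it could look. It is only an illustration under my own assumptions: the names dedupe_bounded, spill_run and read_key are made up for this example, keys are assumed to be single-line strings, and the merge picks the smallest head with a plain linear scan (a heap would be the natural choice once the number of runs grows). The hash never holds more than k keys; every time it fills up, its keys are written out as a sorted run, and the runs are then merged while keeping only one line per run in memory.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write the keys of %$seen, sorted, to an anonymous
# temp file and return the handle rewound to the start.
sub spill_run {
    my ($seen) = @_;
    my $fh = tempfile();
    print {$fh} "$_\n" for sort keys %$seen;
    seek $fh, 0, 0 or die "seek failed: $!";
    return $fh;
}

# Hypothetical helper: next key from a run, or undef at end of file.
sub read_key {
    my ($fh) = @_;
    my $line = readline($fh);
    return undef unless defined $line;   # explicit undef, also in list context
    chomp $line;
    return $line;
}

# Deduplicate the keys produced by the iterator $next (returns undef when
# exhausted), holding at most $k keys in the hash at any time.
# $emit is called once per unique key, in sorted order.
sub dedupe_bounded {
    my ($next, $k, $emit) = @_;
    my (%seen, @runs);

    while (defined(my $key = $next->())) {
        $seen{$key} = 1;
        if (scalar(keys %seen) >= $k) {   # hash is full: spill a sorted run
            push @runs, spill_run(\%seen);
            %seen = ();
        }
    }
    push @runs, spill_run(\%seen) if %seen;

    # Streaming merge: only the current head of each run is kept in memory.
    my @head = map { read_key($_) } @runs;
    my $last;
    while (1) {
        my $min;
        for my $i (0 .. $#head) {
            next unless defined $head[$i];
            $min = $i if !defined $min || $head[$i] lt $head[$min];
        }
        last unless defined $min;          # all runs exhausted
        my $key = $head[$min];
        $emit->($key) unless defined $last && $last eq $key;
        $last = $key;
        $head[$min] = read_key($runs[$min]);
    }
    return;
}

# Usage: seven keys, at most two of them in the hash at once.
my @input = qw(b a c a b d a);
my $pos   = 0;
my @unique;
dedupe_bounded(
    sub { $pos < @input ? $input[$pos++] : undef },
    2,
    sub { push @unique, $_[0] },
);
print "@unique\n";   # prints "a b c d"
```

With k at least as large as the number of distinct keys this degenerates to the plain hash solution with a single run; with k = 1 it is effectively an external merge with nothing hashed at all.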

So far, you have

  • veered the topic into a discussion of algorithmic complexity, where no-one really asked for it.
  • misapplied big O notation. Big O does not say whether one solution is faster than another. It tells you how a problem scales.
  • made the incorrect statement that a hash-based solution would scale better than merge sort. In practice, hashing does not scale beyond memory limits.
  • made a ludicrously inappropriate suggestion to use wc; this suggests you did not invest the cycles necessary to understand the problem, let alone offer a solution.
  • applied a technique (hashing) as a magic bullet, without a fundamental grasp of the subject matter. This is Cargo Cult by definition.

By the way, I never argued that a hash count was unsuitable. By all means, ++$count{$key} if that works. But you chose to attack a broken clock, and forgot that a broken clock, too, is right twice a day.
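
For completeness, the hash count spelled out as a small sketch. The sub name count_keys and the sample keys are made up for illustration; the whole table lives in memory, so this only applies while the distinct keys fit in RAM.

```perl
use strict;
use warnings;

# Count how often each key occurs; returns a reference to the count hash.
sub count_keys {
    my %count;
    ++$count{$_} for @_;
    return \%count;
}

my $count = count_keys(qw(b a c a b d a));
# One line per distinct key, sorted, much like sort | uniq -c would list them.
printf "%d %s\n", $count->{$_}, $_ for sort keys %$count;
```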

Replies are listed 'Best First'.
Re^6: Out of Memory when generating large matrix (space complexity)
by LanX (Bishop) on Mar 07, 2018 at 14:07 UTC
    > glimmer of hope for you,

    Wow, you are so humble and your posts so clear and understandable.

    No wonder you prefer to post anonymously.

    (closed :)

    Cheers Rolf
    (addicted to the Perl Programming Language and ☆☆☆☆ :)
    Wikisyntax for the Monastery
