Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Re: Rewrite Program Using Arrays

by JavaFan (Canon)
on Mar 25, 2012 at 16:39 UTC ( #961507=note: print w/replies, xml ) Need Help??

in reply to Rewrite Program Using Arrays

So I want to make this faster.
Then don't sort. Really. The rest of your program is essentially linear. Sorting is N log N. That's the bottleneck of your program. Any speedup you'll do in the rest of the program are pointless if your data size increases: the sort will dominate.

Replies are listed 'Best First'.
Re^2: Rewrite Program Using Arrays
by mbethke (Hermit) on Mar 25, 2012 at 17:51 UTC

    If he needs his output sorted though? Certainly the built-in sort won't be slower than using some external sort later?

    Then again, the n in sort's O(n) is the number of word forms which doesn't increase linearly with corpus size. Some law for it can probably be derived from Zipf's Law, I suspect it's something like log(corpus_size). So the sort should take an average of log(n)*log(log(n)) over corpus size which is not that bad.

Re^2: Rewrite Program Using Arrays
by perl.j (Pilgrim) on Mar 25, 2012 at 20:38 UTC
    Thanks for the advice, but I just want to clear this up. I am not comfortable using the push and pop functions, so this is also going to be a learning experience. I understand that there are a lot of things I can do to speed the process, but I want to see what this will do still... The problem is, I don't know how to do it.

      There is no use for push/pop here, though. This is a pretty straightforward problem: counting the occurrences of words in a file. That means putting the words in a hash as the keys and incrementing the value when a word is seen. That's what you've done, so you're already using the best algorithm. You can gain some speed by cutting out any unnecessary assignments, as I showed with the benchmarks in my other post; but you certainly won't gain anything by using a less straightforward algorithm or by shuffling your words around through extra arrays. There really aren't "a lot of things you can do to speed the process." Building a hash and sorting on the values is as fast as it's gonna get, unless you want to rewrite it in C.

      Aaron B.
      My Woefully Neglected Blog, where I occasionally mention Perl.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://961507]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (5)
As of 2018-05-24 16:58 GMT
Find Nodes?
    Voting Booth?