comment on

Well, you've all made some good points. :) Particularly that IO is the constraint.

A few thoughts...

Sometimes there is a better algorithm (as opposed to a better optimization). And that probably requires a better understanding of the core problem. For example, did the OP really need an exact tally, or would it have been adequate to sample and extrapolate? If so, what size sample would have been sufficient, and would it have been ok to take the first 1..s ints, or would it have been necessary to take s ints from random positions in the data set. We can't really answer that because we don't know enough about the data set.

Another consideration could be maintaining a count as the data is generated. If over 3GB of integers was generated over the course of hours, days, weeks, or months, the extra cost of maintaining a running tally might go un-noticed, whereas a 60-second sudden lag may be a problem.

And of course we don't know even if a sudden lag IS a problem. Perhaps the user runs this once, walks away to grab a coffee, and comes back to it later, never to run it again. Or perhaps it is something automated as a cron job for 4:27am. Maybe it's like my Windows computer where the world has already accepted that it takes 4 minutes to boot, and where shutting down may take even longer if some MS-pushed upgrade is waiting to finalize. Quite probably none of this matters to the person with the original question, and that being the case not only is my meditation premature micro-optimization, but any attempt to find a better alternative is still unwarranted.

Nevertheless, I found it a fun exercise. If I ventured down the wrong tangent, I'll live and learn. In the end the hour or so spent investigating was educational enough I can justify it. And I'm thankful for the input, reminders, and gentle slap-in-the-back-of-the-head from my friends here.

Dave

In reply to Re: "Just use a hash": An overworked mantra? by davido
in thread "Just use a hash": An overworked mantra? by davido

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


go ahead... be a heretic
	PerlMonks