Efficient way to sum columns in a file
by sk (Curate)
on Apr 13, 2005 at 04:51 UTC
sk has asked for the wisdom of the Perl Monks concerning the following question:
I was watching someone trying to find the sum of a particular column in a file. He was loading the file into a data application (one meant to work with large files) just to calculate this sum.
Setting up the app takes a while, since you have to read in the entire file, assign variable names, and so on. So I wrote a one-liner, which worked great and saved a lot of time.
I could avoid the "cut", but I didn't see a huge advantage (if someone can give me good reasons to avoid cut, that would be nice!)....
Please note that the file is pretty large (around 5 million rows and a few hundred columns). Since it worked out well, that person asked me how to modify the code to handle 5 columns. I immediately used an array (the return from a split /,/) and looped through the list to accumulate the column sums each time a new row came in. Little did I realize at the time that this would have horrible performance. After letting it run for a few minutes I realized that executing that inner loop millions of times is not such a good idea (a bad idea, rather). Maybe just declaring 5 variables would have been better.
So my question is how would Monks handle such a problem?
Thanks all for your time!