Confirming what we already knew
by AssFace (Pilgrim)
on Mar 05, 2003 at 02:44 UTC
I have been programming a genetic analysis tool to run on stock data.
Perl has served well in that it is easy to write very quickly - and for 90% of what I do it is plenty fast. Whether the code executes in 1 second or 10 seconds doesn't really matter to me too much if I'm only going to run it 1 to 5 times a day.
But this particular code does kind of a lot of things, so it took some time to get through it all. I need to execute this at least once a day, so it needs to finish all of its work in under 20 hours or so (so that there are 4 hours left to run many other programs on the data that is returned).
The Perl code, depending on what machine I was on, was taking at best ~196 seconds to run one dataset (one stock ticker). I need to run over 2000 stocks through it every day, so that was going to take well over a day on a single machine. I could easily get around it with clusters, but I still wanted something a little faster.
So I rewrote it all in C. Here are the stats that I saw after doing so:
(keep in mind that these are on two different machines, and also the Perl code writes out 5 short lines of text to a file at the very end of each stock analysis)
The Perl version was run on a P4 2G with half a gig of RAM and an IDE HD, running Win2K and ActiveState Perl 5.6.1. The C version was compiled and run on an Athlon 1G with half a gig of RAM and an IDE HD, running FreeBSD 4.6-STABLE, using gcc version 2.95.4.
On the P4 2G, the Perl code ran through one ticker in ~196 seconds.
The C code, with no optimizations, running on the Athlon 1G ran in ~2.8 seconds. Compiling with -O2 or -O3 brought it down to ~1.4 seconds.
So on a slower machine, it runs about 140 times faster.
The Perl file is around 48K in size, with 959 lines (100-200 are just comments). The C file is 768 lines (50-100 lines are just comments) and the compiled code is 11K with no optimizations, and 9K with optimizations.
I can now run all the stocks on one machine in under an hour (it will be run on an Athlon XP 2.1G with 256M RAM), so that will allow me to run many more variations of the tests on the machine (I will likely still be using a cluster), as well as neural net analysis among other things.
I don't think the writing to a file at the end of the Perl code that is missing from the C code accounts for the 190+ second difference. At most it might add one more second to the C code, but even that is questionable.
I'm curious how Java would compare, since it would be much easier to write with the String object to work with. 1.3 is faster, but 1.4 has RegEx in there.
I also still haven't tested it out under Linux with the newer gcc 3.2 or the Intel compiler (not sure how well, if at all, icc will help on an Athlon XP).
The code itself is pretty small and doesn't take that much in terms of resources.
It reads in rows of data (trading days) for a stock, then iterates over them thousands of times, evolving how it analyzes the data. Then it outputs the results (the C version to stdout and the Perl version to a file).
The final setup will likely be a Perl driver that loops over the tickers and hands each one to the C program. I don't think the speed of a Perl for loop will slow it down too much.
So the point of all that is that Perl is excellent for developing, is easy to write fast, and ports over to C fairly easily. It is plenty fast for the bulk of what you want to do, but for more intense things you will want C, of course.