|Perl: the Markov chain saw|
Recently, I had to do analysis of data contained in some mainframe extract files, which I chose to do by loading the files into MS Excel with Perl. This node describes my experiences.
The files in question are generated periodically by a COBOL program on a mainframe. My tool of choice for this sort of analysis is usually Microsoft Excel. However, two complications prevented me from simply using text import in Excel.
The machine in question ran Windows 2000, and had Office XP installed. Thus, I had two choices for writing this program: Win32::OLE and Spreadsheet::WriteExcel. I first used the former, but was dissatisfied with the results, so I turned to the latter. The experience gave me some interesting insight into both modules.
Let's get the obvious differences out of the way: Win32::OLE can be used for much more than writing Excel spreadsheets, so it may be said that the comparison isn't a fair one. (Nonetheless, I think judging a swiss army knife solely on the sharpness of the blade without considering the screwdriver and file is a reasonable thing.) Second, Spreadsheet::WriteExcel can run on platforms other than Win32, which is one of its main strengths. However, the question I was interested in is, "Given that I have a choice of modules, are there reasons to prefer one over the other?" Which brings us to this node.
To use this module to write spreadsheets, you need a little background in VBA and the Excel object model. There is an excellent tutorial here in the monastery on how to use the module to interact with Excel. Since I had a good bit of prior experience with VBA (no snickering, please), it was pretty straightforward to write the code.
However, I ran into some difficulties. The first one, which is really an issue with all Perl code that uses Win32::OLE, is that the API just doesn't map nicely into Perl concepts. The code looks like Perl written by a VB programmer, because you still need VB-ish concepts like in and with to make the code compact. This isn't really a weakness, but just unpleasant. I like to write Perl like Perl.
The second difficulty was that the code ran very slowly. I looked in the Win2k task manager while the program was running, and Excel was taking about 70% of the CPU, while Perl had about 30%. My conjecture (though I have no hard data to back it up) is that the problem was my cell-at-a-time code. (That is, my code was setting one cell in the spreadsheet at a time, rather than an entire row or column.) Possibly more time working out the VBA and/or Win32 code could have squeezed more performance out of it. But the program was just a means to an end, and not an end in itself, so I couldn't justify spending a lot more time on it.
The last difficulty -- and I have no idea why this happened -- was that sometimes Excel had trouble opening the resulting file. If I double-clicked on the file in Explorer, Excel would come up, but never finish loading it. However, I could open the file with no problems using the File->Open dialog in Excel. I suspect there was some Excel "ghost" hanging around after the program ran, although I could not find it in the process list in the Windows task manager.
All in all, I had a running, though slow, program, that sometimes did weird things. After a brief conversation in the CB here, I decided to give Spreadsheet::WriteExcel a shot.
My first surprise was the module's dependencies: To get it running, I also had to install Parse::RecDescent, and later, when I had to write large spreadsheets, I found I also needed IO::Stringy and OLE::Storage_Lite.
On the other hand, I found the code much more "natural" to write in Perl. It didn't fit 100% with the Excel object model I already knew, but it wasn't too far. The module's documentation is very detailed and of high quality, so putting the code together was very direct. I had a working program (using parts of my previous effort) in a matter of minutes.
I was pleasantly surprised that the code ran much faster. I'm pretty sure that was partly the result of the fact that I could use arrays much more naturally, like writing a row instead of a cell at a time. My code "looked like Perl".
But this time I ran into other difficulties. The dates in the Excel spreadsheet were coming up as strings, rather than dates. Not a big deal in my particular case, but it meant I couldn't re-format them or do manipulations on them. I suspect I would have to format them as numbers (which is how Excel represents dates internally), but I didn't have the time or inclination to experiment. The second annoyance, albeit a very minor one, is that when told to close the file, Excel asked if I wanted to upgrade the format: Spreadsheet::WriteExcel is only up to Office 2000, and I was using Office XP.
Overall, though, my program ran faster, and looked better. So I chose to continue using the module.
The bottom line is that I was able to do what I wanted with either module: have a reusable way to take apart the mainframe files and analyze them with Excel. Both modules did a good job at that. But I will probably continue to use Spreadsheet::WriteExcel mainly because it "feels" more Perlish, and it was easier to get my code running faster.
In summmary, here is a list of the differences I found.
I will continue to use both modules, although for this particular case, Spreadsheet::WriteExcel turned out the better tool. If you need to write an Excel spreadsheet on Win32, give both a try; you may be as pleasantly surprised as I was.