Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Writing Excel spreadsheets with Perl

by VSarkiss (Monsignor)
on Dec 16, 2002 at 03:11 UTC ( #220111=perlmeditation: print w/ replies, xml ) Need Help??

Recently, I had to do analysis of data contained in some mainframe extract files, which I chose to do by loading the files into MS Excel with Perl. This node describes my experiences.

Background

The files in question are generated periodically by a COBOL program on a mainframe. My tool of choice for this sort of analysis is usually Microsoft Excel. However, two complications prevented me from simply using text import in Excel.

  • First, the files are fixed-width fields, and I would have to do this several times. The text import wizard won't save off the file format specification separately, so importing every new file would entail repeating the layout in the wizard. (I'm aware that MS Access can save format specifications for re-use, but using Access introduces other complications I'd rather avoid.)
  • Second, some of the fields use COBOL signed and implied-decimal format (e.g., PIC S9(08)V99). In such fields, the last character is a combination of the sign and the digit, so you have play tricks to get an "Excel number". For example, the string 1234} represents -123.40: the last } means both that the last digit is zero, and that the number is negative. Also note the decimal point is not explicit in the original string.
Writing a Perl program to break up the file and write a spreadsheet would achieve both of these goals: it would be re-usable, and it would be very easy to write code to digest the COBOL weirdness. (Hmm, I smell a golf challenge. ;-)

The machine in question ran Windows 2000, and had Office XP installed. Thus, I had two choices for writing this program: Win32::OLE and Spreadsheet::WriteExcel. I first used the former, but was dissatisfied with the results, so I turned to the latter. The experience gave me some interesting insight into both modules.

Let's get the obvious differences out of the way: Win32::OLE can be used for much more than writing Excel spreadsheets, so it may be said that the comparison isn't a fair one. (Nonetheless, I think judging a swiss army knife solely on the sharpness of the blade without considering the screwdriver and file is a reasonable thing.) Second, Spreadsheet::WriteExcel can run on platforms other than Win32, which is one of its main strengths. However, the question I was interested in is, "Given that I have a choice of modules, are there reasons to prefer one over the other?" Which brings us to this node.

Using Win32::OLE

To use this module to write spreadsheets, you need a little background in VBA and the Excel object model. There is an excellent tutorial here in the monastery on how to use the module to interact with Excel. Since I had a good bit of prior experience with VBA (no snickering, please), it was pretty straightforward to write the code.

However, I ran into some difficulties. The first one, which is really an issue with all Perl code that uses Win32::OLE, is that the API just doesn't map nicely into Perl concepts. The code looks like Perl written by a VB programmer, because you still need VB-ish concepts like in and with to make the code compact. This isn't really a weakness, but just unpleasant. I like to write Perl like Perl.

The second difficulty was that the code ran very slowly. I looked in the Win2k task manager while the program was running, and Excel was taking about 70% of the CPU, while Perl had about 30%. My conjecture (though I have no hard data to back it up) is that the problem was my cell-at-a-time code. (That is, my code was setting one cell in the spreadsheet at a time, rather than an entire row or column.) Possibly more time working out the VBA and/or Win32 code could have squeezed more performance out of it. But the program was just a means to an end, and not an end in itself, so I couldn't justify spending a lot more time on it.

The last difficulty -- and I have no idea why this happened -- was that sometimes Excel had trouble opening the resulting file. If I double-clicked on the file in Explorer, Excel would come up, but never finish loading it. However, I could open the file with no problems using the File->Open dialog in Excel. I suspect there was some Excel "ghost" hanging around after the program ran, although I could not find it in the process list in the Windows task manager.

All in all, I had a running, though slow, program, that sometimes did weird things. After a brief conversation in the CB here, I decided to give Spreadsheet::WriteExcel a shot.

Using Spreadsheet::WriteExcel

My first surprise was the module's dependencies: To get it running, I also had to install Parse::RecDescent, and later, when I had to write large spreadsheets, I found I also needed IO::Stringy and OLE::Storage_Lite.

On the other hand, I found the code much more "natural" to write in Perl. It didn't fit 100% with the Excel object model I already knew, but it wasn't too far. The module's documentation is very detailed and of high quality, so putting the code together was very direct. I had a working program (using parts of my previous effort) in a matter of minutes.

I was pleasantly surprised that the code ran much faster. I'm pretty sure that was partly the result of the fact that I could use arrays much more naturally, like writing a row instead of a cell at a time. My code "looked like Perl".

But this time I ran into other difficulties. The dates in the Excel spreadsheet were coming up as strings, rather than dates. Not a big deal in my particular case, but it meant I couldn't re-format them or do manipulations on them. I suspect I would have to format them as numbers (which is how Excel represents dates internally), but I didn't have the time or inclination to experiment. The second annoyance, albeit a very minor one, is that when told to close the file, Excel asked if I wanted to upgrade the format: Spreadsheet::WriteExcel is only up to Office 2000, and I was using Office XP.

Overall, though, my program ran faster, and looked better. So I chose to continue using the module.

Conclusion

The bottom line is that I was able to do what I wanted with either module: have a reusable way to take apart the mainframe files and analyze them with Excel. Both modules did a good job at that. But I will probably continue to use Spreadsheet::WriteExcel mainly because it "feels" more Perlish, and it was easier to get my code running faster.

In summmary, here is a list of the differences I found.

Criterion Win32::OLE Spreadsheet::WriteExcel
Prerequisites Win32 only, plus needs Excel installed Cross-platform, but dependent on more modules
Performance Depends on underlying COM code Very good (subjectively)
Ease of use Good, but depends on prior knowledge of COM Very good, mainly because of excellent documentation.

I will continue to use both modules, although for this particular case, Spreadsheet::WriteExcel turned out the better tool. If you need to write an Excel spreadsheet on Win32, give both a try; you may be as pleasantly surprised as I was.

Comment on Writing Excel spreadsheets with Perl
Select or Download Code
Re: Writing Excel spreadsheets with Perl
by hawtin (Prior) on Dec 16, 2002 at 08:52 UTC

    Excelent review of your experiences, Thanks

    One tiny niggle, the thine that starts Writing a Perl program to break up the file appears to be missing a <p> tag. It causes my (very old) browser to overprint

Re: Writing Excel spreadsheets with Perl
by Discipulus (Curate) on Dec 16, 2002 at 10:47 UTC
    very good and thanks..but the link an excellent tutorial point I don't know where... greetings lorenzo

      <oops> Fixed, thanks. </oops>

Re: Writing Excel spreadsheets with Perl
by Mr. Muskrat (Abbot) on Dec 16, 2002 at 14:24 UTC
    Very nice but I think you forgot one...

    CriterionWin32::OLESpreadsheet::WriteExcel
    SpeedVery slowVery fast

      Not really. That's supposed to be what the "Performance" row describes. Although SSWE was much faster in my case, I don't think the slowness of the program with Win32::OLE can be blamed on the module. I'm certain the speed depended much more on Excel XP (that's what I tried to convey by saying "depends on the underlying COM code"). I'm also guessing it had more to do with how I wrote the program.

        Ah. I generally don't mix performance and speed. A program can perform well (do a good job) while being slow.

        As far as the slowness being blamed on the module, it would probably depend on how the program that you are calling was written. And we all know how poorly written all M$ software is. ;)

Re: Writing Excel spreadsheets with Perl
by jmcnamara (Monsignor) on Dec 16, 2002 at 17:14 UTC

    This is a very fair assessment of both modules. ++

    In general I'd say that Win32::OLE's biggest drawback is the fact that it moves the programmer from Perl space to COM space. While that isn't a problem if you are comfortable with COM or VBA the monolithic API is potentially daunting for a programmer who primarily uses Perl.

    I feel that Win32::OLE is let down in this respect by a lack of documentation. A cookbook style document in the Pod would go a long way to getting people started.

    Notwithstanding these points, Win32::OLE is really a killer app. On a Windows system it gives you almost unlimited control over COM based applications.

    As such, the Spreadsheet::WriteExcel documentation has always recommended Win32::OLE as an alternative. Like all modules Spreadsheet::WriteExcel is only useful if it does what you want it to to. However, if you have requirements for charts, filters, macros or pivot tables then you are stuck. This is not the case with Win32::OLE. It is, and always will be, able to access virtually every Excel feature that the user can access.

    Spreadsheet::WriteExcel's strengths are that it is cross platform, heavily tested and comparatively fast and lightweight.

    --
    John.

Re: Writing Excel spreadsheets with Perl
by jmcnamara (Monsignor) on Dec 16, 2002 at 17:17 UTC

    The second difficulty was that the code ran very slowly. ... My conjecture (though I have no hard data to back it up) is that the problem was my cell-at-a-time code.

    This may well be the case. Writing data to Excel via Win32::OLE can be significantly faster if you write it in one go. Example from ActivePerl-Winfaq12:

    my $Range = $Sheet->Range("A2:C7"); $Range->{Value} = [['Delivered', 'En route', 'To be shipped'], [504, 102, 86], [670, 150, 174], [891, 261, 201], [1274, 471, 321], [1563, 536, 241]];

    --
    John.

      Aha! I wasn't aware that was possible. If I have an opportunity, I'll try a row-at-a-time style and report on the results.

      Thanks for the pointer to the FAQ, BTW.

      This solved my prob as i was dyin to find Win32::OLE based code to enter my values to the sheet bt always found Spreadsheet::WriteExcel. Its simple and fast...

      Thanks Dharma
Excel 2007 Support
by lecanardroti (Initiate) on Jun 24, 2009 at 20:15 UTC
    There is one advantage of Win32::OLE that has not been mentioned. It works with Excel 2007. This works because Win32::OLE makes use of the COM objects installed on the workstation itself. While this limits portability beyond Windows (and perhaps even on Windows for certain features), it has the advantage of not requiring an update to the module. As far as I can tell, none of the Spreadsheet::<module> modules work with Excel 2007. Perhaps this can be addressed in a future build, but currently it is necessary to use Win32::OLE if you have to deal with the newer version. The file format is different for Excel 2007 than for Excel 2003 and earlier. Microsoft even designated a new extension for it, *.xlsx (2007) instead of *.xls (2003 and earlier).

      Excel 2007 uses ZIP compressed XML, labelled Office Open XML. It roughly looks like the open standard ODF used in free software, but it seems Microsoft intentionally made its format incompatible with the ODF format.

      As both standards are based on XML, and both are more or less documented, you "simply" generate some XML, ZIP it, and rename it to *.xlsx. (Perhaps you have to sacrifice some chickens, burn some candles, and draw some blood pentagrams on that way. But you should be used to such procedures when you deal with XML and/or Microsoft software, so nothing special here. ;-)) The OpenOffice::OODoc package should at least be able to read and write the ODF format without hurting animals.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Hi, I am trying to find a perl module which can work with MS excel 2007. Forom your explaination and based on my study from CPAN i understnd that no perl module can be used to write in MS excel 2007. Is is thisbeacuse non -avaolability of proper perl modules.

        As the grandparent post said, you can still use Win32::OLE. Also, maybe XLSX can write XLSX files.

      Hi, the above information was very use full to me.I did a project where i need to generate a excel file. I used spreadsheet::WriteExcel, I am able to see complete data in the output file in Excel2002 version. Where as the client had MSExcel2007 for him, he was getting an error 'data lost'. And he is not able to see the complete data. After reading this I found that I should use Win32::OLE. I will go for this. I had a bad experience with the Microsoft products. No compatibility between different versions.

Reaped: Re: Writing Excel spreadsheets with Perl
by NodeReaper (Curate) on May 09, 2011 at 18:34 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://220111]
Approved by jryan
Front-paged by stefp
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (8)
As of 2014-08-22 06:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (148 votes), past polls