Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

msft xlsx2txt help please!

by david_lyon (Sexton)
on Aug 22, 2010 at 01:08 UTC ( #856531=perlquestion: print w/ replies, xml ) Need Help??
david_lyon has asked for the wisdom of the Perl Monks concerning the following question:

Bows to the Perl Monks

I have been looking for a way to parse msft xlsx to tab delimited text file on a linux machine using perl.

I have tried numerous scripts on the web which to date none of them reliably parses out xlsx from what I could see. Can someone post some perl code that can convert xlsx and produce a tab delimted text file.

Thank you so much...Bows

Comment on msft xlsx2txt help please!
Re: msft xlsx2txt help please!
by Anonymous Monk on Aug 22, 2010 at 01:12 UTC
Re: msft xlsx2txt help please!
by sflitman (Hermit) on Aug 22, 2010 at 05:43 UTC
    I took this from Spreadsheet::XLSX's perldoc. Make sure modules and all dependencies are installed through CPAN, and I found perl 5.10 was needed due to a dependency on Weak References in Spreadsheet::ParseExcel.
    #!/usr/bin/perl # SSF 082110 - for perlmonks - convert XLSX to tab-delimited use warnings; use strict; use Spreadsheet::XLSX; use Text::Iconv; my ($file,$sheetindex)=@ARGV; die "Usage: $0 file.xlsx [sheetindex]\n" unless $file; $sheetindex||=0; my $converter=Text::Iconv->new("utf-8","windows-1251"); my $excel=Spreadsheet::XLSX->new($file,$converter); my @sheets=@{$excel->{Worksheet}}; die "Bad sheet index: $sheetindex\n" if $sheetindex<0 or $sheetindex>$ +#sheets; my $sheet=$sheets[$sheetindex]; $sheet->{MaxRow}||=$sheet->{MinRow}; for my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) { $sheet->{MaxCol}||=$sheet->{MinCol}; for my $col ($sheet->{MinCol}..$sheet->{MaxCol}) { my $cell=$sheet->{Cells}[$row][$col]; if ($cell) { print $cell->{Val}; } print "\t" if $col<$sheet->{MaxCol}; } print "\n"; } exit;
    Note that it only outputs one sheet, defaulting to the first. I've noticed that Excel's blank book has three sheets, so it is already three-dimensional. This script could use more error checking, but it works on a Linux system with the caveats listed above.

    HTH,

    SSF

      Bows to The Monks....

      Thanks for everyones help

      Thank you sflitman for posting the code.....incredible, works beautifully

      Bows

Re: msft xlsx2txt help please!
by sundialsvc4 (Monsignor) on Aug 23, 2010 at 13:14 UTC

    I would say, as a categorical statement, that you always want to find, and to use, a CPAN module of recent vintage to do these things.   Don’t pore through the module and cabbage code from it:   use the module exactly as it is.

    Of course, we know that Microsoft is easy to deal with.   We know that they never create file-formats that are secretive or difficult to understand.   (Oh, no-o-o, they would never do such a thing!)   Therefore, when you find a CPAN contribution that has undertaken the chore of understanding that format ... that is the way that you should do it.

    The same is true for output:   there are CPAN modules for tab-delimited text files, too.   Granted, tab-delimited is a comparatively trivial format to deal with, but not entirely.   IMHO, even a slight amount of hair-pulling to do such things is an unjustifiable abuse of precious follicles.

    Actum Ne Agas:   “Do Not Do A Thing Already Done.”

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://856531]
Approved by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (5)
As of 2014-08-20 22:30 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (124 votes), past polls