Re^3: XLSX read and dump

First of all, a meta-comment: you should wrap your errors in code tags to a) make that part of it easier to read and b) make it stand out from the rest of your post. Ideally, the errors would look like

IO error: opening test.xlsx for read :
No such file or directory at /System/Library/Perl/Extras/5.12/Archive/
+Zip/Archive.pm line 546
Archive::Zip::Archive::read('Archive::Zip::Archive=HASH(0x7f87628288d8
+)', 'test.xlsx')
called at /Library/Perl/5.12/Spreadsheet/XLSX.pm line 33 
Spreadsheet::XLSX::new('Spreadsheet::XLSX', 'test.xlsx') 
called at parse.pl line 6
Cannot open test.xlsx as Zip archive at /Library/Perl/5.12/Spreadsheet
+/XLSX.pm line 33
[download]

I've added a few line breaks to make it easier to read.

Second, it looks like (and I'm guessing, because you haven't posted any code yet) you are accessing the spreadsheet while it's inside a zip file. To make it easier, can you just extract one of the spreadsheets and operate on that?

Let's try to solve one problem at a time (aka, "You've got to walk before you can run.")

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Comment on Re^3: XLSX read and dump Select or Download Code

Replies are listed 'Best First'.
Re^4: XLSX read and dump by furry_marmot (Pilgrim) on Jan 18, 2012 at 20:05 UTC
FYI, XLSX files ARE zipped. If you look at it in an editor or hex viewer, the first two characters are PK, which indicates it's a ZIP file (from PKZIP, the forerunner of modern ZIP, where PK stands for Phil Katz, who invented it). If you run `unzip` (or `pkunzip`, etc.) on the file, it creates some directories that are full of XML files and RELS files (MS Office 2007+ relationship files). Any module working with an XLSX file would need to include something from the `Archive::ZIP` family. That said, I don't work on a Mac, but I would have assumed `Spreadsheet::XLSX` would have installed `Archive::Zip::Archive` as a dependency. Maybe the OP should try installing it manually. --marmot	[reply] [d/l] [select]
Re^5: XLSX read and dump by runrig (Abbot) on Jan 18, 2012 at 20:37 UTC
I believe the library is installed, but the file either does not exist, or is not a zip archive (i.e. not a valid xlsx file).	[reply]
Re^4: XLSX read and dump by chirp84 (Novice) on Jan 18, 2012 at 18:52 UTC
The test.xlsx file isn't in a zip file. I thought it was going to create a new file called test.xlsx, but didn't. So then I tried renaming one of the files to test.xlsx and placed it in the same directory as the script below, named parse.pl, thinking that it would read it. Either way, I still get the same error about a zip archive. `use strict; use warnings; use Spreadsheet::XLSX; my $excel = Spreadsheet::XLSX -> new ('test.xlsx'); foreach my $sheet (@{$excel -> {Worksheet}}) { printf("Sheet: %s\n", $sheet->{Sheet1}); $sheet -> {B45} \|\|= $sheet -> {B37}; foreach my $row ($sheet -> {B37} .. $sheet -> {B45}) { $sheet -> {R45} \|\|= $sheet -> {C45}; foreach my $col ($sheet -> {C45} .. $sheet -> {R45}) +{ my $cell = $sheet -> {Cells} [$row] [$col]; if ($cell) { printf("( %s , %s ) => %s\n", $row, $col, +$cell -> {Val}); } } } }` [download]	[reply] [d/l]
Re^5: XLSX read and dump by toolic (Bishop) on Jan 18, 2012 at 19:07 UTC
It looks like it just can't find your test.xlsx file. Try using the full path to your file, something like: `my $excel = Spreadsheet::XLSX -> new ('/home/chirp84/test.xlsx');` [download]	[reply] [d/l]
Re^6: XLSX read and dump by chirp84 (Novice) on Jan 18, 2012 at 19:17 UTC
No dice. Returns same error but now includes full path to file.	[reply]
Re^5: XLSX read and dump by runrig (Abbot) on Jan 18, 2012 at 20:36 UTC
Are you trying to parse an actual, valid, existing xlsx file, or are you trying to create one? If you are reading/parsing a spreadsheet, use Spreadsheet::XLSX. If you are trying to create one, use Excel::Writer::XLSX. Your error message indicates that either your xlsx file does not exist, or is not a valid zip archive (and xlsx files are zip archives).	[reply]
Re^6: XLSX read and dump by chirp84 (Novice) on Jan 18, 2012 at 21:02 UTC
I had the wrong file extension. Changing to .xlsx allowed program to run. I am trying to read certain cells in 100+ separate .xlsx documents and then dump those values into a single excel document (or really any format, i.e. .xls, xlsx, .csv) that I can then analyze the values. Is there further documentation other than that readily viewable on the CPAN site? Maybe the issue is I just don't know enough perl to tackle this. But it would be so useful so I keep pressing on. It's like I can see all the modules and their usefulness but I can't tie them together.	[reply]
Re^5: XLSX read and dump by talexb (Chancellor) on Jan 18, 2012 at 19:48 UTC
The test.xlsx file isn't in a zip file. I thought it was going to create a new file called test.xlsx, but didn't. Sorry -- confusion. The 'new' in this context doesn't mean it's going to create a file. It's creating a new context in order read one of your files that already exists. Next, I'm not sure `my $excel = Spreadsheet::XLSX -> new ('test.xlsx');` [download] is correct. I would have expected `my $excel = Spreadsheet::XLSX->new ('test.xlsx');` [download] to be better. Does your script produce any warnings when you run it? You need to get to the point where you're able to open a file in your current directory before we can proceed. Alex / talexb / Toronto "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds	[reply] [d/l] [select]
Re^6: XLSX read and dump by tobyink (Canon) on Jan 18, 2012 at 21:30 UTC
Next, I'm not sure `my $excel = Spreadsheet::XLSX -> new ('test.xlsx');` is correct. It's not a common way of laying out code, but adding arbitrary whitespace between tokens in Perl is rarely disallowed. Whitespace is only really needed between tokens if they'd look like another token if the whitespace was missing. For example, given: `use 5.010; my @foo = split /\\|/, q {foo\|bar\|bar}; foreach my $x (@foo) { say $x; }` [download] ... there are only actually two pieces of required whitespace: between "use" and "5.010", and between "foreach" and "my". The code runs perfectly well if you strip out the rest of the whitespace: `use 5.010;my@foo=split/\\|/,q{foo\|bar\|bar};foreach my$x(@foo){say$x}` [download] ... though most people would consider the former to be more readable.	[reply] [d/l] [select]
Re^6: XLSX read and dump by chirp84 (Novice) on Jan 18, 2012 at 20:11 UTC
Okay, the data file I renamed had the wrong extension. I had .xls, after I changed to .xlsx the script ran. All the data from excel printed into my terminal window in the format below: `( 1 , 1 ) => -230 ( 1 , 2 ) => -201.25 ( 1 , 3 ) => -172.5 ( 1 , 4 ) => -143.75` [download] and so on... Also, I changed the code to reflect exactly what the Spreadsheet::XLSX synopsis shows because my actual row and column names were causing errors. Now the question is: How can I print just the cells I want to a separate file?	[reply] [d/l]
Re^7: XLSX read and dump by talexb (Chancellor) on Jan 18, 2012 at 20:28 UTC


Perl-Sensitive Sunglasses
	PerlMonks