
Re^3: XLSX read and dump

by talexb (Canon)
on Jan 18, 2012 at 17:57 UTC (#948600)

in reply to Re^2: XLSX read and dump
in thread XLSX read and dump

First of all, a meta-comment: you should wrap your errors in code tags to a) make that part of it easier to read and b) make it stand out from the rest of your post. Ideally, the errors would look like

    IO error: opening test.xlsx for read : No such file or directory
     at /System/Library/Perl/Extras/5.12/Archive/Zip/ line 546
        Archive::Zip::Archive::read('Archive::Zip::Archive=HASH(0x7f87628288d8)', 'test.xlsx')
            called at /Library/Perl/5.12/Spreadsheet/ line 33
        Spreadsheet::XLSX::new('Spreadsheet::XLSX', 'test.xlsx') called at line 6
    Cannot open test.xlsx as Zip archive at /Library/Perl/5.12/Spreadsheet/ line 33
I've added a few line breaks to make it easier to read.

Second, it looks like (and I'm guessing, because you haven't posted any code yet) you are accessing the spreadsheet while it's inside a zip file. To make it easier, can you just extract one of the spreadsheets and operate on that?

Let's try to solve one problem at a time (aka, "You've got to walk before you can run.")

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Replies are listed 'Best First'.
Re^4: XLSX read and dump
by furry_marmot (Pilgrim) on Jan 18, 2012 at 20:05 UTC

    FYI, XLSX files ARE zipped. If you look at it in an editor or hex viewer, the first two characters are PK, which indicates it's a ZIP file (from PKZIP, the forerunner of modern ZIP, where PK stands for Phil Katz, who invented it).
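The PK check described above can be done from Perl without any extra modules. This is only a minimal sketch: the helper name `looks_like_zip` is made up for illustration, and it tests nothing beyond the first two bytes, so a file that passes may still be a damaged or non-XLSX archive.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the file can be opened and its first two
# bytes are "PK", the signature mentioned above.
sub looks_like_zip {
    my ($path) = @_;
    open my $fh, '<:raw', $path or return 0;    # missing or unreadable file
    my $n = read $fh, my $magic, 2;
    close $fh;
    return ( defined($n) && $n == 2 && $magic eq 'PK' ) ? 1 : 0;
}

# Report on any filenames given on the command line.
printf "%s: %s\n", $_, ( looks_like_zip($_) ? 'starts with PK' : 'not a ZIP (or not readable)' )
    for @ARGV;
```

Running it against a renamed old-style .xls file should report that the file is not a ZIP, since that format does not start with PK.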

    If you run unzip (or pkunzip, etc.) on the file, it creates some directories that are full of XML files and RELS files (MS Office 2007+ relationship files). Any module working with an XLSX file would need to include something from the Archive::Zip family.

    That said, I don't work on a Mac, but I would have assumed Spreadsheet::XLSX would have installed Archive::Zip::Archive as a dependency. Maybe the OP should try installing it manually.

      I believe the library is installed, but the file either does not exist, or is not a zip archive (i.e. not a valid xlsx file).
Re^4: XLSX read and dump
by chirp84 (Novice) on Jan 18, 2012 at 18:52 UTC
    The test.xlsx file isn't in a zip file. I thought it was going to create a new file called test.xlsx, but it didn't. So then I tried renaming one of the files to test.xlsx and placed it in the same directory as the script below, thinking that it would read it. Either way, I still get the same error about a zip archive.
    use strict;
    use warnings;
    use Spreadsheet::XLSX;

    my $excel = Spreadsheet::XLSX -> new ('test.xlsx');

    foreach my $sheet (@{$excel -> {Worksheet}}) {
        printf("Sheet: %s\n", $sheet->{Sheet1});
        $sheet -> {B45} ||= $sheet -> {B37};
        foreach my $row ($sheet -> {B37} .. $sheet -> {B45}) {
            $sheet -> {R45} ||= $sheet -> {C45};
            foreach my $col ($sheet -> {C45} .. $sheet -> {R45}) {
                my $cell = $sheet -> {Cells} [$row] [$col];
                if ($cell) {
                    printf("( %s , %s ) => %s\n", $row, $col, $cell -> {Val});
                }
            }
        }
    }
      It looks like it just can't find your test.xlsx file. Try using the full path to your file, something like:
      my $excel = Spreadsheet::XLSX -> new ('/home/chirp84/test.xlsx');
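If it is unclear which directory a bare 'test.xlsx' is being looked up in, a short core-modules-only script can show it. This is a sketch, not part of Spreadsheet::XLSX: the helper name `describe_path` is invented here, and the two base directories tried (the current working directory and the script's own directory) are just the usual suspects.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;
use FindBin qw($Bin);

# Hypothetical helper: show what a filename resolves to against a base
# directory, and whether anything exists at that location.
sub describe_path {
    my ( $name, $base ) = @_;
    my $abs = File::Spec->rel2abs( $name, $base );
    return sprintf '%s -> %s (%s)', $name, $abs, ( -e $abs ? 'exists' : 'missing' );
}

print describe_path( 'test.xlsx', getcwd() ), "\n";    # where a relative path is looked up
print describe_path( 'test.xlsx', $Bin ),     "\n";    # next to the script itself
```

If the two lines differ, the script is being run from a directory other than the one it lives in, and a relative filename will be looked up in the former.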
        No dice. Returns same error but now includes full path to file.
      Are you trying to parse an actual, valid, existing xlsx file, or are you trying to create one? If you are reading/parsing a spreadsheet, use Spreadsheet::XLSX. If you are trying to create one, use Excel::Writer::XLSX. Your error message indicates that either your xlsx file does not exist, or is not a valid zip archive (and xlsx files are zip archives).
        I had the wrong file extension. Changing to .xlsx allowed program to run. I am trying to read certain cells in 100+ separate .xlsx documents and then dump those values into a single excel document (or really any format, i.e. .xls, xlsx, .csv) that I can then analyze the values. Is there further documentation other than that readily viewable on the CPAN site? Maybe the issue is I just don't know enough perl to tackle this. But it would be so useful so I keep pressing on. It's like I can see all the modules and their usefulness but I can't tie them together.
        The test.xlsx file isn't in a zip file. I thought it was going to create a new file called test.xlsx, but didn't.

      Sorry -- confusion. The 'new' in this context doesn't mean it's going to *create* a file. It's creating a new context in order to read one of your files that already exists.

      Next, I'm not sure

      my $excel = Spreadsheet::XLSX -> new ('test.xlsx');
      is correct. I would have expected
      my $excel = Spreadsheet::XLSX->new ('test.xlsx');
      to be better. Does your script produce any warnings when you run it?

      You need to get to the point where you're able to open a file in your current directory before we can proceed.

      Alex / talexb / Toronto

      "Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

        Next, I'm not sure

        my $excel = Spreadsheet::XLSX -> new ('test.xlsx');

        is correct.

        It's not a common way of laying out code, but adding arbitrary whitespace between tokens in Perl is rarely disallowed. Whitespace is only really needed between tokens if they'd look like another token if the whitespace was missing.

        For example, given:

        use 5.010;
        my @foo = split /\|/, q {foo|bar|bar};
        foreach my $x (@foo) {
            say $x;
        }

        ... there are only actually two pieces of required whitespace: between "use" and "5.010", and between "foreach" and "my". The code runs perfectly well if you strip out the rest of the whitespace:

        use 5.010;my@foo=split/\|/,q{foo|bar|bar};foreach my$x(@foo){say$x}

        ... though most people would consider the former to be more readable.
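The same point can be checked directly for the method-call arrow. The sketch below uses a made-up class, `Demo::Thing`, so that it runs without Spreadsheet::XLSX installed; it only illustrates that whitespace around `->` does not change what gets called.

```perl
use strict;
use warnings;

package Demo::Thing;    # hypothetical class, for illustration only

sub new {
    my ( $class, $name ) = @_;
    return bless { name => $name }, $class;
}

package main;

# Same constructor call, written with and without spaces around the arrow.
my $spaced = Demo::Thing -> new ('test.xlsx');
my $tight  = Demo::Thing->new('test.xlsx');

print ref($spaced), ' ', ref($tight), "\n";
```

Both variables end up holding objects of the same class, built from the same argument.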

        Okay, the data file I renamed had the wrong extension. I had .xls, after I changed to .xlsx the script ran. All the data from excel printed into my terminal window in the format below:
        ( 1 , 1 ) => -230
        ( 1 , 2 ) => -201.25
        ( 1 , 3 ) => -172.5
        ( 1 , 4 ) => -143.75
        and so on... Also, I changed the code to reflect exactly what the Spreadsheet::XLSX synopsis shows because my actual row and column names were causing errors. Now the question is: How can I print just the cells I want to a separate file?
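One possible approach, offered only as a sketch: spreadsheet names such as B37 are not hash keys on the sheet; the cells sit in `$sheet->{Cells}[$row][$col]`, and the output above suggests plain numeric indexes. The code below assumes those indexes are zero-based (so B37 would be row 36, column 1), which is worth checking against the printed output. The helper names `a1_to_rowcol` and `cells_to_csv_line` and the output filename `selected.csv` are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an A1-style reference such as "B37" into a
# zero-based (row, col) pair. Zero-based indexing is an assumption here.
sub a1_to_rowcol {
    my ($ref) = @_;
    my ( $letters, $digits ) = $ref =~ /^([A-Za-z]+)([1-9][0-9]*)$/
        or die "Not an A1-style reference: $ref\n";
    my $col = 0;
    $col = $col * 26 + ( ord( uc $_ ) - ord('A') + 1 ) for split //, $letters;
    return ( $digits - 1, $col - 1 );
}

# Hypothetical helper: collect the named cells from one worksheet hash and
# return them as a single CSV line (values quoted, embedded quotes doubled).
sub cells_to_csv_line {
    my ( $sheet, @refs ) = @_;
    my @fields;
    for my $ref (@refs) {
        my ( $row, $col ) = a1_to_rowcol($ref);
        my $cell = $sheet->{Cells}[$row][$col];
        my $val  = ( defined($cell) && defined( $cell->{Val} ) ) ? $cell->{Val} : '';
        $val =~ s/"/""/g;
        push @fields, qq{"$val"};
    }
    return join( ',', @fields ) . "\n";
}

# Usage sketch (assumes Spreadsheet::XLSX is installed and test.xlsx exists):
#   use Spreadsheet::XLSX;
#   my $excel = Spreadsheet::XLSX->new('test.xlsx');
#   open my $out, '>', 'selected.csv' or die "selected.csv: $!";
#   print {$out} cells_to_csv_line( $_, qw(B37 B45 C45) ) for @{ $excel->{Worksheet} };
#   close $out;
```

Looping the usage sketch over all 100+ input files and appending to the same output file would give one CSV row per worksheet, which most spreadsheet programs can open directly.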
