Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

OpenOffice::OODoc. Slow access to a cell content with getCellValue

by pvaldes (Chaplain)
on Jun 01, 2016 at 14:27 UTC ( [id://1164682]=perlquestion: print w/replies, xml ) Need Help??

pvaldes has asked for the wisdom of the Perl Monks concerning the following question:

I'm playing with the module OpenOffice and trying to extract some info from a spreadsheet to a text file. This is my program so far:

#!/usr/bin/perl use strict; use warnings; use OpenOffice::OODoc; ## module version 2.125.3, from Debian Perl Group, installed with: ## apt install libopenoffice-oodoc-perl open (my $out, '>', "outfile.txt") or die $!; my $file = ooFile("myspreadsheet.ods"); my $content = odfText(file => $file, part => 'content') or die $!; # Note: "odfText can be used in place of ooDocument() # if the calling application is only text-focused # We aren't interested in formats or styles. Perldoc" ## Write to a file the content of the non empty cells ## in the interval B1:G2001 from "sheet2" to "sheet9" ## in the .ods libreoffice spreadsheet. foreach my $sheet (1...8){ foreach my $row (0...2000){ foreach my $col (1...6){ my $value = $content->getCellValue($sheet,$row,$col); next if $value eq ""; # next if cell is empty next if $value eq "Don't keep this"; # or if match some pattern print $out $value,"\n"} # print cell content to outfile print $out "\n";} # and a newline at the end of each row print "okay, next sheet\n-------\n"; }; close $out;

When I run the program it take hours to finish. Much more slow than doing it directly with the mouse. I have also a warning message about a undefined "" value (maybe the problem is that $value is not of type string and need to be converted each time first?).

I would appreciate if you can take a look to this draft and suggest how to make it faster, or maybe point to any bug or possible memory leak. Thank you very much.

  • Comment on OpenOffice::OODoc. Slow access to a cell content with getCellValue
  • Download Code

Replies are listed 'Best First'.
Re: OpenOffice::OODoc. Slow access to a cell content with getCellValue
by poj (Abbot) on Jun 01, 2016 at 16:43 UTC

    You could try using the getTableText method to fill an array.

    #!/usr/bin/perl use strict; use OpenOffice::OODoc; use Data::Dump 'pp'; my $doc = odfDocument(file => "myspreadsheet.ods"); open (my $out, '>', "outfile.txt") or die $!; foreach my $sheetno (0...$doc->getTableList-1){ my $table = $doc->getTable($sheetno,'normalize'); my ($rows,$cols) = $doc->getTableSize($table); my @text = $doc->getTableText($table); #pp @text; print "Sheet=$sheetno rows=$rows cols=$cols\n"; foreach my $rowno (0...$rows-1){ foreach my $colno (0...$cols-1){ my $value = $text[$rowno][$colno]; next if $value eq ''; print $out $value,"\n" } print $out "\n"; } };
    poj
      Much faster Poj, only 11 minutes running (against "infinite minutes"), but the outfile.txt was totally empty after this time. Something is wrong :-(

        I used this to create a simple test file

        #!/usr/bin/perl use strict; use OpenOffice::OODoc; my $doc = odfDocument( file => 'testspreadsheet.ods', create => 'spreadsheet'); $doc->expandTable(0,5,10); for my $r (0..4){ for my $c (0..9){ $doc->cellValue(0,$r,$c,"text at ".chr(65+$c).($r+1)); } } $doc->save;
        poj

        Update: It seems that there was problem with the memory when I run your script, Poj. This problem could or could not be related with the perl job.

        I'm running again the script with the -d option. When It hits the line

         my $table = $doc->getTable($sheetno,'normalize');

        The perl script uses the 99-100% of CPU and the 10% of memory. After some minutes the memory consume raises to 81-83% until finally the computer frozens and the process is automatically killed without finishing the job. This takes 6-11 min and explains why outfile.txt is empty. It seems that my spreadsheet file is a hard nut to open. I wonder why.

Re: OpenOffice::OODoc. Slow access to a cell content with getCellValue
by anonymized user 468275 (Curate) on Jun 01, 2016 at 19:17 UTC
    Yes, the bug is "I'm playing with the module OpenOffice and trying to extract some info from a spreadsheet to a text file". The correction is s/module//; and the bug is gone ;) Seriously though, Excel and Open office provide text format input and output precisely so that people do not have to do this! A better idea is to consider the text file you can obtain easily enough without programming in itself a temporary inconvenience for which support will be irrelevant later down the path to developing a proper system that does what your business actually needs.

    One world, one people

      Thanks for the opinion, but sometimes you simply need a more sophisticated approach. This is only a first part of a much more complex, non standard, cell selection problem.

      My past experience with openoffice macros ended in a blast of fire and an orgy of mutilated xmls beyond repair, therefore, this time I want Perl to save the day.

        There is likely another way to achieve the overall goal is what I am saying.

        One world, one people

Re: OpenOffice::OODoc. Slow access to a cell content with getCellValue
by pvaldes (Chaplain) on Jun 06, 2016 at 09:57 UTC

    I keep doing my little research, and it seems that problems with memory in this module aren't new. This is a similar scenery reported as a bug in 2015

    "Reading several odf files in a loop reusing the same variable exhausts all memory. Example:"

    # Fails #!/usr/bin/perl use OpenOffice::OODoc; $myfile = "test.odt"; while(1){my $var = odfDocument(file => $myfile)}

    Why this happens? Do you think that adding an undef $value after the "foreach my $col" loop in my script would make a difference?. I didn't notice any improvement

    This is interesting also, any experience with this other module?:

    "For applications that make intensive spreadsheet processing, a switch to ODF::lpOD (more efficient and user-friendly than OpenOffice::OODoc) should be considered"

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1164682]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2024-04-24 21:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found