Spreadsheet::ParseExcel vs alternative Cell Handler

Andre_br has asked for the wisdom of the Perl Monks concerning the following question:

Hello my friends,

I read the post about the reducing memory usage by avoiding the formating data, wich may be causing the crash i´m running into with tables above 5 thousand records. But I´m having trouble in fitting the code into mine. Can anyone give me a hint on this? Do I have to replace the names of the variables of the sub cell_handler? (wich is the recipe of the post I mentioned).

#!/usr/bin/perl -w

use CGI qw( :standard);

use Spreadsheet::ParseExcel;


# The new objects

my $q = new CGI;

my $oExcel = Spreadsheet::ParseExcel->new( CellHandler => \&cell_handl
+er,
                                           NotSetCell => 1     );



# McNamara's alternative cell handler

sub cell_handler {

   my $workbook    = $_[0];
   my $sheet_index = $_[1];
   my $row         = $_[2];
   my $col         = $_[3];
   my $cell        = $_[4];
   
   print $cell->{Val} , "<br>"; # ?? I keep this line? Before that I d
+id the processing below... Do I have to move everything up here??? An
+d, all these variables have other names below; do I have to replace t
+he names or these are just internal names for this odd construction t
+o work? (this seems the same construction of the upload_hook I never 
+understood...)

}




# The header

print $q->header( -type => "text/html");



     # Starting the cronometer

     $start = (times) [0];


# The reading

# my $file = "z4kjasdf.xls";
 my $file = "acervo15k.xls";
# my $file = "teste1.xls";


my $oBook = $oExcel->Parse($file);

my($iR, $iC, $oWkS, $oWkC);

my @thisrow;


# Just the first worksheet...

$oWkS = $oBook->{Worksheet}[0];


for ( $iR = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS-
+>{MaxRow}; $iR++) {

   @thisrow = ();   

   for ( $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWk
+S->{MaxCol}; $iC++) {

      $oWkC = $oWkS->{Cells}[$iR][$iC];
      
       if ($oWkC) { 

          push (@thisrow, $oWkC->{Val}); 
          
       }


   }

   my ($col1, $col2, $col3, $col4, @othercols) = @thisrow;
   
   print "$col1 <br>";

}  



     # Ending the cronometer

     $end = (times) [0];



print "<b>$iR rows read in " , $end-$start , " seconds";
[download]

Thanks a lot!

André

Comment on Spreadsheet::ParseExcel vs alternative Cell Handler Download Code

Replies are listed 'Best First'.
Re: Spreadsheet::ParseExcel vs alternative Cell Handler by jmcnamara (Monsignor) on Jan 14, 2005 at 09:19 UTC
When you supply your own cell handler you don't need to iterate through the workbook sheet by sheet and cell by cell. The iteration will occur automatically when you call the `Parse()` method. As such you don't need any of the code below: `my $oBook = $oExcel->Parse($file);` [download] I can post a more complete example if you define a little more clearly what you are trying to do. It looks like you are trying to extract the cells in the first column of the first worksheet and print them as in a Html format but perhaps you can go into a little more detail. -- John.	[reply] [d/l]
Re^2: Spreadsheet::ParseExcel vs alternative Cell Handler by Andre_br (Pilgrim) on Jan 15, 2005 at 16:51 UTC
Hey John, Thanks for the help. What I am trying to do is just use Perl to convert a .xls table to a tab-delimited .txt file. This last one will suffer many inspections and processings - because of that I need the conversion, in order to avoid the weight of the .xls file all along the data analysis. At the end, having appended a last column "1" or "0" to each line of the tsv .txt, according to if it´s data is approved or not, the "1" lines are imported to the mysql database. Then, I use your Spreadsheet writter to give back the user the processed file with the error messages (when you implement the writte_comment they´ll go that way). Well, that´s the whole story. For the conversion, using only your cell handler, I would have to identify wich line and wich column is each cell from, in order to record it with a "\t" after or a "\n" if it's the last column of that row. It seems the code I was using before would manage to do that, if I put it inside your cell-handler, wouldn´t it? Or there is another way of doing it? Or is there another simpler way? Thanks a lot, John. André	[reply]
Re^3: Spreadsheet::ParseExcel vs alternative Cell Handler by Andre_br (Pilgrim) on Jan 17, 2005 at 15:37 UTC
PS: It´s important that I split the conversion and the processing. I know I could do the inspection right away, but to split the process and give some information to the user like "file uploaded and converted" is important in this case, before I do the inspection (wish will happen at his command).	[reply]
Re: Spreadsheet::ParseExcel vs alternative Cell Handler by jmcnamara (Monsignor) on Jan 20, 2005 at 16:54 UTC
What I am trying to do is just use Perl to convert a .xls table to a tab-delimited .txt file. Writing a converter that uses the `cell_handler()` callback is trickier than writing a converter based on the standard interface because you have to keep track of your position in the workbook yourself. Here is a sample `xls2tab` converter. #!/usr/bin/perl -w use strict; use Spreadsheet::ParseExcel; use Data::Dumper; my $parse_excel = new Spreadsheet::ParseExcel(CellHandler => \&cell_ha +ndler, NotSetCell => 1); my $prev_index = -1; my $prev_row = 0; my $prev_col = 0; my $workbook = $parse_excel->Parse('book1.xls'); # Define our own cell handler to reduce S::PE's memory use. # This function will be called each time a cell is encountered. # However, it will ignore blank cells so we have to keep track # of our location in the worksheet so that we can pad out blank # cells and rows. # sub cell_handler { my $workbook = $_[0]; my $sheet_index = $_[1]; my $row = $_[2]; my $col = $_[3]; my $cell = $_[4]; # Only process the first worksheet if ($sheet_index > 0) { $workbook->ParseAbort(1); return; } # Reset the col counter between rows $prev_col = 0 if $row != $prev_row; # Add tabs between fields and newlines between rows. Also pad # any missing rows or columns. # print "\n" for $prev_row +1 .. $row; print "\t" for $prev_col +1 .. $col; # Print the formatted value of the cell print $cell->{_Value}; # Keep track of where we are $prev_row = $row; $prev_col = $col; } [download] -- John.	[reply] [d/l]
Perfect now! by Andre_br (Pilgrim) on Jan 22, 2005 at 19:35 UTC
Hey John, Thanks a lot for the help. It´s working just perfect now. The only complain is not mine. It´s Perl´s, that keeps saying all the time: "Variable "$prev_col" will not stay shared at methods.cgi line 524. Variable "$prev_row" will not stay shared at methods.cgi line 524". This .cgi is an auxiliar script where I´ve put our code. I call it, as usual, with "require methods.cgi" and call the sub parse_xls() with two arguments - the names of the $infile and $outfile. The conversion is running smoothly, (even with the 15k lines file it was crashing before your cell handler! Great!) but Perl is now complaining with this message every time I run the code. I thought putting a "return 1" would calm down Perl, I say, tell her I am not worried about these variables staying shared, but it didn´t change anything. Any ideas? Thanks! André	[reply]
Re: Perfect now! by Errto (Vicar) on Jan 22, 2005 at 21:33 UTC
This is one of Perl's stranger messages, but it is a common one. I assume one of the following is true You are `require`ing this code from within the body of a subroutine in your main code You are running it with mod_perl under Apache::Registry The result is that you are declaring a nested subroutine within another one, and the inner sub accesses a lexical variable declared in the outer one. This is a bad thing, because Perl will not be able to figure out which "instance" of that lexical variable the inner sub should be working with when you call it. Solutions to this problem are explained here.	[reply] [d/l]
Re^2: Perfect now! by Anonymous Monk on Jan 27, 2005 at 18:55 UTC
Re: Perfect now! by Tanktalus (Canon) on Jan 22, 2005 at 20:31 UTC
Checking out perldoc perldiag can be really handy for these. As you don't have 524 lines of code here, I can't really be sure what it's referring to, but I find the diagnostic info to be incredibly useful. So useful that our coding standards say "use diagnostics" must be in our perl code, up with "use strict" and "use warnings".	[reply]
Re^2: Spreadsheet::ParseExcel vs alternative Cell Handler by Anonymous Monk on Apr 09, 2007 at 19:54 UTC
Hi John, I am new in Unix and I am trying to convert xls to tab delimited file. I have installed SpreadSheet ParseExcel but I don`t have any utility to work accordingly. Please suggest me at pushyamitra520@gmail.com Thanks very much.	[reply]

Back to Seekers of Perl Wisdom