http://www.perlmonks.org?node_id=422147

Andre_br has asked for the wisdom of the Perl Monks concerning the following question:

Hello my friends,

I read the post about the reducing memory usage by avoiding the formating data, wich may be causing the crash iīm running into with tables above 5 thousand records. But Iīm having trouble in fitting the code into mine. Can anyone give me a hint on this? Do I have to replace the names of the variables of the sub cell_handler? (wich is the recipe of the post I mentioned).

#!/usr/bin/perl -w use CGI qw( :standard); use Spreadsheet::ParseExcel; # The new objects my $q = new CGI; my $oExcel = Spreadsheet::ParseExcel->new( CellHandler => \&cell_handl +er, NotSetCell => 1 ); # McNamara's alternative cell handler sub cell_handler { my $workbook = $_[0]; my $sheet_index = $_[1]; my $row = $_[2]; my $col = $_[3]; my $cell = $_[4]; print $cell->{Val} , "<br>"; # ?? I keep this line? Before that I d +id the processing below... Do I have to move everything up here??? An +d, all these variables have other names below; do I have to replace t +he names or these are just internal names for this odd construction t +o work? (this seems the same construction of the upload_hook I never +understood...) } # The header print $q->header( -type => "text/html"); # Starting the cronometer $start = (times) [0]; # The reading # my $file = "z4kjasdf.xls"; my $file = "acervo15k.xls"; # my $file = "teste1.xls"; my $oBook = $oExcel->Parse($file); my($iR, $iC, $oWkS, $oWkC); my @thisrow; # Just the first worksheet... $oWkS = $oBook->{Worksheet}[0]; for ( $iR = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS- +>{MaxRow}; $iR++) { @thisrow = (); for ( $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWk +S->{MaxCol}; $iC++) { $oWkC = $oWkS->{Cells}[$iR][$iC]; if ($oWkC) { push (@thisrow, $oWkC->{Val}); } } my ($col1, $col2, $col3, $col4, @othercols) = @thisrow; print "$col1 <br>"; } # Ending the cronometer $end = (times) [0]; print "<b>$iR rows read in " , $end-$start , " seconds";
Thanks a lot!

André

Replies are listed 'Best First'.
Re: Spreadsheet::ParseExcel vs alternative Cell Handler
by jmcnamara (Monsignor) on Jan 14, 2005 at 09:19 UTC

    When you supply your own cell handler you don't need to iterate through the workbook sheet by sheet and cell by cell.

    The iteration will occur automatically when you call the Parse() method. As such you don't need any of the code below:

    my $oBook = $oExcel->Parse($file);

    I can post a more complete example if you define a little more clearly what you are trying to do. It looks like you are trying to extract the cells in the first column of the first worksheet and print them as in a Html format but perhaps you can go into a little more detail.

    --
    John.

      Hey John,

      Thanks for the help. What I am trying to do is just use Perl to convert a .xls table to a tab-delimited .txt file. This last one will suffer many inspections and processings - because of that I need the conversion, in order to avoid the weight of the .xls file all along the data analysis. At the end, having appended a last column "1" or "0" to each line of the tsv .txt, according to if itīs data is approved or not, the "1" lines are imported to the mysql database. Then, I use your Spreadsheet writter to give back the user the processed file with the error messages (when you implement the writte_comment theyīll go that way).

      Well, thatīs the whole story. For the conversion, using only your cell handler, I would have to identify wich line and wich column is each cell from, in order to record it with a "\t" after or a "\n" if it's the last column of that row. It seems the code I was using before would manage to do that, if I put it inside your cell-handler, wouldnīt it? Or there is another way of doing it? Or is there another simpler way?

      Thanks a lot, John.

      André

        PS: Itīs important that I split the conversion and the processing. I know I could do the inspection right away, but to split the process and give some information to the user like "file uploaded and converted" is important in this case, before I do the inspection (wish will happen at his command).
Re: Spreadsheet::ParseExcel vs alternative Cell Handler
by jmcnamara (Monsignor) on Jan 20, 2005 at 16:54 UTC

    What I am trying to do is just use Perl to convert a .xls table to a tab-delimited .txt file.

    Writing a converter that uses the cell_handler() callback is trickier than writing a converter based on the standard interface because you have to keep track of your position in the workbook yourself.

    Here is a sample xls2tab converter.

    #!/usr/bin/perl -w use strict; use Spreadsheet::ParseExcel; use Data::Dumper; my $parse_excel = new Spreadsheet::ParseExcel(CellHandler => \&cell_ha +ndler, NotSetCell => 1); my $prev_index = -1; my $prev_row = 0; my $prev_col = 0; my $workbook = $parse_excel->Parse('book1.xls'); # Define our own cell handler to reduce S::PE's memory use. # This function will be called each time a cell is encountered. # However, it will ignore blank cells so we have to keep track # of our location in the worksheet so that we can pad out blank # cells and rows. # sub cell_handler { my $workbook = $_[0]; my $sheet_index = $_[1]; my $row = $_[2]; my $col = $_[3]; my $cell = $_[4]; # Only process the first worksheet if ($sheet_index > 0) { $workbook->ParseAbort(1); return; } # Reset the col counter between rows $prev_col = 0 if $row != $prev_row; # Add tabs between fields and newlines between rows. Also pad # any missing rows or columns. # print "\n" for $prev_row +1 .. $row; print "\t" for $prev_col +1 .. $col; # Print the formatted value of the cell print $cell->{_Value}; # Keep track of where we are $prev_row = $row; $prev_col = $col; }

    --
    John.

      Hey John,

      Thanks a lot for the help. Itīs working just perfect now. The only complain is not mine. Itīs Perlīs, that keeps saying all the time:

      "Variable "$prev_col" will not stay shared at methods.cgi line 524. Variable "$prev_row" will not stay shared at methods.cgi line 524".

      This .cgi is an auxiliar script where Iīve put our code. I call it, as usual, with "require methods.cgi" and call the sub parse_xls() with two arguments - the names of the $infile and $outfile. The conversion is running smoothly, (even with the 15k lines file it was crashing before your cell handler! Great!) but Perl is now complaining with this message every time I run the code. I thought putting a "return 1" would calm down Perl, I say, tell her I am not worried about these variables staying shared, but it didnīt change anything.

      Any ideas?

      Thanks!

      André

        This is one of Perl's stranger messages, but it is a common one. I assume one of the following is true
        • You are requireing this code from within the body of a subroutine in your main code
        • You are running it with mod_perl under Apache::Registry
        The result is that you are declaring a nested subroutine within another one, and the inner sub accesses a lexical variable declared in the outer one. This is a bad thing, because Perl will not be able to figure out which "instance" of that lexical variable the inner sub should be working with when you call it. Solutions to this problem are explained here.

        Checking out perldoc perldiag can be really handy for these. As you don't have 524 lines of code here, I can't really be sure what it's referring to, but I find the diagnostic info to be incredibly useful. So useful that our coding standards say "use diagnostics" must be in our perl code, up with "use strict" and "use warnings".

      Hi John, I am new in Unix and I am trying to convert xls to tab delimited file. I have installed SpreadSheet ParseExcel but I don`t have any utility to work accordingly. Please suggest me at pushyamitra520@gmail.com Thanks very much.