PerlMonks  

Re^2: ParseExcel Again

by Anonymous Monk
on Feb 13, 2013 at 10:36 UTC ( #1018516=note )


in reply to Re: ParseExcel Again
in thread ParseExcel Again

It's normal code to parse .xls files:

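    use Spreadsheet::ParseExcel;

    # $currentFile and $text are assumed to be set up elsewhere in the script.
    $excelRowCounter = 0;

    my $parser = Spreadsheet::ParseExcel->new(
        CellHandler => \&excelCellhandler,
        NotSetCell  => 1,
        Parameters  => "XXXXXXXXX",
    );
    $parser->parse($currentFile);

    # Called by the parser once per cell; appends each cell's raw value to $text.
    sub excelCellhandler {
        # my $workbook    = $_[0];
        # my $sheet_index = $_[1];
        my $row = $_[2];
        # my $col = $_[3];
        my $cell = $_[4];
        my $q;
        if ($excelRowCounter == $row) {
            # Same row as the previous cell: keep appending.
            $q = $cell->unformatted();
            $text .= $q;
        }
        else {
            # New row: stop once more than 2 MiB of text has been collected.
            if (length($text) > 2097152) {
                # do something
                return;
            }
            $excelRowCounter = $row;
            $q = $cell->unformatted();
            $text .= $q;
        }
    }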

Re^3: ParseExcel Again
by tmharish (Friar) on Feb 13, 2013 at 11:20 UTC

    If you are not happy with a CPAN module, you will essentially have to rewrite it (or at least parts of it).

    You can get the code from CPAN or GitHub, find the chunks that you are using ( which might be a pain considering the number of sub-modules it has ) and then try to optimize it.

    Also that module does not look like it has been maintained in over two years, but you could try to get in touch with the maintainers.

    If you do manage to speed it up / improve it, you might want to submit a patch.

    Or you could just try and get the same data in CSV - is that possible?

      CSV is not an option. The module has so many dependencies that speeding it up or identifying functions which I don't require is very difficult.

        "speeding it up or identifying functions which I don't require is very difficult"

        Consider profiling your code with Devel::NYTProf; its documentation includes screencasts explaining how to use the module to locate problems and areas for optimization. See also Debugging and Optimization. Alternatively, invest in a system with more RAM, faster disks (SSD) and a better CPU.
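        Before reaching for a full profiler, a coarse manual measurement can show whether the time goes into your own cell handler or into the parser itself. The following is only a sketch using the core Time::HiRes module, not Devel::NYTProf; the `timed` helper, the `%stats` hash and the simulated handler are hypothetical names made up for illustration.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: wraps a callback and accumulates wall-clock time
# and call count for it under $stats->{$name}.
sub timed {
    my ($name, $code, $stats) = @_;
    return sub {
        my $start  = time();
        my @result = $code->(@_);
        $stats->{$name}{seconds} += time() - $start;
        $stats->{$name}{calls}++;
        return wantarray ? @result : $result[0];
    };
}

my %stats;
my $text = '';

# Stand-in for a cell handler. In a real run the wrapped sub would be
# passed as the CellHandler option instead of being called by hand.
my $handler = timed('cell_handler', sub {
    my ($workbook, $sheet_index, $row, $col, $value) = @_;
    $text .= $value;
    return;
}, \%stats);

# Simulate 1000 cell callbacks (100 rows x 10 columns).
for my $row (0 .. 99) {
    for my $col (0 .. 9) {
        $handler->(undef, 0, $row, $col, "r${row}c${col}");
    }
}

for my $name (sort keys %stats) {
    printf "%s: %d calls, %.6f s total\n",
        $name, $stats{$name}{calls}, $stats{$name}{seconds};
}
```

        If the handler's total is a small fraction of the overall run time, the cost is inside the module, and that is where a line-level profile (for example `perl -d:NYTProf script.pl` followed by `nytprofhtml`) is worth the effort.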

        I suspect that what tmharish was suggesting was that you save your problematic spreadsheet as CSV or TSV (doing so uses a built-in Excel function, so there are no new dependencies) and then use a CSV module, such as Parse-CSV 2.00, to extract the data you're looking for.
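        As a rough illustration of the CSV route, the sketch below concatenates field values row by row and stops once the 2 MiB cap from the handler earlier in the thread is exceeded. To keep it free of CPAN dependencies it uses the core Text::ParseWords module as a stand-in for a real CSV module; that is an assumption for the example only, and it does not cope with embedded newlines, doubled quotes inside fields, or apostrophes in unquoted fields. The `collect_text` helper and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Cap taken from the cell handler earlier in the thread (2 MiB).
my $MAX_TEXT = 2_097_152;

# Hypothetical helper: read CSV rows from a filehandle and append every
# field value to one string until the string grows past $max.
sub collect_text {
    my ($fh, $max) = @_;
    my $text = '';
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip the line ending
        next if $line eq '';
        # parse_line(delimiter, keep_quotes, line) splits on commas while
        # honouring double-quoted fields; empty fields come back as undef.
        my @fields = parse_line(',', 0, $line);
        $text .= join '', map { defined $_ ? $_ : '' } @fields;
        last if length($text) > $max;
    }
    return $text;
}

# In-memory sample standing in for the file exported from Excel.
my $sample = qq{id,name,note\n1,"Smith, J",ok\n2,Jones,"two words"\n};
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $text = collect_text($fh, $MAX_TEXT);
close $fh;

print "$text\n";
```

        For real data, the same loop shape applies with a dedicated CSV module doing the per-row parsing in place of parse_line.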


        If you didn't program your executable by toggling in binary, it wasn't really programming!
