
I think one of the reasons Spreadsheet::XLSX is so slow is that it doesn't use a proper XML parser, but parses the workbook(s) using regular expressions. On top of that, it uses:

    use Archive::Zip;
    use Spreadsheet::XLSX::Fmt2007;
    use Data::Dumper;
    use Spreadsheet::ParseExcel;

to be Spreadsheet::ParseExcel compatible (which it really is not).

In most spreadsheet modules, the whole spreadsheet (file) is read into memory, as there are several formats to be parsed before one can get to the actual data (ZIP, binary, ...). If the spreadsheet could be read directly from the file (like CSV, if you want to call that a spreadsheet), parsing could be a lot faster.
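
For the ZIP layer of an XLSX file at least, core Perl can decompress a single member incrementally instead of holding the whole archive in memory. This is a minimal sketch under assumptions: stream_member is a hypothetical helper (not part of Spreadsheet::XLSX), it relies on the core IO::Uncompress::Unzip module, and a member name such as xl/sharedStrings.xml is only an example of a part inside an XLSX file. It removes the "slurp everything" step only; an XML parser still has to consume the chunks.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw($UnzipError);

# Hypothetical helper: feed one ZIP member to $cb in chunks of $chunk_size
# bytes instead of reading the whole archive into memory.
# Returns the total number of uncompressed bytes delivered.
sub stream_member {
    my ($path, $member, $cb, $chunk_size) = @_;
    $chunk_size ||= 65536;

    # Name => ... selects a single member of the archive to read from
    my $z = IO::Uncompress::Unzip->new($path, Name => $member)
        or die "Cannot open $member in $path: $UnzipError\n";

    my ($buf, $total) = ('', 0);
    while (1) {
        # read() returns bytes read, 0 at end of member, < 0 on error
        my $n = $z->read($buf, $chunk_size);
        die "Read error in $member: $UnzipError\n" if $n < 0;
        last if $n == 0;
        $total += $n;
        $cb->($buf);    # hand this chunk to the (streaming) consumer
    }
    $z->close;
    return $total;
}
```

A parser with an incremental interface (for example the non-blocking object returned by XML::Parser's parse_start, fed with parse_more) could take those chunks as they arrive; a regex-based reader that needs the full string cannot.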

If someone (re)wrote this module using a proper (fast) XML parser, preferably with the option to use whatever (working) XML parser is installed, that would really help this module. I really mean option here: making the module require XML::LibXML would mean its death, because XML::LibXML depends on libxml2, which might prove very hard to port to some non-standard systems. So the module should choose between XML::LibXML, XML::Parser, XML::Parser::Lite, XML::Simple, or XML::Twig (and even those may depend on each other).
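
Choosing "whatever (working) XML parser is installed" can be done at run time by trying each candidate with require inside an eval. The sketch assumes a hypothetical helper, pick_xml_parser, that is not part of Spreadsheet::XLSX; its default order simply follows the list above. It only picks a module: since these parsers have different APIs, a real module would still need a separate code path per parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Spreadsheet::XLSX): return the name of the
# first module in the preference list that actually loads, or undef if none do.
sub pick_xml_parser {
    my @candidates = @_ ? @_
        : qw(XML::LibXML XML::Parser XML::Parser::Lite XML::Simple XML::Twig);
    for my $module (@candidates) {
        # Turn "XML::LibXML" into "XML/LibXML.pm" so require() accepts it
        (my $file = "$module.pm") =~ s{::}{/}g;
        # eval traps the die from require when the module, or the C library
        # it binds to (e.g. libxml2), is not available on this system
        return $module if eval { require $file; 1 };
    }
    return;    # nothing usable installed
}

# Example: report which parser would be used on this machine
my $parser = pick_xml_parser();
print defined $parser ? "Using $parser\n" : "No supported XML parser installed\n";
```

Passing an explicit list, e.g. pick_xml_parser('XML::Parser', 'XML::Twig'), lets the caller override the preference order.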


Enjoy, Have FUN! H.Merijn

In reply to Re: Speeding up Spreadsheet::XLSX file load in UNIX by Tux
in thread Speeding up Spreadsheet::XLSX file load in UNIX by ketanh
