in reply to Can't Find my way in Excel
Here is a fresh point of view:
0. Before a query, find all your xls files
1. Keep a stored hash of MD5 checksums and filenames. This tells you whether an xls file has changed and its data needs to be re-extracted.
2. For each Excel file that has no stored MD5 yet (or whose checksum differs), convert it to xlsx and extract ./xl/sharedStrings.xml (which contains all the texts from all tabs).
3. Grep inside all the sharedStrings.xml files for faster response times (adjust your grep parameters, e.g. grep -l).
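Steps 0 to 3 could be sketched roughly like this. It is only an illustration, not a finished tool: the function names and the JSON cache file are made up for the example, and it assumes the xls files have already been converted to xlsx (an xlsx is a zip archive, so the standard zipfile module can read it).

```python
import hashlib
import json
import re
import zipfile
from pathlib import Path


def find_xls(root):
    """Step 0: list every .xls file below root."""
    return sorted(Path(root).rglob("*.xls"))


def md5_of(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_cache(cache_file):
    """Load the filename -> MD5 mapping (hypothetical JSON cache file)."""
    cache_file = Path(cache_file)
    return json.loads(cache_file.read_text()) if cache_file.exists() else {}


def save_cache(cache, cache_file):
    """Write the filename -> MD5 mapping back to disk."""
    Path(cache_file).write_text(json.dumps(cache, indent=2))


def changed_files(paths, cache):
    """Step 1: return paths that are new or whose MD5 differs.

    The cache dict is updated in place with the fresh checksums.
    """
    changed = []
    for path in paths:
        checksum = md5_of(path)
        if cache.get(str(path)) != checksum:
            cache[str(path)] = checksum
            changed.append(path)
    return changed


def extract_shared_strings(xlsx_path, out_path):
    """Step 2: copy xl/sharedStrings.xml out of an .xlsx file.

    Returns False if the workbook has no shared strings part.
    """
    with zipfile.ZipFile(xlsx_path) as archive:
        try:
            data = archive.read("xl/sharedStrings.xml")
        except KeyError:
            return False
    Path(out_path).write_bytes(data)
    return True


def files_matching(pattern, xml_paths):
    """Step 3: like grep -l, return the files whose content matches."""
    regex = re.compile(pattern)
    return [p for p in xml_paths
            if regex.search(Path(p).read_text(encoding="utf-8"))]
```

A query then only has to run files_matching over the extracted XML files; the spreadsheets themselves are touched again only when changed_files reports them.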
For example, LibreOffice has:
libreoffice --headless --convert-to
If you do not have access to install a current version of LibreOffice then there are macros in VBS (XlsToCsv.vbs on S.O.) that output to multiple csv files.
On Linux there might also be unoconv in your repository.
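The LibreOffice conversion from step 2 could be driven from a script along these lines. This is a sketch under assumptions: xlsx is used as the target format because that is what step 2 asks for, the helper names are invented, and the binary may be called soffice rather than libreoffice on your system.

```python
import subprocess
from pathlib import Path


def build_convert_command(xls_path, out_dir, binary="libreoffice"):
    """Build the headless LibreOffice command line for one file."""
    return [binary, "--headless", "--convert-to", "xlsx",
            "--outdir", str(out_dir), str(xls_path)]


def convert_to_xlsx(xls_path, out_dir, binary="libreoffice"):
    """Run the conversion and return the expected output path.

    Raises subprocess.CalledProcessError if LibreOffice exits non-zero.
    """
    subprocess.run(build_convert_command(xls_path, out_dir, binary),
                   check=True)
    return Path(out_dir) / (Path(xls_path).stem + ".xlsx")
```

Building the argument list separately makes it easy to print the command and try it by hand before running it over a whole directory.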
Oh, and if you give up on rolling your own implementation, why not use a ready-made solution: DocFetcher?
In Section
Seekers of Perl Wisdom