http://www.perlmonks.org?node_id=999337

ImJustAFriend has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks... been a visitor for years, and have benefited from the knowledge bestowed upon these boards. I now need to seek some Perl wisdom that I can't find through the search feature of this monastery.

I have a few XML files fed from one Excel spreadsheet, one XML file "template" per device type. We have been manually editing template XML files, but now I want to automate this process. I have code that works fine for one device, but I need to expand the logic to include the other device types (which have different XML templates). I would like to keep this as "one script, one Excel file". To do so, I need to figure out how to search the Excel file, across multiple worksheets, for a specific "name" taken from the XML, and then gather the matching value from the worksheet.

The XML looks like this:

<module name="Security">
  <function name="common">
    <option name="SECURITY" value="SECURED">
      <configuration>A</configuration>
      <configuration>B</configuration>
      <configuration>C</configuration>
      <configuration>D</configuration>
      <configuration>E</configuration>
      <configuration>F</configuration>
      <default_value>SECURED</default_value>
      <example>MIXED</example>
      <comment>Comment 1</comment>
      <comment>Comment 2</comment>
      <comment>Comment 3</comment>
      <site_priority>1</site_priority>
      <mandatory>NO</mandatory>
      <level>0</level>
    </option>
  </function>
  <function name="ssh">
    <option name="..." value="...">
    ...

The worksheets look like this (fake data):

+-----------+---+---+---+---+---+---+---+---+
| Parameter | A | B | C | D | E | F | G | H |
|-----------+---+---+---+---+---+---+---+---+
| SECURITY  | S | S | S | S | S | S | S | S |
|-----------+---+---+---+---+---+---+---+---+
| PARAM 2   | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
|-----------+---+---+---+---+---+---+---+---+
...

There are 12 worksheets, all formatted as above, for the XML creation. Each worksheet has the same number of columns (one per device type), but could have anywhere from 1 to 100 rows of data. A-H in the first row represent different devices for which data would be pulled.

So my script logic currently says "start reading the XML from the top of the file. If the line matches 'option name', pull the data from Excel in worksheet x, row y, column z... else, print the line to the output XML file and read next line." Worksheet x, row y, and column z are all either created from script parameters (column z), or calculated from loop counts.

That code works spectacularly for devices A-D in my Excel mock-up, but E-H have a different XML template to follow. So, I am thinking to change my logic to say something like "start reading the XML from the top of the file. If the line matches 'option name', search through the entirety of the Excel file until the matching parameter name is found and pull the appropriate data... else, print the line to the output XML file and read next line." I have been searching for 3 days for how to search an entire Excel workbook for specific text, but thus far I am not having any luck.

So, Perl Monks... can someone please point me in the right direction for searching an Excel workbook for a specific string?

Thank you very much!!

ImJustAFriend

Re: Search Entire Excel Workbook For Text
by sundialsvc4 (Abbot) on Oct 16, 2012 at 14:01 UTC

    This seems to me to be an ideal application of XPath expressions, especially since you say that “E-H have a different XML template to follow.” (If there are two variations to deal with now, one day there will be three and then four.) With this approach, the expressions define what you are looking for, and it becomes XPath’s job to find it. This will probably simplify your logic considerably, because the structure of your program is no longer tied to that of the file and does not have to change as the files inevitably do. All you need is an expression, perhaps in a list of them, that will succeed for a particular file. (XML::LibXML is my recommendation but not the only choice.) Notice also that the expression does not have to describe the entire structure of the surrounding file: “tell me what you’re looking for, and I can find it, wherever it is.”
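
As a rough illustration of the XPath idea, here is a minimal sketch using XML::LibXML against a trimmed copy of the template from the question. The `%new_value` hash and its contents are hypothetical stand-ins for whatever would be read from the spreadsheet; this is not the original poster's script.

```perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the template shown in the question.
my $xml = <<'XML';
<module name="Security">
  <function name="common">
    <option name="SECURITY" value="SECURED">
      <default_value>SECURED</default_value>
      <mandatory>NO</mandatory>
    </option>
  </function>
</module>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Hypothetical replacement values keyed by option name; in practice
# these would come from the Excel workbook.
my %new_value = (SECURITY => 'MIXED');

# Visit every <option> that has a name attribute, wherever it sits
# in the document, and update its value attribute when we have one.
for my $option ($doc->findnodes('//option[@name]')) {
    my $name = $option->getAttribute('name');
    next unless exists $new_value{$name};
    $option->setAttribute(value => $new_value{$name});
}

print $doc->toString();
```

Because the expression `//option[@name]` matches at any depth, the same loop should work for a template with a different surrounding structure, as long as it still uses `<option name="..." value="...">` elements.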

    Another thread, today, mentioned XML::XSH, which seems to include a shell for applying XPath expressions interactively, among other features. I haven’t (yet!) looked into that, but I want to include a link to it here anyhow.

      So, I am not familiar with XPath. I have been looking at this since I read your post, and it seems XPath is used to read XML, not Excel. Am I seeing this correctly?

      Thank you!!

      ImJustAFriend

        You're mostly correct. Most of your post was talking about XML, which I think could confuse folks about what you're needing help with: XML or Excel.

        What OS are you on? If you're on Windows, do you have Excel installed and available for use? Which Excel version is the file that you're using?

        • As for Excel file type, there's the older .xls version and the newer .xlsx type. Very different structures.
        • From non-Windows OSes and Windows without Excel, you'll probably be interested in Spreadsheet::ParseExcel for the older .xls file type and Spreadsheet::XLSX for the newer .xlsx.
        • From a Windows system with Excel, I personally would use Win32::OLE to control Excel to access the Excel file.
        • Also, with the .xlsx file type, there's another route to go. I believe that's just a ZIP archive of XML files. That means that you can unzip it and then do XML parsing. If you go that route, then sundialsvc4's response about using XPath type modules for XML parsing now becomes very applicable.
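
For the Spreadsheet::ParseExcel route (.xls files only), searching every worksheet for a parameter name could look roughly like the sketch below. It assumes the layout from the question, with parameter names in the first column and one column per device after it. The sub name `find_parameter`, the script name, and the command-line arguments are hypothetical.

```perl
use strict;
use warnings;

# Search every worksheet for a row whose first column equals $param.
# Returns (worksheet name, value found in $device_col), or an empty
# list when no worksheet contains the parameter.
sub find_parameter {
    my ($workbook, $param, $device_col) = @_;
    for my $worksheet ($workbook->worksheets()) {
        my ($row_min, $row_max) = $worksheet->row_range();
        for my $row ($row_min .. $row_max) {
            my $name_cell = $worksheet->get_cell($row, 0);
            next unless $name_cell && $name_cell->value() eq $param;
            my $value_cell = $worksheet->get_cell($row, $device_col);
            return ($worksheet->get_name(),
                    $value_cell ? $value_cell->value() : undef);
        }
    }
    return;
}

# Only open a real file when one is given on the command line, e.g.
#   perl find_param.pl devices.xls SECURITY 5
if (@ARGV == 3) {
    require Spreadsheet::ParseExcel;
    my ($file, $param, $device_col) = @ARGV;
    my $parser   = Spreadsheet::ParseExcel->new();
    my $workbook = $parser->parse($file)
        or die "Cannot parse $file: ", $parser->error(), "\n";
    my ($sheet, $value) = find_parameter($workbook, $param, $device_col);
    print defined $sheet
        ? "$param found on '$sheet': " . ($value // '') . "\n"
        : "$param not found\n";
}
```

The sub returns on the first match, so a parameter name that appears on more than one worksheet would need extra handling.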
Re: (( Resolved )) Search Entire Excel Workbook For Text
by ImJustAFriend (Scribe) on Oct 19, 2012 at 13:12 UTC

    Hi Monks. I have resolved this issue, though not in the way I initially set out to do so. I have changed my script so now I am slurping the data into a hash (worksheetname-parameter as key, parametervalue as value) and using the hash as my search target.
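
The hash approach described above might look something like the following minimal sketch. The `%sheets` structure is a hypothetical in-memory stand-in for rows already read from the workbook (using the fake data from the question), and the worksheet name "Security" and the sub name `build_lookup` are assumptions, not taken from the actual script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for rows already read from the workbook:
# worksheet name => [ [ parameter, values for devices A..H ], ... ]
my %sheets = (
    Security => [
        [ 'SECURITY', ('S') x 8 ],
        [ 'PARAM 2',  1, 0, 0, 0, 1, 0, 1, 0 ],
    ],
);

# Build a lookup hash keyed by "worksheetname-parameter" for one
# device column (1 = device A, 2 = device B, and so on).
sub build_lookup {
    my ($sheets, $device_col) = @_;
    my %lookup;
    for my $sheet_name (sort keys %$sheets) {
        for my $row (@{ $sheets->{$sheet_name} }) {
            my $param = $row->[0];
            next unless defined $param && length $param;
            $lookup{"$sheet_name-$param"} = $row->[$device_col];
        }
    }
    return \%lookup;
}

my $lookup = build_lookup(\%sheets, 5);    # column 5 = device E
print "Security-PARAM 2 => $lookup->{'Security-PARAM 2'}\n";
```

With the workbook read once into a hash like this, each "option name" line in the XML becomes a single hash lookup instead of a search through the worksheets.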

    Thanks for the insights!!

    ImJustAFriend