The other day I filled out about 10 forms using InfoPath. The problem is that all InfoPath documents are saved as XML, and one cannot extract the contents of each XML tag into simple text.
I will explain how to extract information from XML tags and save it in DOC format. You can then apply the same concept to convert an InfoPath form to DOC.
So here goes...
XPath is one of the many XML technologies you can use to traverse an XML tree. If you open a file in Explorer, the path to the file may look like "C:\folder1\file1.txt". XPath uses a similar concept to walk through your XML file, which can be thought of as a parent tree containing many child nodes.
Let's take a very simple example. Suppose your XML file (BOOKS.XML) has a root <Books> element containing several <Book> elements, each with a <title> and an <author> child.
Now you want to extract the necessary book information, i.e. the title and author of each book.
In XPath you get the content of the <title> tag by using the path: //Books/Book/title
Similarly, for the content of the <author> tag: //Books/Book/author
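To make the two paths concrete, here is a minimal sketch that evaluates them with XML::XPath against an inline XML string. The sample book titles and author names are placeholders invented for illustration, and the module is assumed to be installed from CPAN.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample data; only the element names come from the paths above.
my $xml = <<'XML';
<Books>
  <Book>
    <title>First Book</title>
    <author>Author One</author>
  </Book>
  <Book>
    <title>Second Book</title>
    <author>Author Two</author>
  </Book>
</Books>
XML

# Parse the XML from a string instead of a file.
my $xp = XML::XPath->new(xml => $xml);

# In list context findnodes returns every node matching the path;
# string_value gives the text inside each node.
my @titles  = map { $_->string_value } $xp->findnodes('//Books/Book/title');
my @authors = map { $_->string_value } $xp->findnodes('//Books/Book/author');

print "Title:  $_\n" for @titles;
print "Author: $_\n" for @authors;
```

Each path selects all matching nodes in the document, so with two <Book> elements you get two titles and two authors.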
Now that you know what to extract and how to extract it, it's time to use Perl.
The beauty of Perl is that you can turn the idea in your mind into reality so easily.
I am going to use the XML::XPath module, which is available from CPAN. You load the module like this: use XML::XPath;
Now you need to get the BOOKS.XML file into a variable and create a new XPath object.
Open a Word DOC file with the same base name to hold the converted output.
Print the necessary information to the DOC file.
Call the find method on the XPath object and give it the path: //Books/Book
This returns a node set (a list of the matching nodes), which is then used to extract the <title> and <author> tags.
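As a rough illustration of that last step, the sketch below calls find, walks the returned node set, and reads <title> and <author> relative to each <Book> node. The inline XML and the @records array are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample data with the same structure as BOOKS.XML.
my $xml = '<Books>'
        . '<Book><title>First Book</title><author>Author One</author></Book>'
        . '<Book><title>Second Book</title><author>Author Two</author></Book>'
        . '</Books>';

my $xp = XML::XPath->new(xml => $xml);

# For a path that selects nodes, find returns an XML::XPath::NodeSet.
my $nodeset = $xp->find('//Books/Book');

my @records;
foreach my $book ($nodeset->get_nodelist) {
    # The second argument is the context node, so 'title' and 'author'
    # are resolved relative to the current <Book>.
    my $title  = $xp->findvalue('title',  $book);
    my $author = $xp->findvalue('author', $book);
    push @records, "$title by $author";
}

print "$_\n" for @records;
```

Working from each <Book> node keeps a title paired with its own author, which two separate document-wide queries do not guarantee if a book is missing one of the tags.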
The resulting Perl file (BOOKEXTRACT.PL) is given below.
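What follows is a minimal sketch of such a script rather than the original listing. It assumes that "DOC" here simply means a plain text file saved with a .doc extension, which Word can open. The sub name xml_to_doc and the header line it prints are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Convert a books XML file into a text file with a .doc extension.
# Returns the name of the file that was written.
sub xml_to_doc {
    my ($xml_file) = @_;

    # Same base name as the XML file, with .doc in place of .xml.
    (my $doc_file = $xml_file) =~ s/\.xml$/.doc/i;
    die "Expected a .xml file, got '$xml_file'\n" if $doc_file eq $xml_file;

    # Load the XML file and create the XPath object.
    my $xp = XML::XPath->new(filename => $xml_file);

    # Open the output file and print a short header (assumed wording).
    open my $out, '>', $doc_file or die "Cannot write $doc_file: $!";
    print {$out} "Book information extracted from $xml_file\n\n";

    # Walk every <Book> node and print its title and author.
    foreach my $book ($xp->find('//Books/Book')->get_nodelist) {
        print {$out} "Title : ", $xp->findvalue('title',  $book), "\n";
        print {$out} "Author: ", $xp->findvalue('author', $book), "\n\n";
    }

    close $out or die "Cannot close $doc_file: $!";
    return $doc_file;
}

# Usage: perl BOOKEXTRACT.PL BOOKS.XML
if (@ARGV) {
    my $written = xml_to_doc($ARGV[0]);
    print "Wrote $written\n";
}
```

Run it as perl BOOKEXTRACT.PL BOOKS.XML; it writes BOOKS.doc next to the input file.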
After you run the Perl script, you will have a DOC file with the same name as the XML file.
Now that you know how to extract tags and their content into a Word DOC, you can apply the same method to your InfoPath forms.
Hope this helps.
20050113 Janitored by Corion: Fixed formatting
20050114 Unconsidered by Corion: was considered as move to Meditations (edit:14 keep:7 del:0)