Using Perl XPath for converting Infopath XML files to Word Documents

by karthik4perl (Initiate)
on Jan 13, 2005 at 10:44 UTC ( #421936=perltutorial )

 The other day I filled out about 10 forms using Infopath. The problem is that all Infopath documents are saved as XML, and one cannot directly extract the contents of each XML tag as simple text.

 I will explain how to extract information from XML tags and save it in a DOC file. You can then apply the same concept to convert Infopath forms to DOC.

 So here goes...

 XPath is one of the many XML technologies you can use to traverse an XML tree. When you open a file in Explorer, its path looks like "C:\folder1\file1.txt". XPath uses a similar concept to walk through your XML file, which can be thought of as a tree whose parent nodes contain child nodes.

Let's take a very simple example. Suppose your XML file looks like this:


<Books>
  <Book>
    <title>Perl Magic</title>
    <author>Karthik</author>
    <publisher>O'Reilly</publisher>
    <price currency="Rupees" value="330"/>
  </Book>
  <Book>
    <title>Perl for Dummies</title>
    <author>Mark</author>
    <publisher>O'Reilly</publisher>
    <price currency="Rupees" value="420"/>
  </Book>
</Books>

Now suppose you want to extract the book information, i.e., the title and author of each book.
That means extracting the <title> and <author> tags from the XML file.

In XPath you get the "title" tag's content by using the path:

//Books/Book/title

Similarly, for the "author" tag's content:

//Books/Book/author

Now that you know what to extract and how, it's time to use Perl.
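
As a minimal sketch of how such paths behave, the snippet below evaluates two location paths against an inline copy of the sample document using XML::XPath. The paths //Books/Book/title and //Books/Book/author, the xml => constructor argument, and the variable names are assumptions chosen for illustration, not necessarily the exact expressions intended above.

```perl
use strict;
use warnings;
use XML::XPath;

# Inline copy of the sample document, so the sketch needs no file on disk.
my $xml = <<'XML';
<Books>
  <Book>
    <title>Perl Magic</title>
    <author>Karthik</author>
  </Book>
  <Book>
    <title>Perl for Dummies</title>
    <author>Mark</author>
  </Book>
</Books>
XML

my $xp = XML::XPath->new(xml => $xml);

# Each path is read left to right, much like a directory path:
# Books, then its Book children, then the title (or author) of each Book.
my @titles  = map { $_->string_value } $xp->findnodes('//Books/Book/title');
my @authors = map { $_->string_value } $xp->findnodes('//Books/Book/author');

print 'Titles:  ', join(', ', @titles),  "\n";
print 'Authors: ', join(', ', @authors), "\n";
```

Reading from a string instead of a file keeps the example self-contained; the same paths apply when the object is created with filename => 'books.xml'.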

The beauty of Perl is that you can turn the idea in your mind into reality so easily. There are
Perl modules to make your life easy.

I am going to use the XPath Perl module which is part of the XML module. You load the module like this:

use XML::XPath;

Now store the name of the XML file (books.xml) in a variable and create a new XML::XPath object.

my $file = "books.xml";
my $xp = XML::XPath->new(filename => $file);

Open a Word DOC file of the same name for output:

open(INFO3, '>', "$file.doc") or die "Cannot open $file.doc: $!";

Print the heading information to the DOC file:

print INFO3 "Perl Xpath\n\n";
print INFO3 "BOOK INFORMATION:\n\n";

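 Call the find method on the XPath object and give it the path:

$xp->find('//Books/Book')

 This populates a node list, which is then used to extract the <title> and <author> tags
and print them to the DOC file:

foreach my $book ($xp->find('//Books/Book')->get_nodelist) {
    print INFO3 "TITLE:";
    print INFO3 $book->find('title')->string_value . "\n";
    print INFO3 "AUTHOR:";
    print INFO3 $book->find('author')->string_value . "\n";
    print INFO3 "\n\n";
}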


The resulting Perl file (BOOKEXTRACT.PL) is given below.


use strict;
use warnings;
use XML::XPath;

my $file = "books.xml";
my $xp = XML::XPath->new(filename => $file);

# Create the output file, named after the XML file with .doc appended
open(INFO3, '>', "$file.doc") or die "Cannot open $file.doc: $!";

print INFO3 "Perl Xpath\n\n";
print INFO3 "BOOK INFORMATION:\n\n";

# Visit every Book element and write its title and author
foreach my $book ($xp->find('//Books/Book')->get_nodelist) {
    print INFO3 "TITLE:";
    print INFO3 $book->find('title')->string_value . "\n";
    print INFO3 "AUTHOR:";
    print INFO3 $book->find('author')->string_value . "\n";
    print INFO3 "\n\n";
}

print "Converted XML file into WORD file\n\n";
print $file . " WORD document generated\n";
close(INFO3);

After you run the Perl script, you will find a DOC file named after the XML file (books.xml.doc)
containing the extracted information.

Now that you know how to extract tags and their content into a Word DOC, you can apply the same method
to convert Infopath XML files into Word documents.

Hope this helps.

Happy Coding.

20050113 Janitored by Corion: Fixed formatting

20050114 Unconsidered by Corion: was considered as move to Meditations (edit:14 keep:7 del:0)

Replies are listed 'Best First'.
Re: Using Perl XPath for converting Infopath XML files to Word Documents
by mirod (Canon) on Jan 13, 2005 at 13:08 UTC

    A few comments:

    You seem to think that in an XPath expression '//' denotes the top of the tree. It doesn't. The path you should be using is /Books/Book. '//' is more like a wildcard: //book will find all the book nodes in the document. Using '//' in your case forces the XPath engine to test basically all nodes in the document, while /Books/Book is much more efficient, and tests only the root and first-level children. For a good XPath tutorial have a look at

    A couple of minor stylistic quibbles: I don't think you need to write foreach my $book ($xp->find('/Books/Book')->get_nodelist), as findnodes in list context returns a list of nodes, so you can just write foreach my $book ($xp->findnodes('/Books/Book')); you could also replace $book->find('author')->string_value with simply $book->findvalue('author'), which, besides being shorter, also brings the added benefit that it won't die if for some reason the author element is not present.

    Finally, you wrote: the XPath Perl module which is part of the XML module. Not quite, XML::XPath is a module in the XML namespace, just like XML::Parser, XML::Simple or any other XML:: module.
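
The suggestions in this reply (an absolute path instead of '//', a plain list of nodes, and findvalue) can be sketched roughly as follows. This is only an illustration: the inline XML, the second Book without an author, and the @lines array are assumptions added here, and findvalue is called on the XML::XPath object with the book node passed as context.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical sample: the second Book deliberately has no author element.
my $xml = <<'XML';
<Books>
  <Book><title>Perl Magic</title><author>Karthik</author></Book>
  <Book><title>Perl for Dummies</title></Book>
</Books>
XML

my $xp = XML::XPath->new(xml => $xml);

my @lines;
# Absolute path: start at the root element Books and look only at its Book children.
foreach my $book ($xp->findnodes('/Books/Book')) {
    # findvalue evaluates the path relative to $book and returns its text;
    # a missing element yields an empty string.
    my $title  = $xp->findvalue('title',  $book);
    my $author = $xp->findvalue('author', $book);
    $author = '(none)' unless length "$author";
    push @lines, "TITLE: $title, AUTHOR: $author";
}
print "$_\n" for @lines;
```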

      Will this script also run on a UNIX operating system? I do not have any XML parser that can be loaded on UNIX.

        Did you try?

        It will work if you have XML::XPath (and XML::Parser) installed. You also need expat, the XML parsing library, which comes installed on a lot of systems, or can be compiled from sources (just make sure you are using the same compiler you compiled Perl with).
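
One quick way to see whether the modules mentioned above are available on a given system is to try loading them. This is a small sketch, not part of the original exchange; the %installed hash is a name made up for the example.

```perl
use strict;
use warnings;

my %installed;
foreach my $module (qw(XML::Parser XML::XPath)) {
    # String eval so that require sees the module name as a bareword.
    $installed{$module} = eval "require $module; 1" ? 1 : 0;
    printf "%-12s %s\n", $module, $installed{$module} ? 'installed' : 'not found';
}
```

If either module is reported as not found, it can usually be installed from CPAN.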

Re: Using Perl XPath for converting Infopath XML files to Word Documents
by gellyfish (Monsignor) on Jan 13, 2005 at 11:52 UTC

    Forgive me if I'm missing something here, but this isn't converting anything into a Word document - it is creating a plain text file that happens to have a '.doc' extension. If you really want Word documents it would not be too difficult to use Win32::OLE to automate the insertion of the data into a new Word document.
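
A rough sketch of the Win32::OLE approach suggested here might look like the following. It assumes Windows with Microsoft Word installed; the @lines data and the books.doc file name are placeholders invented for the example, and the script only prints a notice on other platforms.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical sample data; in the tutorial these lines would come from XML::XPath.
my @lines = ('BOOK INFORMATION:', 'TITLE: Perl Magic', 'AUTHOR: Karthik');

# Word resolves relative names against its own default folder, so use an absolute path.
my $target = File::Spec->rel2abs('books.doc');

if ($^O eq 'MSWin32') {
    require Win32::OLE;

    # 'Quit' is the method called on Word when $word is destroyed.
    my $word = Win32::OLE->new('Word.Application', 'Quit')
        or die 'Cannot start Word: ' . Win32::OLE->LastError();

    my $doc       = $word->Documents->Add();
    my $selection = $word->Selection;

    # Type each line into the new document, followed by a paragraph break.
    foreach my $line (@lines) {
        $selection->TypeText($line);
        $selection->TypeParagraph();
    }

    # FileFormat 0 is wdFormatDocument, the Word 97-2003 .doc format.
    $doc->SaveAs({ FileName => $target, FileFormat => 0 });
    $doc->Close();
}
else {
    print "Win32::OLE needs Windows with Word installed; nothing written.\n";
}
```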


Re: Using Perl XPath for converting Infopath XML files to Word Documents
by Anonymous Monk on Aug 01, 2008 at 10:50 UTC
    So how would you "find" the price value then? Can someone give me an example, as I cannot seem to get this working? Thanks.
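
One possible answer, as a sketch: in the sample document the price is stored in attributes of the <price> element, and XPath selects attributes with "@". The example below assumes a trimmed inline copy of the sample XML and uses findvalue with the book node as context; the @prices array is a name made up here.

```perl
use strict;
use warnings;
use XML::XPath;

my $xml = <<'XML';
<Books>
  <Book>
    <title>Perl Magic</title>
    <price currency="Rupees" value="330"/>
  </Book>
  <Book>
    <title>Perl for Dummies</title>
    <price currency="Rupees" value="420"/>
  </Book>
</Books>
XML

my $xp = XML::XPath->new(xml => $xml);

my @prices;
foreach my $book ($xp->findnodes('/Books/Book')) {
    my $title = $xp->findvalue('title', $book);
    # price/@value is the "value" attribute of the <price> child of this Book.
    my $value    = $xp->findvalue('price/@value',    $book);
    my $currency = $xp->findvalue('price/@currency', $book);
    push @prices, "$title: $value $currency";
}
print "$_\n" for @prices;
```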
