
The other day I filled out about 10 forms using InfoPath. The problem is that all InfoPath documents are saved as XML, and one cannot easily extract the contents of each XML tag into simple text.

I will explain how to extract information from XML tags and save it in a DOC file. You can then apply the same concept to convert InfoPath forms to DOC.

So here goes...

XPath is one of the many XML technologies you can use to traverse an XML tree. When you open a file in Explorer, its path may look like "C:\folder1\file1.txt". XPath uses a similar concept to walk through your XML file, which can be thought of as a tree in which each parent node contains child nodes.

Let's take a very simple example. Suppose your XML file looks like this:

BOOKS.XML File

<Books>

<Book>
 <title>Perl Magic</title>
 <author>Karthik</author>
 <publisher>O'Reilly</publisher>
 <price currency="Rupees" value="330"/>
</Book>

<Book>
 <title>Perl for Dummies</title>
 <author>Mark</author>
 <publisher>O'Reilly</publisher>
 <price currency="Rupees" value="420"/>
</Book>

</Books>

Now you want to extract the necessary book information, i.e., the title and author of each book.
That means extracting the <title> and <author> tags from the XML file.

In XPath, you get the content of the "title" tag with the path:

//Books/Book/title

Similarly for the "author" tag's content:

//Books/Book/author

Now that you know what to extract and how, it's time to use Perl.

The beauty of Perl is that you can turn the idea in your mind into reality so easily. You have
Perl modules to make your life easy.

I am going to use the XML::XPath module, which is available from CPAN. You load the module like this:

use XML::XPath;

Now you need to put the name of the BOOKS.XML file into a variable and create a new XPath object.

my $file = "books.xml";
my $xp = XML::XPath->new(filename => $file);
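As a quick check that the XPath object and the two paths from earlier do what you expect, here is a minimal sketch. It assumes XML::XPath is installed, and it parses a trimmed inline copy of the sample file (through the xml => option) instead of books.xml, so it runs on its own. The values_at helper is my own name for illustration, not something the module provides.

```perl
use strict;
use warnings;
use XML::XPath;

# Trimmed inline copy of books.xml so the sketch is self-contained.
my $xml = <<'XML';
<Books>
<Book>
 <title>Perl Magic</title>
 <author>Karthik</author>
</Book>
<Book>
 <title>Perl for Dummies</title>
 <author>Mark</author>
</Book>
</Books>
XML

# values_at() is a hypothetical helper, not part of XML::XPath:
# it returns the text of every node matched by an XPath expression.
sub values_at {
    my ($xml_text, $path) = @_;
    my $xp = XML::XPath->new(xml => $xml_text);
    # In list context, findnodes returns the matching nodes.
    return map { $_->string_value } $xp->findnodes($path);
}

print "Titles:  ", join(", ", values_at($xml, '//Books/Book/title')),  "\n";
print "Authors: ", join(", ", values_at($xml, '//Books/Book/author')), "\n";
```

Running it should list both titles and both authors, which confirms the paths before any DOC file is written.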

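Open a file with the same name plus a .doc extension for the converted output. The script writes plain text into it, which Word can open:

open(INFO3, ">", "$file.doc") or die "Cannot open $file.doc: $!";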

Print a heading into the DOC file:

print INFO3 "Perl Xpath\n\n";
print INFO3 "BOOK INFORMATION:\n\n";

Call the find method on the XPath object and give it the path:

//Books/Book

This returns a node set. Calling get_nodelist on it gives the list of <Book> nodes, from which the <title> and <author> tags are extracted
and printed to the DOC file.

$xp->find('//Books/Book')->get_nodelist

The resulting Perl file (BOOKEXTRACT.PL) is given below.

BOOKEXTRACT.PL

use strict;
use warnings;
use XML::XPath;

# Load the XML file into an XPath object
my $file = "books.xml";
my $xp = XML::XPath->new(filename => $file);

# The output is plain text saved with a .doc extension
open(INFO3, ">", "$file.doc") or die "Cannot open $file.doc: $!";

print INFO3 "Perl Xpath\n\n";
print INFO3 "BOOK INFORMATION:\n\n";

# Visit each <Book> node and print its title and author
foreach my $book ($xp->find('//Books/Book')->get_nodelist) {
    print INFO3 "TITLE:";
    print INFO3 $book->find('title')->string_value . "\n";
    print INFO3 "AUTHOR:";
    print INFO3 $book->find('author')->string_value . "\n";
    print INFO3 "\n\n";
}

close(INFO3) or die "Cannot close $file.doc: $!";

print "Converted XML file into WORD file\n\n";
print "$file.doc WORD document generated\n";

*************************CODE***********************

After you run the Perl script, you will get a DOC file named after the XML file (books.xml.doc)
containing the extracted information.
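The sample file also keeps each price in attributes, which the script above does not read. As a sketch of how the same loop could pick them up, the snippet below uses the XPath "@name" syntax relative to each <Book> node. It assumes XML::XPath is installed and works on a trimmed inline copy of the sample data. The price_lines helper and the line format are my own choices for illustration.

```perl
use strict;
use warnings;
use XML::XPath;

# Trimmed inline copy of books.xml, keeping the <price> attributes.
my $xml = <<'XML';
<Books>
<Book>
 <title>Perl Magic</title>
 <price currency="Rupees" value="330"/>
</Book>
<Book>
 <title>Perl for Dummies</title>
 <price currency="Rupees" value="420"/>
</Book>
</Books>
XML

# price_lines() is a hypothetical helper: it builds one text line per book.
sub price_lines {
    my ($xml_text) = @_;
    my $xp = XML::XPath->new(xml => $xml_text);
    my @lines;
    foreach my $book ($xp->findnodes('//Books/Book')) {
        # Relative paths are evaluated against the current <Book> node;
        # "@name" selects an attribute instead of a child element.
        my $title    = $xp->find('title',           $book)->string_value;
        my $value    = $xp->find('price/@value',    $book)->string_value;
        my $currency = $xp->find('price/@currency', $book)->string_value;
        push @lines, "TITLE: $title PRICE: $value $currency";
    }
    return @lines;
}

print "$_\n" for price_lines($xml);
```

To put the prices into the DOC file, you would print these lines to the output filehandle instead of STDOUT.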

Now that you know how to extract tags and their content into a Word DOC, you can apply the same method to
convert InfoPath XML files into Word documents.
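One thing to watch for when moving to real InfoPath files: their elements usually carry a namespace prefix (typically my:), so the paths need that prefix too. The sketch below is only an illustration under assumptions. The form content, the field names and the namespace URI are all made up, so copy the real ones from your own saved form. It assumes XML::XPath is installed and uses its set_namespace method to bind the prefix. form_fields is a hypothetical helper name.

```perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical stand-in for a saved InfoPath form. The field names and the
# namespace URI are invented for this sketch.
my $form_ns = 'http://example.com/hypothetical/myXSD';
my $form = <<"XML";
<my:myFields xmlns:my="$form_ns">
 <my:name>Karthik</my:name>
 <my:department>Engineering</my:department>
</my:myFields>
XML

# form_fields() is a hypothetical helper: it returns field name => text
# for every element directly under the root of the form.
sub form_fields {
    my ($xml_text, $ns_uri) = @_;
    my $xp = XML::XPath->new(xml => $xml_text);
    # Bind the "my" prefix used in the path below to the form's namespace.
    $xp->set_namespace('my', $ns_uri);
    my %fields;
    foreach my $node ($xp->findnodes('/my:myFields/*')) {
        # getLocalName drops the prefix, e.g. "my:name" becomes "name".
        $fields{ $node->getLocalName } = $node->string_value;
    }
    return %fields;
}

my %fields = form_fields($form, $form_ns);
print "$_: $fields{$_}\n" for sort keys %fields;
```

From here, writing the fields to a DOC file works the same way as in the book example.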

Hope this helps.

Happy Coding.

20050113 Janitored by Corion: Fixed formatting

20050114 Unconsidered by Corion: was considered as move to Meditations (edit:14 keep:7 del:0)

In reply to Using Perl XPath for converting Infopath XML files to Word Documents by karthik4perl
