http://www.perlmonks.org?node_id=1029150


in reply to Re: How can I read the .docx file in perl?
in thread How can I read the .docx file in perl?

In addition, Microsoft publishes XML schemas for the format. The contents of the file can be validated against them, and some forms of extraction rely on them as well.

If you use an “industrial strength” package such as XML::LibXML, which is built on the widely used libxml2 library, you get the pieces you need in one place: a DOM parser, XPath queries with namespace support, and XML Schema validation.
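As a rough illustration of that approach, here is a minimal sketch. It assumes the .docx is an ordinary zip archive whose main body lives in word/document.xml, and it uses Archive::Zip (my choice, not something named above) to get at that member. The helper names paragraphs_from_xml and docx_paragraphs are made up for this example. It only concatenates the w:t text runs of each w:p paragraph, so tabs, line breaks, headers, footnotes and the like are ignored.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use XML::LibXML;
use XML::LibXML::XPathContext;

# Main WordprocessingML namespace used by word/document.xml.
my $W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

# Hypothetical helper: return the text of every w:p paragraph found in an
# XML string. Paragraphs nested inside tables or text boxes are matched too.
sub paragraphs_from_xml {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(w => $W_NS);

    my @paragraphs;
    for my $p ($xpc->findnodes('//w:p')) {
        # Join the text runs (w:t) that belong to this paragraph.
        push @paragraphs,
            join '', map { $_->textContent } $xpc->findnodes('.//w:t', $p);
    }
    return @paragraphs;
}

# Hypothetical helper: pull word/document.xml out of the archive and parse it.
sub docx_paragraphs {
    my ($path) = @_;
    my $zip = Archive::Zip->new();
    $zip->read($path) == AZ_OK
        or die "Cannot read '$path' as a zip archive\n";
    my $xml = $zip->contents('word/document.xml');
    defined $xml && length $xml
        or die "No word/document.xml found in '$path'\n";
    return paragraphs_from_xml($xml);
}

# Run as a script: print one paragraph per line.
if (!caller && @ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for docx_paragraphs($ARGV[0]);
}
```

Saved as, say, docx2txt.pl (the name is arbitrary), it would be run as `perl docx2txt.pl report.docx`. Registering the w prefix explicitly matters, because XPath in XML::LibXML will not match namespaced elements through an unprefixed name.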

IIRC, several governments told Microsoft a few years ago that a “closed” format was no longer acceptable for government documents, which is a very sensible concern. ODF is likewise an XML-based format. See http://oasis-open.org.