Read doc/docx in Linux

How can I read doc/docx files using Perl on Linux?

Re: Read doc/docx in Linux
by fod (Friar) on Jul 15, 2010
    Text::Extract::Word will do that (for .doc files, anyway), but you'll have to convert the Windows newlines to Unix ones. Maybe something like:

    perl -M'Text::Extract::Word q(get_all_text)' -e 'print get_all_text(q(document.doc))' | dos2unix | less

    if you just want a quick look at it.
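
    For repeated use, the same idea could live in a small script that does the newline conversion in Perl instead of piping through dos2unix. This is only a sketch: the helper names normalize_newlines and doc_to_text are made up for illustration, and it assumes the extracted text is a character string that is safe to print as UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert Windows (CRLF) and bare CR line endings to Unix LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;
    return $text;
}

# Extract all text from a .doc file and fix its line endings.
# The module is loaded lazily, so normalize_newlines stays usable
# even on a machine where Text::Extract::Word is not installed.
sub doc_to_text {
    my ($path) = @_;
    require Text::Extract::Word;
    return normalize_newlines(Text::Extract::Word::get_all_text($path));
}

# Assumption: the extracted text should be written out as UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

# Print the text of every file named on the command line.
for my $path (@ARGV) {
    print doc_to_text($path);
}
```

    Saved under a name such as doc2text.pl (hypothetical), it would be run as: perl doc2text.pl document.doc | less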

Re: Read doc/docx in Linux
by TedPride (Priest) on Jul 15, 2010
    There's a command-line utility called catdoc that supposedly does this, though I haven't tried it myself. It might help to tell us WHY you need the files converted: that would let us know whether this is something you need to do on an ongoing basis or something you could do just once using existing software (often the simpler solution).

Re: Read doc/docx in Linux
by philipbailey (Chaplain) on Jul 15, 2010

    I have used antiword successfully in the past for reading the text of Word files at the command line. It doesn't seem to be actively maintained any more, though.

    I also notice that AbiWord has a command line option for converting Word to other formats. You could of course use the full GUI version of AbiWord, or indeed OpenOffice.

    (Update) I realise, of course, that my answer doesn't directly address the question of reading these files in Perl, but the command-line tools mentioned are often a practical way to go.
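
    One way to tie the command-line tools back to Perl is to run the converter as a child process and read its output. The following is a rough sketch, assuming antiword (or catdoc) is installed and on the PATH and prints the document text to standard output when given a file name; run_converter is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run an external converter (e.g. antiword or catdoc) on one file
# and return whatever it writes to standard output.
sub run_converter {
    my ($tool, $path) = @_;

    # List-form pipe open bypasses the shell, so spaces or quotes
    # in $path reach the tool unchanged.
    open(my $fh, '-|', $tool, $path)
        or die "cannot run $tool: $!\n";

    local $/;            # slurp mode: read all output at once
    my $text = <$fh>;

    # close() on a pipe returns false if the tool exited non-zero.
    close($fh)
        or die "$tool failed on $path (exit status " . ($? >> 8) . ")\n";

    return $text;
}

# Print the text of every file named on the command line.
for my $path (@ARGV) {
    print run_converter('antiword', $path);
}
```

    Swapping 'antiword' for 'catdoc' should work the same way, since the helper only relies on the tool taking a file name and writing text to standard output.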

