http://www.perlmonks.org?node_id=521170

anniyan has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I need to count the number of words in a PDF file.

What I tried was to save the PDF file as a Word file and, using Win32::OLE, add a VBA macro to count the number of words. The problem is that saving the PDF as a Word document takes a very long time.

Earlier I did some work with CAM::PDF, but it has no method for counting the number of words.

Is it possible to count the number of words with the Word application? If so, is there any other module to accomplish this? If there is a Perl module that can 'save as' a PDF to Word, please point me to that as well.

Regards,
Anniyan
(CREATED in HELL by DEVIL to s|EVILS|GOODS|g in WORLD)

Replies are listed 'Best First'.
Re: Count number or words in PDF
by marto (Cardinal) on Jan 05, 2006 at 11:38 UTC
    anniyan,

    CAM::PDF::PageText looks as though it will extract the text from a PDF; once you have the text, you can count the number of words. Failing that, pdftotext is available as part of the Xpdf suite: you could convert the PDF file to a text file and count the number of words within it.
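
    A minimal sketch of the CAM::PDF route, assuming the module is installed and that its getPageText() method returns usable text for your file. The sub names count_words and count_pdf_words are made up for illustration, and a "word" here is simply a run of non-whitespace characters, so the total may differ from what Word reports.

```perl
use strict;
use warnings;

# Count whitespace-separated tokens in a string.
# split ' ' skips leading whitespace and collapses runs of whitespace.
sub count_words {
    my ($text) = @_;
    return 0 unless defined $text;
    my @words = split ' ', $text;
    return scalar @words;
}

# Sum the word counts of every page in a PDF.
# CAM::PDF is loaded lazily so count_words() works without it.
sub count_pdf_words {
    my ($file) = @_;
    require CAM::PDF;
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file as a PDF\n";
    my $total = 0;
    for my $page (1 .. $pdf->numPages()) {
        # getPageText may return undef for a page it cannot parse
        $total += count_words($pdf->getPageText($page));
    }
    return $total;
}

# Only touch a PDF when a file name is given on the command line.
if (@ARGV) {
    my $file = shift @ARGV;
    printf "%s: %d words\n", $file, count_pdf_words($file);
}
```

    Run it as "perl countwords.pl file.pdf" (the script name is hypothetical). Text extraction depends on how the PDF was produced, so it is worth checking the result against a document whose word count you already know.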

    Hope this helps.

    Martin
Re: Count number or words in PDF
by zentara (Archbishop) on Jan 05, 2006 at 12:37 UTC
    There is also ps2ascii, part of the Ghostscript tools. Windows versions are available.

    I'm not really a human, but I play one on earth. flash japh
Re: Count number or words in PDF
by Truman (Novice) on Jan 05, 2006 at 17:51 UTC
    Why are you trying to convert it into a Word file anyway? My guess is that converting the PDF to a Word document is slow because of the text formatting involved.

    A better way could be to convert the PDF into a plaintext file and count the words from there.
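
    The plaintext route could be sketched as below, assuming the pdftotext program mentioned earlier in the thread is installed and on the PATH. The sub names and the two command-line arguments are hypothetical, and a "word" is again just a run of non-whitespace characters.

```perl
use strict;
use warnings;

# Count whitespace-separated words read from an open filehandle.
sub count_words_in_fh {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        # Assigning the matches to an empty list yields the match count.
        my $n = () = $line =~ /\S+/g;
        $count += $n;
    }
    return $count;
}

# Convert a PDF to a text file with pdftotext, then count its words.
sub count_words_via_pdftotext {
    my ($pdf, $txt) = @_;
    # List-form system() avoids shell quoting problems with file names.
    system('pdftotext', $pdf, $txt) == 0
        or die "pdftotext failed on $pdf (status $?)\n";
    open my $fh, '<', $txt or die "Cannot read $txt: $!\n";
    my $count = count_words_in_fh($fh);
    close $fh;
    return $count;
}

# Expects: input PDF, then the name of the text file to write.
if (@ARGV == 2) {
    my ($pdf, $txt) = @ARGV;
    printf "%s: %d words\n", $pdf, count_words_via_pdftotext($pdf, $txt);
}
```

    Going through a text file keeps Word and Win32::OLE out of the process entirely, and the intermediate file can be inspected if the count looks wrong.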