Count number of words in PDF

anniyan
Dear Monks,

I need to count the number of words in a PDF file.

What I tried: I saved the PDF file as a Word file and, using Win32::OLE, added a VBA macro to count the number of words. The problem is that saving the PDF as a Word document takes a very long time.

Earlier I did some work with CAM::PDF, but I found no method in it that reports the number of words.

Is it possible to count the number of words with the Word application? If so, is there any other module to accomplish this? If there is a module that can 'save as' PDF to Word in Perl, please point me to it as well.


Re: Count number of words in PDF
marto

    CAM::PDF::PageText looks as though it will extract the text from a PDF. Once you have the text, you can count the number of words. Failing that, pdftotext is available as part of the Xpdf suite: you could convert the PDF file to a text file and count the number of words in that.

    Hope this helps.
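
    A minimal sketch of the CAM::PDF route, assuming the module's getPageText method returns usable text for the PDF in question (extraction quality varies with how the PDF was produced). The helper names count_words and count_words_in_pdf are made up for this example, and a "word" is assumed to be any whitespace-separated token:

```perl
use strict;
use warnings;

# Count whitespace-separated tokens in a string (hypothetical helper).
sub count_words {
    my ($text) = @_;
    return 0 unless defined $text;
    # split ' ' skips leading whitespace and splits on runs of whitespace
    my @words = split ' ', $text;
    return scalar @words;
}

# Sum the word counts of every page in a PDF (hypothetical helper).
sub count_words_in_pdf {
    my ($file) = @_;
    require CAM::PDF;    # loaded lazily so count_words() works without it
    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file\n";
    my $total = 0;
    for my $page (1 .. $pdf->numPages()) {
        $total += count_words($pdf->getPageText($page));
    }
    return $total;
}

# Usage: perl count_words.pl file.pdf
print count_words_in_pdf($ARGV[0]), "\n" if @ARGV;
```

    Counting page by page keeps memory use low, but a word that is split across a page break would be counted twice.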

Re: Count number of words in PDF
zentara
    There is also ps2ascii, part of the Ghostscript tools. Windows versions are available.

Re: Count number of words in PDF
Truman
    Why are you trying to convert it into a Word file anyway? I guess the conversion from PDF to Word is slow because of all the text formatting it has to carry over.

    A better way could be to convert the PDF into a plain-text file and count the words in that.
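
    One way the plain-text route might look, as a sketch only. It assumes the pdftotext program mentioned earlier in the thread is installed and on the PATH; the helper names are hypothetical, and a "word" is taken to be any whitespace-separated token:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Count whitespace-separated words read from an open filehandle
# (hypothetical helper).
sub count_words_in_handle {
    my ($fh) = @_;
    my $total = 0;
    while (my $line = <$fh>) {
        my @words = split ' ', $line;
        $total += @words;
    }
    return $total;
}

# Convert a PDF to plain text with pdftotext, then count the words
# (hypothetical helper; assumes pdftotext is on the PATH).
sub count_words_via_pdftotext {
    my ($pdf_file) = @_;
    # Temporary output file, removed automatically when the script exits
    my ($tmp_fh, $txt_file) = tempfile(SUFFIX => '.txt', UNLINK => 1);
    close $tmp_fh;    # release it so pdftotext can write to it
    system('pdftotext', $pdf_file, $txt_file) == 0
        or die "pdftotext failed for $pdf_file\n";
    open my $fh, '<', $txt_file or die "Cannot read $txt_file: $!\n";
    my $total = count_words_in_handle($fh);
    close $fh;
    return $total;
}

# Usage: perl count_words.pl file.pdf
print count_words_via_pdftotext($ARGV[0]), "\n" if @ARGV;
```

    The list form of system avoids shell quoting problems with file names that contain spaces.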
