http://www.perlmonks.org?node_id=1028993


in reply to Comparison word against pdf

Because of the way text is stored in a PDF file, this will be a next-to-impossible task. What looks like a complete word on the page may actually be stored as several separate letters or groups of letters, each positioned individually. Also, text does not flow the way it does in Word.
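To make the fragment problem concrete, here is a minimal sketch in Python of how pieces on one line might be glued back into words before any comparison. It assumes you have already extracted each fragment with its horizontal position and width; the `(x_start, width, text)` tuple layout, the `join_fragments` name, the `max_gap` threshold and the sample numbers are all made up for illustration and do not come from any particular PDF library.

```python
from typing import List, Optional, Tuple

# Hypothetical fragment layout: (x_start, width, text) for pieces on ONE line.
Fragment = Tuple[float, float, str]


def join_fragments(fragments: List[Fragment], max_gap: float = 1.0) -> List[str]:
    """Merge fragments into words when the horizontal gap between them is small."""
    words: List[str] = []
    current = ""
    prev_end: Optional[float] = None
    # Sort left to right, since a PDF need not store fragments in reading order.
    for x, width, text in sorted(fragments, key=lambda f: f[0]):
        # A gap wider than max_gap is treated as a word break (an assumption).
        if prev_end is not None and x - prev_end > max_gap:
            words.append(current)
            current = ""
        current += text
        prev_end = x + width
    if current:
        words.append(current)
    return words


# Made-up positions: three tightly spaced pieces, then one after a wide gap.
line = [(0.0, 12.0, "Com"), (12.2, 20.0, "paris"), (32.3, 8.0, "on"), (45.0, 18.0, "word")]
print(join_fragments(line))
```

The right threshold depends on the font size and on how the generating application spaces its text, so treat `max_gap` as something to tune per document rather than a fixed value.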

You can improve your chances of success if you know exactly how the PDF files were created and by which application. If you have access to Adobe Illustrator, you can open the PDF files in it and see how each page is constructed, which may give you insight into how to read the PDF objects to extract the text.