Thanks again for your reply. Let me clarify a bit. Since I can read the documents in the browser, I know they contain only text, so OCR is not an issue. All the documents follow one of a small set of similar templates; only the content changes from one to the next. I have viewed hundreds of these, and any document that does not conform will be skipped.
Your suggestion of downloading each document and then running a pdftotext tool on the local file is in line with my current thinking, as long as it can be scripted and run without intervention. Are there any other suggestions I should examine?
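To make the unattended step concrete, here is a rough sketch of what I have in mind. It is written in Python purely for illustration. It assumes the PDFs have already been downloaded to a local directory and that the `pdftotext` command-line tool is installed and on the PATH. The marker strings and all function names are hypothetical placeholders, not part of any existing script.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# Hypothetical marker strings that every conforming template is assumed to contain.
REQUIRED_MARKERS = ("Title:", "Date:")


def pdftotext_command(pdf_path):
    """Build the pdftotext command line; the trailing "-" sends text to stdout."""
    return ["pdftotext", "-layout", str(pdf_path), "-"]


def conforms(text, markers=REQUIRED_MARKERS):
    """Return True only if every expected marker appears in the extracted text."""
    return all(marker in text for marker in markers)


def extract_text(pdf_path):
    """Run pdftotext on one local file and return the extracted text."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if pdftotext exits non-zero
    )
    return result.stdout


def process_directory(directory):
    """Extract text from every *.pdf in directory, skipping failures and non-conforming files."""
    results = {}
    for pdf in sorted(Path(directory).glob("*.pdf")):
        try:
            text = extract_text(pdf)
        except (subprocess.CalledProcessError, OSError):
            continue  # unreadable file or tool failure: skip without stopping the run
        if conforms(text):
            results[pdf.name] = text
    return results


if __name__ == "__main__":
    if shutil.which("pdftotext") is None:
        sys.exit("pdftotext not found on PATH")
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, text in process_directory(target).items():
        print(name, len(text))
```

Run from cron or a shell loop, something like this needs no interaction: a file that fails extraction or does not match the expected markers is simply skipped.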