PerlMonks  

Re: exporting PDF::API2

by pvaldes (Chaplain)
on Jun 22, 2013 at 10:28 UTC (#1040266)


in reply to exporting PDF::API2

K0V6wNoH.pdf: 407386 bytes, can-print yes, can-modif yes, can-copy yes, can-add yes

(You can find the script I used on my blog.)

So the problem here is not that your PDF can't be copied (or opened for reading). Losing the ability to select and copy text is a common problem when a PDF has been repaired or fixed with some programs.

In short: a PDF can be optimized by converting it back to PS and then using Ghostscript (or ps2pdf) to recreate it. You get a smaller file, but you can lose more modern features and things like text selection (typically in the first pass). What prevents you from selecting text may be, e.g., the encoded fonts used.
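For illustration, here is a minimal Python sketch of that round trip. It assumes the Ghostscript wrapper scripts `pdf2ps` and `ps2pdf` are installed and on the PATH; the function names and the `-rebuilt` suffix are made up for this example, not part of any tool.

```python
import subprocess
from pathlib import Path


def roundtrip_commands(src, workdir="."):
    """Build the two commands for a PDF -> PS -> PDF round trip.

    The output names (<stem>.ps and <stem>-rebuilt.pdf) are an arbitrary
    choice for this sketch.
    """
    src = Path(src)
    ps = Path(workdir) / (src.stem + ".ps")
    rebuilt = Path(workdir) / (src.stem + "-rebuilt.pdf")
    return [
        ["pdf2ps", str(src), str(ps)],      # PDF -> PostScript
        ["ps2pdf", str(ps), str(rebuilt)],  # PostScript -> new PDF
    ]


def roundtrip(src, workdir="."):
    """Run both steps, stopping at the first command that fails."""
    for cmd in roundtrip_commands(src, workdir):
        subprocess.run(cmd, check=True)
```

Calling `roundtrip("K0V6wNoH.pdf")` would write the rebuilt file next to the original, so you can compare sizes and check whether text selection survived before replacing anything.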

You can fix this with Adobe software (or probably by messing with the inner structure). Take a look at the documentation section of the CAM-PDF module by Chris Dolan, which comes with a lot of useful (and better) Perl scripts to analyze your PDF.


Re^2: exporting PDF::API2
by bulk88 (Priest) on Jun 23, 2013 at 03:25 UTC
    In some PDFs, especially ones created from, let's say, Illustrator to Acrobat through Adobe Distiller, each letter of text gets flattened into a color-filled polygon of Bézier curves. There is no text in the COS tree of the PDF, just PostScript polygon drawing operators. I think OCR is the only way to get back the machine-readable text. A WAG says that, since it all came from one font in a vector graphics program, you could try to programmatically checksum each polygon against a known checksum of the polygon of each letter, which a human has identified. I would look for a library that does this already; implementing it on your own is insane.
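    To make the checksum idea concrete, here is a rough Python sketch. It assumes the outline of each glyph has already been extracted from the PDF as a list of (x, y) points, which is the hard part and is not shown. The names `fingerprint` and `identify` are hypothetical. The sketch only tolerates translation: it assumes the same point order and the same scale for every occurrence of a letter.

```python
import hashlib


def fingerprint(points, precision=2):
    """Hash an outline so the same shape drawn elsewhere on the page matches.

    Points are shifted so the bounding box starts at (0, 0), rounded, and
    serialized in their original order before hashing.
    """
    x0 = min(x for x, _ in points)
    y0 = min(y for _, y in points)
    text = ";".join(
        f"{round(x - x0, precision):.{precision}f},"
        f"{round(y - y0, precision):.{precision}f}"
        for x, y in points
    )
    return hashlib.sha1(text.encode("ascii")).hexdigest()


def identify(points, known):
    """Return the letter a human assigned to this shape, or None if unseen."""
    return known.get(fingerprint(points))


# A crude "L" outline, labeled once by hand (made-up coordinates).
L_SHAPE = [(0, 0), (0, 10), (2, 10), (2, 2), (6, 2), (6, 0)]
KNOWN = {fingerprint(L_SHAPE): "L"}

# The same outline found somewhere else on the page.
moved = [(x + 100.5, y + 200.25) for x, y in L_SHAPE]
```

    Here `identify(moved, KNOWN)` finds the hand-labeled "L", while any outline not yet labeled returns None. Real glyphs are built from curves, so the control points would have to go into the hash as well, and a scaled or rotated copy would not match without further normalization.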
