Re: exporting PDF::API2
by CountZero (Bishop) on Jun 20, 2013 at 18:07 UTC
|
You forgot to mention Perl in your list of technical skills!
CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James My blog: Imperial Deltronics
| [reply] |
|
Here in unemployed Michigan, there's a state agency that performs resume consulting. The resume is about a year old, and I only remembered a couple of days ago that the consultant thought Perl was a liability on my resume. With state agencies giving such nonsensical advice, no wonder there is so much unemployment in their state.
| [reply] |
|
"Lots of fun for analogies..."
My best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
| [reply] |
Re: exporting PDF::API2
by tqisjim (Beadle) on Jun 20, 2013 at 17:38 UTC
|
FWIW, I tried using CAM::PDF:
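use CAM::PDF;
my $pdf = CAM::PDF->new('Jim Schueler.pdf') or die "$CAM::PDF::errstr\n";  # direct method call instead of indirect object syntax
print $pdf->getPageText(1);  # extract the text of page 1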
The results are about the same.
-Jim | [reply] [d/l] |
Re: exporting PDF::API2
by pvaldes (Chaplain) on Jun 22, 2013 at 10:28 UTC
|
K0V6wNoH.pdf: 407386 bytes, can-print yes, can-modif yes, can-copy yes, can-add yes
(You can find the script I used on my blog.)
So the problem here is not that your PDF can't be copied (or opened for reading). Losing the ability to select and copy text is a common problem when a PDF has been repaired or fixed with some programs.
In short: a PDF can be optimized by converting it back to PostScript and then using Ghostscript (or ps2pdf) to recreate it. You get a smaller file, but you can lose more modern features and things like text selection (typically in the first pass). What prevents you from selecting text may lie, for example, in the font encoding used.
You can fix this with Adobe software (or probably by messing with the inner structure). Take a look at the documentation of the CAM-PDF module by Chris Dolan, which comes with a lot of useful (and better) Perl scripts to analyze your PDF | [reply] [d/l] |
|
In some PDFs, especially ones created by going from, let's say, Illustrator to Acrobat through Adobe Distiller, each letter of text gets flattened into a color-filled polygon or Bézier curves. There is no text in the COS tree of the PDF, just PostScript-style polygon drawing operators. I think OCR is the only way to get back the computer-readable meaning of the text. A WAG: since it all came from one font in a vector graphics program, you could try to programmatically checksum each polygon against a known checksum of the polygon for each letter, as identified by a human. I would look for a library that does this already; implementing it on your own is insane.
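To make the checksum idea concrete, here is a rough sketch, not a working extractor. It assumes you have already pulled each glyph's outline out of the PDF as a list of [x, y] points (that part is not shown), and that every occurrence of a letter uses the same point order and scale. The sub names and the sample outlines are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build a position-independent key for an outline: shift the points so the
# smallest x and y become 0, round to two decimals, then hash the result.
sub outline_key {
    my @points = @_;    # each point is an array ref [x, y]
    my ($min_x, $min_y) = @{ $points[0] };
    for my $p (@points) {
        $min_x = $p->[0] if $p->[0] < $min_x;
        $min_y = $p->[1] if $p->[1] < $min_y;
    }
    my $canonical = join ';',
        map { sprintf '%.2f,%.2f', $_->[0] - $min_x, $_->[1] - $min_y } @points;
    return md5_hex($canonical);
}

# Hypothetical training data: outlines a human has already labelled.
my %labelled = (
    '-' => [ [0, 0], [10, 0], [10, 2], [0, 2] ],
    'I' => [ [0, 0], [2, 0], [2, 10], [0, 10] ],
);

# Lookup table from checksum to letter.
my %known;
for my $letter (keys %labelled) {
    $known{ outline_key(@{ $labelled{$letter} }) } = $letter;
}

# Return the letter for an outline, or '?' if the checksum is unknown.
sub identify {
    my $key = outline_key(@_);
    return $known{$key} // '?';
}

# The same hyphen shape drawn elsewhere on the page still matches.
print identify([5, 5], [15, 5], [15, 7], [5, 7]), "\n";
```

Exact matching like this breaks as soon as the drawing program emits the points in a different order or at a different scale, which is one more reason to look for an existing library or fall back to OCR.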
| [reply] |
Re: exporting PDF::API2
by tqisjim (Beadle) on Jun 21, 2013 at 20:58 UTC
|
| [reply] |
Re: exporting PDF::API2
by Beechbone (Friar) on Jun 25, 2013 at 10:23 UTC
|
| [reply] [d/l] |
|
Cutting and pasting now works in my PDF file. But C&P did not work in your proposed solution: I had to s/udf/utf/.
I did a little additional testing: of the flags in your response, only -unicodemap seems to be required. Although the purpose of the utf8 encoding flag is clear, the other two flags do not affect embedded text.
While digging in the source code, I discovered that PDF::API2 sets -unicodemap by default, although PDF::API3::Compat::API2 does not. Sure enough, when I went back and tested with PDF::API2, the text was embedded. Maybe this *is* a bug in PDF::API3::Compat::API2. Even so, I only encountered it trying to fix a bug I didn't have in the first place. :(
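For anyone landing here later, a minimal sketch of passing the flag explicitly instead of relying on the default. The font path and output file name are placeholders, not taken from this thread, and the script only writes a file if the font is actually there.

```perl
use strict;
use warnings;
use PDF::API2;

# Hypothetical paths: point $font_file at any TrueType font on your system.
my $font_file = 'DejaVuSans.ttf';
my $out_file  = 'unicodemap-demo.pdf';

# Ask for the ToUnicode map explicitly rather than relying on the default.
my %font_opts = (-unicodemap => 1);

if (-e $font_file) {
    my $pdf  = PDF::API2->new();
    my $page = $pdf->page();
    my $font = $pdf->ttfont($font_file, %font_opts);

    my $text = $page->text();
    $text->font($font, 12);
    $text->translate(72, 720);    # position in points from the bottom-left corner
    $text->text('This text should be selectable.');

    $pdf->saveas($out_file);
}
else {
    warn "Font file '$font_file' not found; nothing written.\n";
}
```

Open the result in a viewer and try to select and copy the line; setting the flag to 0 instead should make for a quick comparison.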
| [reply] |