PerlMonks  

Re^2: exporting PDF::API2

by bulk88 (Priest)
on Jun 23, 2013 at 03:25 UTC ( #1040309=note )


in reply to Re: exporting PDF::API2
in thread exporting PDF::API2

In some PDFs, especially ones created by going from, say, Illustrator to Acrobat through Adobe Distiller, each letter of text gets flattened into color-filled polygons/Bézier curves. There is no text in the COS tree of the PDF, just PostScript polygon draw operators. I think OCR is the only way to get back the computer-readable meaning of the text. A WAG says that since it all came from one font in a vector graphics program, you could try to programmatically checksum each polygon against a known checksum of the polygon for each letter, as identified by a human. I would look for a library that does this already; implementing it on your own is insane.
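To make the checksum idea concrete, here is a rough sketch in Python. It assumes the outlines have already been pulled out of the content stream as lists of `(operator, points)` pairs. That extraction step is not shown, and the names `normalize`, `outline_checksum`, and `recognize` are all made up for illustration. Each outline is shifted to the origin and scaled to a fixed grid so that position and size on the page do not affect the hash.

```python
import hashlib

def normalize(path, grid=1000):
    """Shift the outline to the origin and scale it onto an integer grid."""
    xs = [x for _, pts in path for x, _ in pts]
    ys = [y for _, pts in path for _, y in pts]
    min_x, min_y = min(xs), min(ys)
    # Use the larger side of the bounding box so the aspect ratio is kept.
    extent = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [
        (op, tuple((round((x - min_x) / extent * grid),
                    round((y - min_y) / extent * grid)) for x, y in pts))
        for op, pts in path
    ]

def outline_checksum(path):
    """Hash the normalized outline; equal shapes give equal digests."""
    canonical = repr(normalize(path)).encode("ascii")
    return hashlib.sha1(canonical).hexdigest()

def recognize(path, known):
    """Look up an outline in a table of human-identified checksums."""
    return known.get(outline_checksum(path))

# Hypothetical outlines: 'm' = moveto, 'l' = lineto, 'h' = closepath.
triangle = [("m", [(0, 0)]), ("l", [(10, 0)]), ("l", [(5, 10)]), ("h", [])]
moved = [("m", [(100, 200)]), ("l", [(120, 200)]), ("l", [(110, 220)]), ("h", [])]
square = [("m", [(0, 0)]), ("l", [(10, 0)]), ("l", [(10, 10)]), ("l", [(0, 10)]), ("h", [])]

# A human labels the first occurrence once; later copies are matched by hash.
known = {outline_checksum(triangle): "A"}
```

An exact hash like this only matches when every copy of a letter is emitted with the same starting point and segment order, which is plausible if it all came from one font but is still a guess. Anything looser than that is shape matching, and that is where an existing library or OCR starts to look much saner.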
