PerlMonks  

Re: Parsing Arabic PDF using in perl

by graff (Chancellor)
on Mar 08, 2014 at 02:31 UTC (#1077479)


in reply to Parsing Arabic PDF using in perl

PDF can present Arabic text using at least a few different strategies, none of which bode well for extracting Unicode Arabic text from the file. Some or all of the text may actually be stored as image data rather than as character data. Where portions of the text are stored as discrete characters, those characters use numeric codes that bear no discernible relation to the Unicode Arabic code points.
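
If you do manage to get character data out of a PDF, one quick way to see what you actually got is to tally the code points. The following is a minimal sketch in Python (not Perl, and not tied to any PDF module; the function names are hypothetical) that sorts characters into the Unicode Arabic blocks, the Arabic Presentation Forms blocks, and the Private Use Area. The choice of buckets is an assumption about what is worth checking: positional presentation forms can be folded back to base letters with NFKC normalization, whereas Private Use Area codes or stray Latin characters probably mean the font's own numbering came through, and no generic remapping will fix that.

```python
import unicodedata

# Inclusive code point ranges taken from the Unicode block charts.
RANGES = {
    # Arabic, Arabic Supplement, Arabic Extended-A
    "arabic": [(0x0600, 0x06FF), (0x0750, 0x077F), (0x08A0, 0x08FF)],
    # Arabic Presentation Forms-A and -B (positional glyph variants)
    "presentation_form": [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)],
    # Basic Multilingual Plane Private Use Area
    "private_use": [(0xE000, 0xF8FF)],
}


def classify(ch: str) -> str:
    """Return the name of the bucket a single character falls into."""
    cp = ord(ch)
    for name, spans in RANGES.items():
        if any(lo <= cp <= hi for lo, hi in spans):
            return name
    return "other"


def profile(text: str) -> dict:
    """Count how many characters of `text` land in each bucket."""
    counts = {name: 0 for name in RANGES}
    counts["other"] = 0
    for ch in text:
        counts[classify(ch)] += 1
    return counts


def normalize_presentation_forms(text: str) -> str:
    # NFKC maps positional forms (e.g. U+FEB3, SEEN initial form) back to
    # the base letter (U+0633). It does not touch Private Use Area codes.
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    sample = "\uFEB3\uFEE0\uFE8E\uFEE1"  # "salam" spelled with presentation forms
    print(profile(sample))
    print(normalize_presentation_forms(sample))
```

If most of the characters come back as presentation forms, normalization is worth trying, though the result may still be in visual (reversed) order depending on how the PDF was produced. If they come back as private-use or unrelated codes, you are in the situation described above.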

I remember spending a few hours one time (a couple of years back) trying to find web references that would explain the PDF character encoding scheme for Arabic, but I never succeeded. Of course, I'm ignorant enough about PDF details in general that I can't even assess how inadequate that attempt was.

