perlquestion
Arik123
<p>Hi Monks!</p>
<p>I have a PDF file which contains a filled form. Unfortunately the information (text-only) isn't plain ASCII. I need a Perl script to extract the information and process it, but I can't get anything except gibberish. I figured the streams were compressed somehow, so I used QPDF to make the file more human-readable.</p>
<p>Now there are multiple objects whose content is something like</p>
<code>feff05e405e805d905d8002e002e002e</code>
<p>which seem to be the content of the fields, in some encoding. There are also some objects that look like:</p>
<code>
/BaseFont /RCZMJK+TimesNewRoman
/DescendantFonts 13 0 R
/Encoding /Identity-H
/Subtype /Type0
/ToUnicode 93 0 R
/Type /Font
</code>
<p>while the /ToUnicode information refers to objects that look like:</p>
<code>
93 0 obj
<<
/Length 94 0 R
>>
stream
/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
/Ordering (UCS) /Supplement 0 >> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
4 beginbfchar
<02A8> <05D8>
<02A9> <05D9>
<02B4> <05E4>
<02B8> <05E8>
endbfchar
endcmap CMapName currentdict /CMap defineresource pop end end
endstream
endobj
</code>
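<p>As far as I can tell, the <c>beginbfchar</c> lines pair a 2-byte glyph code (left) with a Unicode value (right). Here is a minimal sketch of reading that table, under my own assumptions: <c>parse_bfchar</c> and <c>glyphs_to_text</c> are names I made up, the right-hand side is treated as UTF-16BE, <c>beginbfrange</c> sections are ignored, and the glyph string in the usage line is a hypothetical example, not something taken from my file.</p>

```perl
use strict;
use warnings;
use Encode qw(decode);

# Build a { glyph code => Unicode string } map from the bfchar sections
# of a ToUnicode CMap. Assumption: bfrange sections are not handled.
sub parse_bfchar {
    my ($cmap) = @_;
    my %map;
    while ($cmap =~ /beginbfchar(.*?)endbfchar/sg) {
        my $section = $1;
        while ($section =~ /<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>/g) {
            my ($src, $dst) = ($1, $2);
            # Assumption: the destination is UTF-16BE without a BOM.
            $map{ hex $src } = decode('UTF-16BE', pack 'H*', $dst);
        }
    }
    return \%map;
}

# Translate a hex string of 2-byte glyph codes (as used with /Identity-H)
# into text; unknown codes become U+FFFD.
sub glyphs_to_text {
    my ($hex, $map) = @_;
    return join '', map { $map->{ hex $_ } // "\x{FFFD}" }
                    $hex =~ /([0-9A-Fa-f]{4})/g;
}

my $cmap = <<'CMAP';
4 beginbfchar
<02A8> <05D8>
<02A9> <05D9>
<02B4> <05E4>
<02B8> <05E8>
endbfchar
CMAP

my $map    = parse_bfchar($cmap);
my $mapped = glyphs_to_text('02B402B802A902A8', $map);  # hypothetical glyph string

binmode STDOUT, ':encoding(UTF-8)';
print "$mapped\n";
```

<p>This only covers the mapping table itself; pulling the stream out of the PDF in the first place is the part I am still missing.</p>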
<p>I need a Perl script (or a module) that can make sense of all that (to me it looks like Turkish. Hint: I don't speak Turkish) and convert it to UTF-8 or some other encoding that makes sense.</p>
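<p>My best guess so far is that the leading <c>feff</c> in the field values is a byte-order mark, which would make them UTF-16. A minimal sketch under that assumption (the helper name <c>decode_pdf_hex</c> is mine, and I have not checked this against the whole file):</p>

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode a hex-dumped string that starts with a UTF-16 byte-order mark.
# Assumption: the value really is UTF-16 with a BOM; decode() dies otherwise.
sub decode_pdf_hex {
    my ($hex) = @_;
    $hex =~ s/\s+//g;               # drop any whitespace between hex digits
    my $bytes = pack 'H*', $hex;    # hex digits -> raw bytes
    # Plain 'UTF-16' lets Encode read the BOM, pick the byte order, and strip it.
    return decode('UTF-16', $bytes);
}

my $text = decode_pdf_hex('feff05e405e805d905d8002e002e002e');

binmode STDOUT, ':encoding(UTF-8)';  # emit the result as UTF-8
print "$text\n";
```

<p>If that guess is right, the output would be four non-Latin letters followed by three dots.</p>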
<p>Any help would be appreciated.</p>