
Re: Convert PDF file into HTML file

by ww (Bishop)
on Dec 22, 2010 at 13:28 UTC (#878520)


in reply to Convert PDF file into HTML file

There's another possible complication beyond those enumerated in the excellent Re: Convert PDF file into HTML file.

Some .pdf files are created by scanning text on paper [1]. The intermediate is an image, not unlike a .png, .jpg or .bmp. The resultant .pdf contains a picture of the text, not the ASCII or UTF or Kanji characters, per se.

And that, TTBOMK, leaves only the OCR option for retrieving the text as text.

Update: Addition below, for clarity:

1.   This is typical, for example, of low-cost home "MFC" and "all-in-one" printer-scanner-copiers, and of offices with limited IT knowledge and support, where scanning is done on the multi-function copiers that now commonly replace single-purpose xerographic copiers.
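
A rough way to guess whether a given .pdf is one of these scanned, picture-only files, before reaching for OCR: look in the raw bytes for image objects and for any font reference. This is only a sketch (in Python rather than Perl), and the function name `looks_image_only` is made up for illustration. It assumes the page resources are stored uncompressed; newer PDFs can pack their dictionaries into compressed object streams, in which case this check sees nothing and proves nothing.

```python
import re

# Image XObjects carry "/Subtype /Image" in their dictionary.
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image")
# Any font use shows up as /Font, /FontDescriptor, /FontFile, etc.
FONT_REFERENCE = re.compile(rb"/Font")


def looks_image_only(pdf_bytes: bytes) -> bool:
    """Crude heuristic: True if images are visible and no font is referenced.

    Assumption: dictionaries are not hidden inside compressed object
    streams. A False result does not guarantee extractable text.
    """
    has_image = IMAGE_XOBJECT.search(pdf_bytes) is not None
    has_font = FONT_REFERENCE.search(pdf_bytes) is not None
    return has_image and not has_font
```

To try it on a file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes in. A True result suggests that the text can only be recovered by OCR; anything else just means "try a normal text extractor first".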

