Re^3: Convert PDF file into HTML file

by chrestomanci (Priest)
on Dec 22, 2010 at 13:14 UTC


in reply to Re^2: Convert PDF file into HTML file
in thread Convert PDF file into HTML file

I don't consider myself much of an expert on the internals of PDF. I simply thought of PDF as being similar to PostScript, and from that explained why a perfect conversion is not possible.

An online PDF is no different from a normal PDF. Those websites are simply referring to PDF files that are already downloadable from the web, which makes their conversion tools simpler.

I had a look at a few online converters, and most of them appear to be demos for paid apps that convert to other formats. You can't download a free executable to do the conversion on your own computer; you have to use the online tool and see their ads.

I also suspect that if you tried writing a script to use those online tools for bulk conversion, you would quickly run into something that stops you, such as a CAPTCHA or a robots exclusion policy.
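As a rough illustration of the robots exclusion point, here is a minimal sketch of how a script could check whether a site's policy permits automated access before trying bulk conversion. It is written in Python using the standard library's urllib.robotparser. The robots.txt content, the converter.example host, and the bulk-pdf-script user agent are all made up for the example and do not describe any real converter site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /convert/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on a made-up converter site.
    for url in ("http://converter.example/convert/upload",
                "http://converter.example/about"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "bulk-pdf-script", url))
```

A check like this only covers the robots exclusion policy. A CAPTCHA would still stop the script, and a site's terms of use may forbid automated access regardless of what robots.txt says.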

In any case, as I said before, the conversion will never be perfect. For an example of how far from perfect a PDF to HTML conversion can be, just click on "View as HTML" when Google finds PDF files in a web search.

