PerlMonks  

Re^3: Convert PDF file into HTML file

by chrestomanci (Priest)
on Dec 22, 2010 at 13:14 UTC


in reply to Re^2: Convert PDF file into HTML file
in thread Convert PDF file into HTML file

I don't consider myself much of an expert on the internals of PDF. I simply thought of PDF as being similar to PostScript, and from that I explained why a perfect conversion is not possible.

Online PDF is no different from normal PDF. Those websites are simply referring to PDF files that are already downloadable on the web, which makes their conversion tools simpler.

I had a look at a few online converters, and they mostly appear to be demos for paid apps that convert to other formats. You can't download a free executable to do the conversion on your own computer; you have to use the online tool and see their ads.

I also suspect that if you tried writing a script to use those online tools for bulk conversion, you would quickly find something preventing you such as a CAPTCHA, or a robots exclusion policy.
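As an illustration of the robots exclusion point, here is a minimal sketch (in Python rather than Perl) of how a well-behaved bulk-conversion script might check a site's robots.txt before submitting anything. The robots.txt contents, the example.com URLs and the MyBulkConverter agent name are all hypothetical placeholders, not taken from any real converter site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary online converter (example.com is a placeholder).
ROBOTS_TXT = """\
User-agent: *
Disallow: /convert
"""


def may_fetch(robots_txt: str, url: str, agent: str = "MyBulkConverter") -> bool:
    """Return True if the given robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the robots.txt lines directly, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The conversion endpoint is disallowed for all agents by the rules above.
    print(may_fetch(ROBOTS_TXT, "https://example.com/convert?file=a.pdf"))  # False
    # Other paths are not covered by any Disallow rule.
    print(may_fetch(ROBOTS_TXT, "https://example.com/about"))  # True
```

A real site would also be free to block you by other means (CAPTCHAs, rate limits), which no robots.txt check can detect.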

In any case, as I said before, the conversion will never be perfect. For an example of how far from perfect a PDF to HTML conversion can be, just click on "View as HTML" when Google finds PDF files in a web search.
