
Re^2: Need Help for Convert PDF to HTML

by LanX (Canon)
on Feb 12, 2011 at 00:40 UTC (#887712)

in reply to Re: Need Help for Convert PDF to HTML
in thread Need Help for Convert PDF to HTML

> Building the HTML page (and probably some CSS as well) to mimic the PDF layout. This will be more difficult than one thinks, as the HTML document format is actually very bad at placing "things" at exactly the spot you want. The whole idea of HTML (and CSS) is that the layout is "flowing" and will adapt itself (more or less) gracefully to the output method of the client viewing it.

Actually, most of this has been solvable since CSS positioning was introduced (maybe 10 years ago?). The real problem is that arbitrary fonts are (in practice) not embeddable in HTML, and reconstructing words, lines and paragraphs with even slightly different font metrics looks awkward.
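To make the positioning half of that concrete, here is a minimal Python sketch of how extracted text runs might be emitted as absolutely positioned spans. The input format (a list of dicts with `x`, `y`, `size` and `text`, in points) and the function name `runs_to_html` are assumptions for illustration, not the output of any particular PDF library.

```python
from html import escape


def runs_to_html(runs, page_width, page_height):
    """Render text runs as absolutely positioned spans inside a fixed-size page div.

    Each run is a dict with hypothetical keys: x, y (top-left corner in points),
    size (font size in points) and text.
    """
    spans = []
    for run in runs:
        style = (
            f"position:absolute;left:{run['x']}pt;top:{run['y']}pt;"
            f"font-size:{run['size']}pt;white-space:pre"
        )
        # Escape the text so characters like '<' survive in the HTML output.
        spans.append(f'<span style="{style}">{escape(run["text"])}</span>')
    # The page div is the positioning context for all spans.
    page_style = f"position:relative;width:{page_width}pt;height:{page_height}pt"
    return f'<div style="{page_style}">' + "".join(spans) + "</div>"
```

This only pins down where each run starts. How wide each run ends up still depends on whatever font the browser actually uses.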

For example, some may remember how Google used to produce HTML previews of PDFs, with those random gaps in the lines of text.
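A rough way to see where such gaps come from: if each word keeps its original start position but is rendered in a substitute font with different widths, the space before the next word grows or shrinks. The sketch below is a simplification that assumes a single uniform width ratio between the two fonts; the function name and the input format are made up for illustration.

```python
def gaps_after_substitution(words, width_ratio):
    """Return the gap between each rendered word and the next word's fixed start.

    words: list of (x, width) pairs for one line, left to right, measured
           in the original font.
    width_ratio: rendered width divided by original width (assumed uniform).
    A negative gap means the rendered word overlaps its neighbour.
    """
    gaps = []
    for (x, width), (next_x, _) in zip(words, words[1:]):
        rendered_end = x + width * width_ratio
        gaps.append(next_x - rendered_end)
    return gaps
```

With words at `[(0, 30), (33, 40), (76, 20)]`, a ratio of 1.0 gives the original 3-unit spaces, a narrower font at 0.75 widens them to 10.5 and 13, and a wider font at 1.25 gives negative values, i.e. overlapping words.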

As I already said, it highly depends on the use case (and on differing definitions of what HTML is).

Cheers Rolf

Re^3: Need Help for Convert PDF to HTML
by CountZero (Bishop) on Feb 12, 2011 at 11:56 UTC
    > Actually, most of this has been solvable since CSS positioning was introduced (maybe 10 years ago?)

    Not even fixed or absolute positioning can guarantee that the element will end up on the client's screen at exactly the place you thought you put it. Most of the time you end up with ugly scroll bars and overlapping elements or empty spots.

    And in any case, I consider CSS positioning that takes elements out of the normal flow an aberration if it is used to try to fix the layout, but YMMV.


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      I doubt this. I had excellent results converting DVI to HTML, and that was already 8 years ago, on NN4 and IE5.

      It even worked when heuristics reconstructed flowing text, with embedded formulas positioned relatively.

      But all of this only worked as long as the same fonts were used.

      As I said, the positioning of elements works; the exact size of those elements is the problem.

      Cheers Rolf
