http://www.perlmonks.org?node_id=887640


in reply to Need Help for Convert PDF to HTML

Another difficulty I do not see listed among the replies here is the issue of embedded fonts. PDF documents can embed fonts; HTML cannot. If the source PDF embeds non-standard (non-web) fonts, then extracting those fonts becomes a significant challenge. Some tools are available to do just that. CAM::PDF can Extract Font Info from PDF, but when brian_d_foy asked about extracting the fonts themselves, Chris Dolan said he intends never to add that feature.

If you already have the font file, the job may be easier. It really depends on your source PDF document.

CSS can be used to specify such fonts (see FontSpring "Bulletproof" Method, Smiley Variation among many).
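
As a rough illustration of what such a CSS declaration looks like, here is a minimal Python sketch that prints an @font-face rule laid out in the FontSpring "bulletproof" style. The family name, the fonts/extractedfont path, and the SVG font id are all hypothetical placeholders, and it assumes you already hold the font in .eot, .woff, .ttf and .svg formats under one base name.

```python
def font_face_rule(family: str, base_url: str, svg_id: str) -> str:
    """Build an @font-face rule in the FontSpring "bulletproof" layout.

    family, base_url and svg_id are placeholders chosen by the caller;
    nothing here checks that the font files actually exist.
    """
    sources = [
        # The "?#iefix" suffix keeps old IE from parsing the rest of the list.
        (f"{base_url}.eot?#iefix", "embedded-opentype"),
        (f"{base_url}.woff", "woff"),
        (f"{base_url}.ttf", "truetype"),
        # The fragment names the <font> element inside the SVG file.
        (f"{base_url}.svg#{svg_id}", "svg"),
    ]
    src = ",\n       ".join(
        f"url('{url}') format('{fmt}')" for url, fmt in sources
    )
    return f"@font-face {{\n  font-family: '{family}';\n  src: {src};\n}}"


# Hypothetical names, for illustration only.
css = font_face_rule("ExtractedFont", "fonts/extractedfont", "extractedfont")
print(css)
```

The generated rule goes into the stylesheet of the converted page, after which font-family: 'ExtractedFont' can be used in ordinary CSS rules.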

There are also licensing issues in play for many fonts. Depending on your circumstances (and perhaps the font requirements) this may be of concern/interest to you.

Re^2: Need Help for Convert PDF to HTML
by inman2787 (Initiate) on Mar 26, 2011 at 04:29 UTC
    1. Convert the PDF file to a text file using Acrobat Reader or any similar program. Just save it as a text file; there is no need for the Pro or extended versions of Reader.
    2. Open TextEdit.app, open the text file you've created, and copy/paste the whole thing into a new document window.
    - Open Preferences in TextEdit
    - Go to the "Open/Save" tab
    - Change Document Type to HTML Strict or XHTML Strict, depending on your needs. In Styling, select No CSS.
    - Go back and save the new document as an HTML file.
    There is a step-by-step instruction on how to convert PDF to HTML.
    Hope that helps!
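
For anyone without TextEdit, the second step above (wrapping already-extracted plain text in bare HTML with no CSS) can be approximated in a few lines. This is only a sketch under stated assumptions: the text has already been saved from the PDF, blank lines are taken to separate paragraphs, and the function name and default title are made up for the example.

```python
import html


def text_to_html(text: str, title: str = "Converted document") -> str:
    """Wrap plain text in a minimal, unstyled HTML page."""
    # Assumption: a blank line in the text file marks a paragraph break.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Escape &, < and > so the text cannot be mistaken for markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )


# Stand-in for the contents of the saved text file.
page = text_to_html("First paragraph.\n\nTom & Jerry <3")
print(page)
```

Like the TextEdit route, this keeps only the text; layout, images and fonts from the PDF are lost.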
Re^2: Need Help for Convert PDF to HTML
by Anonymous Monk on Dec 31, 2011 at 15:25 UTC
    HTML5-era browsers do support embedded fonts, though through the CSS @font-face rule rather than JavaScript.