Re: Convert PDF to HTML (or JPEG)
by almut (Canon) on Sep 12, 2009 at 12:31 UTC
For PDF to JPG (or any other raster image format like PNG or TIFF), you could use Ghostscript to do the conversion:
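A minimal sketch of such a command, wrapped in a shell function so the resolution and quality are easy to vary. The helper names pdf_to_jpegs and page_pixels, the input name in.pdf and the quality default of 90 are assumptions made for illustration; the Ghostscript switches themselves (-sDEVICE=jpeg, -r, -dJPEGQ, -sOutputFile) are standard. The second helper just does the dpi-to-pixels arithmetic for a page size given in points.

```shell
# Sketch only: pdf_to_jpegs and page_pixels are hypothetical helper names,
# and in.pdf / img%d.jpg are placeholder file names.

# Render every page of a PDF to its own JPEG (img1.jpg, img2.jpg, ...).
# Usage: pdf_to_jpegs in.pdf [dpi] [quality]
pdf_to_jpegs() {
    pdf=$1
    dpi=${2:-150}      # -r: rendering resolution in dpi
    quality=${3:-90}   # -dJPEGQ: JPEG quality factor, up to 100 (90 is an assumed default)
    gs -q -dNOPAUSE -dBATCH -sDEVICE=jpeg \
       -r"$dpi" -dJPEGQ="$quality" \
       -sOutputFile='img%d.jpg' "$pdf"
}

# Pixel dimensions of a page at a given resolution, rounded to the nearest pixel.
# Page size is given in PostScript points (1/72 inch); A4 is 595x842 points.
# Usage: page_pixels width_pt height_pt dpi
page_pixels() {
    echo "$(( ($1 * $3 + 36) / 72 ))x$(( ($2 * $3 + 36) / 72 ))"
}
```

Calling `pdf_to_jpegs in.pdf` renders at 150dpi, and `page_pixels 595 842 150` prints 1240x1754 for A4.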
This would create as many images (img1.jpg to imgN.jpg) as there are pages in the PDF file. -r is the resolution in dpi (150dpi would create an image size of 1240x1754 pixels for A4 paper size), and -dJPEGQ is the JPEG quality factor (up to 100).
Unfortunately, this doesn't do any anti-aliasing, so the fonts typically look rather ragged. You can work around that problem by doing the anti-aliasing yourself: oversample while rendering from PDF to raster (e.g. by a factor of 4, i.e. 600dpi), then downsample with an appropriate filter.
ImageMagick's convert can be used for the latter. The complete sequence of steps would be:
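A sketch of the two steps, under a few assumptions: the intermediate files are lossless PNGs written by Ghostscript's png16m device, the input is called in.pdf, and the helper names render_oversampled and downsample_pages are made up for this example. The filter defaults to Lanczos and the JPEG quality of 90 is an arbitrary choice.

```shell
# Sketch only: helper names and file names are placeholders.

# Step 1: render at (target dpi x oversampling factor) into img1.png, img2.png, ...
# Usage: render_oversampled in.pdf [target_dpi] [factor]
render_oversampled() {
    pdf=$1
    target_dpi=${2:-150}
    factor=${3:-4}
    gs -q -dNOPAUSE -dBATCH -sDEVICE=png16m \
       -r"$(( target_dpi * factor ))" \
       -sOutputFile='img%d.png' "$pdf"
}

# Step 2: shrink each page back by the same factor with a resampling filter.
# -filter has to come before -resize to take effect.
# The percentage uses integer division, so factor 4 gives 25%.
# Usage: downsample_pages [factor] [filter]
downsample_pages() {
    factor=${1:-4}
    filter=${2:-Lanczos}
    for f in img*.png; do
        [ -e "$f" ] || continue    # no matching files: nothing to do
        convert "$f" -filter "$filter" -resize "$(( 100 / factor ))%" \
                -quality 90 "out_${f%.png}.jpg"
    done
}
```

Running `render_oversampled in.pdf` followed by `downsample_pages` renders at 600dpi and writes out_img1.jpg, out_img2.jpg and so on at a quarter of that size.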
The resulting anti-aliased images out_img*.jpg would then have 150dpi resolution.
In case you have the non-/usr/bin-namespace-polluting sister GraphicsMagick installed (instead of ImageMagick), the command would be gm convert ...
(Those who hold a degree in Signal Processing, or have come in contact with filter design in some other context, might want to take a look at the list of filters to choose from. In case of doubt, stick with Lanczos or Kaiser for somewhat sharper, or Gaussian or Cubic for somewhat softer results.)
Also, there's documentation (well hidden from daylight) under /usr/share/doc/ghostscript/Devices.htm, which explains what options are available with the individual Ghostscript output devices. You usually need to have another package installed (e.g. ghostscript-doc on Debian/Ubuntu) to have that file.