in reply to Convert PDF to HTML (or JPEG)
For PDF to JPG (or any other raster image format like PNG or TIFF), you could use Ghostscript to do the conversion:
$ gs -q -dBATCH -dNOPAUSE -sDEVICE=jpeg -dJPEGQ=88 -r150 -sOutputFile=img%d.jpg input.pdf
This would create as many images (img1.jpg to imgN.jpg) as there are pages in the PDF file. -r is the resolution in dpi (150dpi would create an image size of 1240x1754 for A4 paper size), and -dJPEGQ is the quality factor (up to 100).
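In case you want to check the pixel dimensions for some other resolution, the arithmetic is just millimetres times dpi divided by 25.4. Here's a rough sketch in plain sh; px_at_dpi is a helper name I made up for illustration, not something Ghostscript provides. Ghostscript itself works from the page size in PostScript points, so its actual output may differ from this by a pixel or so at some resolutions.

```shell
#!/bin/sh
# Pixel count for a physical length at a given resolution.
# 1 inch = 25.4 mm, so pixels = mm * dpi / 25.4, rounded to nearest.
# px_at_dpi is a hypothetical helper, not part of Ghostscript.
px_at_dpi() {
    mm=$1
    dpi=$2
    # Scale by 10 to stay in integer arithmetic; +127 rounds to nearest.
    echo $(( (mm * dpi * 10 + 127) / 254 ))
}

# A4 is 210 mm x 297 mm.
echo "A4 at 150dpi: $(px_at_dpi 210 150)x$(px_at_dpi 297 150)"   # 1240x1754
```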
Unfortunately, this doesn't do any anti-aliasing, so the fonts typically look rather ragged. You can work around that problem by doing the anti-aliasing yourself: oversample while rendering from PDF to raster (e.g. by a factor of 4, i.e. 600dpi), and then downsample with an appropriate filter.
ImageMagick's convert can be used for the latter. The complete sequence of steps would be:
$ gs -q -dBATCH -dNOPAUSE -sDEVICE=jpeg -dJPEGQ=88 -r600 -sOutputFile=img%d.jpg input.pdf
$ for img in img*.jpg ; do convert "$img" -filter Lanczos -resize 25% -quality 90 "out_$img" ; done
The resulting anti-aliased images out_img*.jpg would then have 150dpi resolution.
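If you do this more than once, the two steps could be wrapped in a small shell function, something like the sketch below. The names pdf_to_jpegs, render_dpi and resize_percent are my own inventions for illustration; the gs and convert options are the ones used above. It assumes an oversampling factor that divides 100 evenly (2, 4, 5, ...), since the resize percentage is computed with integer division.

```shell
#!/bin/sh
# Sketch of the oversample-then-downsample workflow.
# pdf_to_jpegs, render_dpi and resize_percent are hypothetical names.

# Resolution to render at: target dpi times the oversampling factor.
render_dpi() {
    echo $(( $1 * $2 ))
}

# Percentage for -resize that undoes the oversampling (100 / factor).
# Integer division, so use a factor that divides 100 evenly.
resize_percent() {
    echo $(( 100 / $1 ))
}

# Usage: pdf_to_jpegs input.pdf [target_dpi] [factor]
pdf_to_jpegs() {
    pdf=$1
    target=${2:-150}
    factor=${3:-4}

    # Step 1: render every page at the oversampled resolution.
    gs -q -dBATCH -dNOPAUSE -sDEVICE=jpeg -dJPEGQ=88 \
       -r"$(render_dpi "$target" "$factor")" \
       -sOutputFile=img%d.jpg "$pdf" || return 1

    # Step 2: downsample each page back to the target resolution.
    for img in img*.jpg ; do
        convert "$img" -filter Lanczos \
                -resize "$(resize_percent "$factor")%" \
                -quality 90 "out_$img" || return 1
    done
}
```

Calling pdf_to_jpegs input.pdf 150 4 should then run the same two commands as above (600dpi rendering, 25% resize).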
In case you have the non-/usr/bin-namespace-polluting sister GraphicsMagick installed (instead of ImageMagick), the command would be gm convert ...
(Those who hold a degree in Signal Processing, or have come in contact with filter design in some other context, might want to take a look at the list of filters to choose from. In case of doubt, stick with Lanczos or Kaiser for somewhat sharper, or Gaussian or Cubic for somewhat softer results.)
Also, there's documentation, well hidden from daylight, under /usr/share/doc/ghostscript/Devices.htm, which explains what options are available with the individual Ghostscript output devices. You usually need to have another package installed (e.g. ghostscript-doc on Debian/Ubuntu) to have that file.
Re^2: Convert PDF to HTML (or JPEG)
by LanX (Cardinal) on Sep 12, 2009 at 14:13 UTC