PerlMonks
Re: Convert PDF to HTML (or JPEG)
by almut (Canon) on Sep 12, 2009 at 12:31 UTC (#794918)
For PDF to JPG (or any other raster image format like PNG or TIFF), you could use Ghostscript to do the conversion:
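A minimal sketch of such a call, wrapped in a shell function for reuse. The function name pdf_to_jpeg, the default quality of 80 and the file name input.pdf are illustrative assumptions; the Ghostscript options themselves (-sDEVICE, -r, -dJPEGQ, -sOutputFile) are standard ones:

```shell
# Render every page of a PDF to a JPEG file (img1.jpg, img2.jpg, ...).
#   $1  PDF file to convert
#   $2  resolution in dpi        (assumed default: 150)
#   $3  JPEG quality, up to 100  (assumed default: 80)
pdf_to_jpeg() {
    gs -dBATCH -dNOPAUSE -sDEVICE=jpeg \
       -r"${2:-150}" -dJPEGQ="${3:-80}" \
       -sOutputFile=img%d.jpg "$1"
}
```

Call it as `pdf_to_jpeg input.pdf`, or as `pdf_to_jpeg input.pdf 300 95` for a different resolution and quality.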
This would create as many images (img1.jpg to imgN.jpg) as there are pages in the PDF file. -r is the resolution in dpi (150dpi gives an image size of 1240x1754 pixels for A4 paper), and -dJPEGQ is the JPEG quality factor (up to 100).

Unfortunately, this doesn't do any anti-aliasing, so the fonts typically look rather ragged. You can work around that problem by doing the anti-aliasing yourself: oversample while rendering from PDF to raster (e.g. by a factor of 4, i.e. 600dpi), then downsample with an appropriate filter. ImageMagick's convert can be used for the latter. The complete sequence of steps would be:
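A sketch of those two steps as a shell function. The name pdf_to_jpeg_aa, the input file name and the quality values (100 for the intermediate images, 90 for the final ones) are assumptions chosen for illustration; the 25% is simply 150/600:

```shell
# Oversample at 600dpi with Ghostscript, then downsample to 150dpi
# with ImageMagick's convert using a Lanczos filter.
#   $1  PDF file to convert
pdf_to_jpeg_aa() {
    # Step 1: render at 4x the target resolution, at maximum JPEG
    # quality to limit artifacts in the intermediate images.
    gs -dBATCH -dNOPAUSE -sDEVICE=jpeg -r600 -dJPEGQ=100 \
       -sOutputFile=img%d.jpg "$1" || return 1

    # Step 2: shrink each page to 25% (600dpi -> 150dpi).
    for img in img*.jpg; do
        [ -e "$img" ] || continue   # no pages were produced
        convert "$img" -filter Lanczos -resize 25% -quality 90 "out_$img"
    done
}
```

After `pdf_to_jpeg_aa input.pdf`, the 600dpi intermediates img*.jpg are left in place and can be deleted once the out_img*.jpg files look right.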
The resulting anti-aliased images out_img*.jpg would then have 150dpi resolution. In case you have GraphicsMagick installed instead (ImageMagick's sister that doesn't pollute the /usr/bin namespace), the command would be gm convert ...

(Those who hold a degree in Signal Processing, or have come in contact with filter design in some other context, might want to take a look at the list of filters to choose from. In case of doubt, stick with Lanczos or Kaiser for somewhat sharper results, or Gaussian or Cubic for somewhat softer ones.)

Also, there's documentation, well hidden from daylight, under /usr/share/doc/ghostscript/Devices.htm, which explains what options are available with the individual Ghostscript output devices. You usually need another package installed (e.g. ghostscript-doc on Debian/Ubuntu) to have that file.
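If you'd rather judge the filters by eye than by theory, something like the following writes one downsampled copy per filter for a single oversampled page. This is only a sketch: the function name compare_filters and the output naming scheme are made up for illustration, and it assumes ImageMagick's convert (running `convert -list filter` shows which filters your version knows about):

```shell
# Downsample one 600dpi page with each of the four suggested filters,
# so the results can be compared side by side.
#   $1  oversampled JPEG, e.g. img1.jpg
compare_filters() {
    for filter in Lanczos Kaiser Gaussian Cubic; do
        # img1.jpg -> img1_Lanczos.jpg, img1_Kaiser.jpg, ...
        convert "$1" -filter "$filter" -resize 25% -quality 90 \
                "${1%.jpg}_$filter.jpg"
    done
}
```

For example, `compare_filters img1.jpg` produces img1_Lanczos.jpg, img1_Kaiser.jpg, img1_Gaussian.jpg and img1_Cubic.jpg.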