PerlMonks  

Re: Convert PDF to HTML (or JPEG)

by ww (Bishop)
on Sep 12, 2009 at 09:15 UTC


in reply to Convert PDF to HTML (or JPEG)

I don't know if this will help, but have you evaluated SWISH::Filters::Pdf2HTML?

from CPAN:

SWISH::Filters::Pdf2HTML - Perl extension for filtering PDF documents with Swish-e
This is a plug-in module that uses the xpdf package to convert PDF documents to html for indexing by Swish-e. Any info tags found in the PDF document are created as meta tags.
This filter plug-in requires the xpdf package.


Re^2: Convert PDF to HTML (or JPEG)
by Sewi (Friar) on Sep 12, 2009 at 09:25 UTC
    I tried xpdf some time ago when trying to solve the same problem, and it seems that xpdf ignores pictures entirely when converting :-(

      I'm not quite sure what you were expecting. From the README:

      Xpdf is an open source viewer for Portable Document Format (PDF) files. (These are also sometimes also called 'Acrobat' files, from the name of Adobe's PDF software.) The Xpdf project also includes a PDF text extractor, PDF-to-PostScript converter, and various other utilities.

      man pdftotext:

      Pdftotext converts Portable Document Format (PDF) files to plain text. Pdftotext reads the PDF file, PDF-file, and writes a text file, text-file. If text-file is not specified, pdftotext converts file.pdf to file.txt. If text-file is '-', the text is sent to stdout.

      man pdfimages:

      Pdfimages saves images from a Portable Document Format (PDF) file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files. Pdfimages reads the PDF file PDF-file, scans one or more pages, and writes one PPM, PBM, or JPEG file for each image, image-root-nnn.xxx, where nnn is the image number and xxx is the image type (.ppm, .pbm, .jpg).

      These utilities are not designed to output HTML with embedded images.

      Martin
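
Since neither utility produces HTML with images by itself, one possible workaround is to run both and stitch the results together. The following is only a rough sketch under several assumptions: the xpdf tools are on the PATH, the function names (`pdftotext_cmd`, `pdfimages_cmd`, `build_html`, `convert`) and the `img` image root are made up for illustration, and the images are simply appended after the text because their original positions on the page are not recovered. Images that pdfimages writes as PPM or PBM will also not display in most browsers without a further conversion step.

```python
import html
import subprocess
from pathlib import Path


def pdftotext_cmd(pdf_path):
    """Command line that sends the PDF's text to stdout ('-' as text-file)."""
    return ["pdftotext", "-enc", "UTF-8", str(pdf_path), "-"]


def pdfimages_cmd(pdf_path, image_root):
    """Command line that saves images as image-root-nnn.xxx.

    -j asks pdfimages to write JPEG-encoded images as .jpg files.
    """
    return ["pdfimages", "-j", str(pdf_path), str(image_root)]


def build_html(title, text, image_names):
    """Wrap extracted text and image references in a bare-bones HTML page."""
    parts = [
        "<html><head><title>%s</title></head><body>" % html.escape(title),
        "<pre>%s</pre>" % html.escape(text),
    ]
    for name in image_names:
        parts.append('<img src="%s">' % html.escape(name, quote=True))
    parts.append("</body></html>")
    return "\n".join(parts)


def convert(pdf_path, out_dir):
    """Run both utilities (needs xpdf installed) and write index.html."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical layout: text goes into index.html, images next to it.
    text = subprocess.run(pdftotext_cmd(pdf_path), check=True,
                          capture_output=True, encoding="utf-8").stdout
    subprocess.run(pdfimages_cmd(pdf_path, out_dir / "img"), check=True)
    images = sorted(p.name for p in out_dir.glob("img-*"))
    page = build_html(Path(pdf_path).stem, text, images)
    target = out_dir / "index.html"
    target.write_text(page, encoding="utf-8")
    return target
```

Calling `convert("file.pdf", "out")` would then leave `out/index.html` plus the extracted image files in `out/`. The result is a text dump followed by a gallery, not a faithful rendering of the PDF layout.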
