PerlMonks  

Generating pdf file for and from an html file

by RenardBleu (Sexton)
on Apr 12, 2014 at 12:53 UTC

RenardBleu has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I'd like to offer a PDF button on a website. The button would let the user save the current web page (that is, the page the button is on) as a PDF file.
Looks like I should use http://search.cpan.org/~autrijus/PDF-FromHTML-0.08/lib/PDF/FromHTML.pm but I don't really get how to do it for my specific case.
Since I think this already exists, I'm asking for some help here while I dig further into the given link ^^

Replies are listed 'Best First'.
Re: Generating pdf file for and from an html file
by Corion (Patriarch) on Apr 12, 2014 at 12:59 UTC

    What part are you stuck with?

    From the module documentation, it seems it wants to get the HTML and then can render that into PDF.

    As you already seem to have a way to produce the HTML, the conversion to PDF comes down to these steps:

    1. Produce the HTML
    2. Convert it to PDF
    3. Output the PDF to the user

    It would seem to me that step 1 is unlikely to be the problematic step as you say you already have the HTML.

    So which one of step 2 and step 3 is problematic, what code have you written and how does it fail to produce the appropriate output?
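    For reference, steps 2 and 3 might look roughly like the sketch below. It is only a sketch under assumptions: the sub names html_to_pdf and pdf_http_response are made up for illustration, the response is built by hand in CGI style rather than through whatever framework the site uses, and PDF::FromHTML is loaded lazily inside the conversion sub. The new/load_file/convert/write_file calls follow the module's synopsis, which accepts scalar references in place of file names.

```perl
use strict;
use warnings;

# Step 2: convert an HTML string to PDF bytes.
# PDF::FromHTML is loaded lazily, so the rest of this file still
# compiles on a machine where the module is not installed.
sub html_to_pdf {
    my ($html) = @_;
    require PDF::FromHTML;
    my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
    $pdf->load_file( \$html );      # scalar reference: treated as content
    $pdf->convert;                  # default font and page settings
    my $bytes = '';
    $pdf->write_file( \$bytes );    # scalar reference: receives the PDF
    return $bytes;
}

# Step 3: wrap the PDF bytes in a CGI-style HTTP response.
# (Hypothetical helper; a web framework would normally do this part.)
sub pdf_http_response {
    my ( $pdf_bytes, $filename ) = @_;
    $filename =~ s/[^A-Za-z0-9._-]/_/g;    # keep the header value simple
    return join '',
        "Content-Type: application/pdf\r\n",
        "Content-Disposition: attachment; filename=\"$filename\"\r\n",
        "Content-Length: " . length($pdf_bytes) . "\r\n",
        "\r\n",
        $pdf_bytes;
}
```

    A CGI script behind the button would then do something like binmode STDOUT; print pdf_http_response( html_to_pdf($html), 'page.pdf' ); where $html is the page markup from step 1.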

Re: Generating pdf file for and from an html file
by soonix (Canon) on Apr 12, 2014 at 21:09 UTC

    Seconding Corion's answer.

    Possibly your problem is that PDF::FromHTML doesn't speak CSS. In that case, you could look into HTML::HTMLDoc, which also seems to be capable of generating PDF from HTML (though it doesn't speak CSS either).

Re: Generating pdf file for and from an html file
by CountZero (Bishop) on Apr 13, 2014 at 12:33 UTC
    This has been tried many times, but I have never seen a generally acceptable solution for converting a webpage to a PDF-file. It might be possible for very specific, narrowly defined and formatted webpages based upon a template where the variable data can be plugged both into an HTML and a PDF template.

    The main problem is that the HTML format is geared towards very fluid and flexible rendering, all depending on the output device whereas PDF is a format for a page, where pagination, position of pictures, font size and type, ... are all most strictly defined.

    If you do not care about an exact 1-to-1 rendering between the webpage and the PDF file and the HTML does not try to "fix" the lay-out but rather uses the tags as semantic and structural guides, then it might just be possible to write a parser that can translate HTML to another format (I was thinking of TeX or LaTeX) that can be rendered into a PDF-file.
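    To make the "translate semantic HTML into LaTeX" idea concrete, here is a toy sketch in core Perl. Everything in it is an assumption for illustration: the tag table, the sub names and the tiny tag subset are invented, attributes are ignored, and unknown tags are silently dropped. A real translator would build on a proper parser such as HTML::Parser or HTML::TreeBuilder instead of splitting on angle brackets.

```perl
use strict;
use warnings;

# Hypothetical mapping from a few structural tags to LaTeX fragments.
my %LATEX_FOR = (
    'h1'     => '\section{',           '/h1'     => "}\n",
    'h2'     => '\subsection{',        '/h2'     => "}\n",
    'p'      => '',                    '/p'      => "\n\n",
    'em'     => '\emph{',              '/em'     => '}',
    'strong' => '\textbf{',            '/strong' => '}',
    'ul'     => "\\begin{itemize}\n",  '/ul'     => "\\end{itemize}\n",
    'li'     => '\item ',              '/li'     => "\n",
);

# Decode three common entities, then escape LaTeX special characters.
# Backslash, ~ and ^ are deliberately not handled in this sketch.
sub escape_latex {
    my ($text) = @_;
    $text =~ s/&lt;/</g;
    $text =~ s/&gt;/>/g;
    $text =~ s/&amp;/&/g;
    $text =~ s/([&%\$#_{}])/\\$1/g;
    return $text;
}

# Split the input into tags and text runs and translate each piece.
sub html_to_latex {
    my ($html) = @_;
    my $out = '';
    for my $token ( split /(<[^>]*>)/, $html ) {
        next if $token eq '';
        if ( $token =~ /\A</ ) {
            my ($name) = $token =~ m{\A<\s*(/?\w+)};
            $out .= $LATEX_FOR{ lc( $name // '' ) } // '';   # unknown: dropped
        }
        else {
            $out .= escape_latex($token);
        }
    }
    return $out;
}
```

    The output is a document body only; it would still need a preamble and a run through pdflatex or similar, and badly nested HTML would produce unbalanced braces.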

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
Re: Generating pdf file for and from an html file
by jmacloue (Beadle) on Apr 12, 2014 at 20:22 UTC

    Speaking of arbitrary HTML to PDF conversion I'd suggest using a ready-made solution like Docfrac, HTMLDOC or even wkhtmltopdf.

      A "ready-made solution" is certainly a good idea: no point in reinventing wheels.

      If the "HTMLDOC" you're referring to is HTML::HTMLDoc, I'd advise some caution. This module has many unresolved bugs and maintenance appears to have been abandoned. I did some investigation into this a few days ago: "Re: HTML::HTMLDoc -- Including a base64 img".

      I'm not familiar with the other two solutions you mentioned.

      -- Ken

        No, all three solutions I mentioned are not Perl modules but C/C++ tools. See docfrac.net for Docfrac and try googling the other two tools - unfortunately my access level doesn't allow me to post links here yet.

        I have used all three of these in my projects (well, PHP projects to be clear, but that makes little difference here); they have some quirks but generally work. wkhtmltopdf is the most advanced one (basically it's a full-featured WebKit-based browser with a PDF export option) but it tends to render images instead of text, which may not be what you want. Docfrac and HTMLDOC, on the other hand, are HTML parsers/renderers; their output PDF is better structured but support for formatting is very limited.

        And, generally speaking, the task of converting HTML to a picture (PNG, JPEG, PDF, etc) falls rather out of Perl's scope - it's not impossible to implement a decent layout engine in Perl but, well, performance would be a disaster and amount of work incredible.
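        Since these are external command-line tools, calling one from Perl comes down to running a child process. Below is a minimal sketch for wkhtmltopdf, assuming the binary is on the PATH of the web server user. The sub names are made up for illustration; the invocation uses the tool's usual "input, then output" argument order, with "-" as the output so that the PDF arrives on standard output.

```perl
use strict;
use warnings;

# Build the argument list for wkhtmltopdf (hypothetical helper).
# Only http/https URLs are accepted: the tool also reads local files,
# and an argument starting with "-" would be parsed as an option.
sub wkhtmltopdf_args {
    my ($url) = @_;
    die "refusing non-HTTP URL: $url\n" unless $url =~ m{\Ahttps?://}i;
    return ( 'wkhtmltopdf', '--quiet', $url, '-' );
}

# Run the tool and return the PDF as a byte string.
# The list form of open() bypasses the shell, so the URL is never
# interpreted as shell syntax.
sub render_url_to_pdf {
    my ($url) = @_;
    my @cmd = wkhtmltopdf_args($url);
    open( my $fh, '-|', @cmd ) or die "cannot run $cmd[0]: $!\n";
    binmode $fh;
    local $/;                       # slurp mode: read everything at once
    my $pdf = <$fh>;
    close $fh or die "$cmd[0] failed (exit status $?)\n";
    return $pdf;
}
```

        A "PDF button" handler could pass the address of the current page to render_url_to_pdf and send the result back with a Content-Type of application/pdf. Treating any non-zero exit status as fatal is a design choice of this sketch and may be too strict for pages that merely trigger warnings.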

      Damn, I tried a lot of solutions before trying yours, since in the end I wanted to try a client-side solution (i.e. in JavaScript).

      Since the CSS I am using is fairly simple, I tried half a dozen solutions.

      Clearly, for me, wkhtmltopdf is the best solution. I wish I had found a way to do this on the client side, but at least I get exactly the same output as my HTML.

      Thanks for pointing this out.
      Thanks also to everyone contributing to the thread :)

      cheers
