PerlMonks  

Filling out PDF forms with data from DBI?

by merlyn (Sage)
on Jun 21, 2006 at 20:18 UTC (#556764=perlquestion)
merlyn has asked for the wisdom of the Perl Monks concerning the following question:

I have a client that has a number of "PDF fill-in forms", such as the W-9 at http://www.irs.gov/pub/irs-pdf/fw9.pdf. They want me to write Perl code to process the form, figure out the fields, and then reprocess that form pre-filling the fields according to values from a database.

I can't be the first person who has needed to do this, but I can't figure out if either PDF::API2 or PDF::Reuse can do this (or something else in the CPAN, for that matter).

Any assistance would be appreciated. Thanks.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Re: Filling out PDF forms with data from DBI?
by diotalevi (Canon) on Jun 21, 2006 at 20:30 UTC

    If nothing in Perl will do it, there's always COM via Win32::OLE.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: Filling out PDF forms with data from DBI?
by holli (Monsignor) on Jun 21, 2006 at 20:33 UTC
    I see a new column coming up... ;-)

    Good luck, merlyn.


    holli, /regexed monk/
Re: Filling out PDF forms with data from DBI?
by jdtoronto (Prior) on Jun 21, 2006 at 20:48 UTC
    Merlyn,

    I really like PDF::API2, having discovered it in "Perl Graphics Programming" - on *nix it has been really great! I have two big webapps using it for generating sell sheets for the travel industry. But I have had problems with it on Win32 the last few days.

    I have also used PDF::Reuse (and, given the troubles with PDF::API2 on Win32, am using it right now!). It is somewhat more obtuse, but there is a pretty comprehensive tutorial, and whilst I haven't used it on *nix, it has been very stable on Win32.

    jdtoronto

      I am a newcomer to Perl, but have much experience with PostScript/PDF. I would imagine that the PDF::* modules are for reading and generating the PDF file itself, not for trivial form fill-ins. Adobe has been working on the online forms problem for as long as there has been a web (they went as far as to purchase a company, about 7 years ago, that did nothing but that). Adobe has an array of commercial solutions (trying to stabilize in the face of M$ft and open standards). Although I am an implementor at heart, a little bit of money might go a long way to solve the problem well.

      PDF::API2 and PDF::Reuse are both great for generating pure pdf documents. Importing existing Javascript-enhanced forms and filling them out, that's another story.

      PDF::Reuse has the ability to fill in forms, provided that the original PDF is not "optimized" or "linearized". The module documentation does describe a procedure for "concatenating the streams" of these documents, but it seems to be version-dependent.

      CAM::PDF can read optimised/linearized documents but can only write single-streamed documents, so formatting might be tricky.

      Both modules support PDF 1.4...more or less. CAM::PDF can change document permissions to some extent.

      Text::PDF can produce forms but isn't great at processing them, and isn't well documented.

      PDF::API2 took out form-fill back in 2003.

Re: Filling out PDF forms with data from DBI?
by monsterzero (Monk) on Jun 21, 2006 at 21:14 UTC

    Hi Randal,

    I know a friend who did something similar.

    The basic steps of what you'll need to do are as follows:

    • Get the (free) Acrobat SDK: http://partners.adobe.com/asn/developer/acrosdk/main.html
    • Read the documentation in the SDK for help on creating an FDF file using HTML forms.
    • Once you create an FDF, you'll use that as the file to serve to your visitors (which will display the merged PDF).
    • The PDF template can have separate buttons for "Print Me" and "Submit Me"

    HOWEVER: if your users only have Acrobat Reader, they will not be able to save the pre-populated PDF to their computer. (If they have the full Acrobat, I believe they can.)

    (Alternatively, you can pay $2k - $3k for a program called PDFMerge, which will actually merge the FDF with the PDF into a file your users can save...)
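A minimal sketch of serving such an FDF from a Perl CGI script follows. It is not taken from the Acrobat SDK; the sub name fdf_response, the field value and the example.com URL in /F are invented for illustration. It relies only on application/vnd.fdf being the registered MIME type for FDF, so a browser with Acrobat or Reader installed will typically hand the response to Acrobat, which loads the form named in /F and fills it in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap an FDF document in a CGI response (headers, blank line, body).
# application/vnd.fdf is the registered MIME type for FDF data.
sub fdf_response {
    my ($fdf) = @_;
    return "Content-Type: application/vnd.fdf\r\n"
         . 'Content-Length: ' . length($fdf) . "\r\n"
         . "\r\n"
         . $fdf;
}

# HYPOTHETICAL FDF: one field (f1-1) and a made-up URL in /F where the
# blank form is assumed to be published.
my $fdf = join "\n",
    '%FDF-1.2',
    '1 0 obj<</FDF<< /Fields[ <</T(f1-1)/V(An individual name)>> ]',
    '/F(http://www.example.com/forms/fw9.pdf)>>>>',
    'endobj',
    'trailer',
    '<</Root 1 0 R>>',
    '%%EOF',
    '';

binmode STDOUT;    # FDF is sent byte-for-byte
print fdf_response($fdf);
```

Content-Length is computed with length(), which counts bytes only as long as the FDF holds no wide characters.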

      The utterance,

      $ perl -MPDF::API2 -MData::Dumper -e'my $pdf = PDF::API2->open("fw9.pdf"); my $res = PDF::API2::Resource->new_api($pdf); print Dumper($res)'
      produces, among much else, the string:
      'This form has document rights applied to it. These rights allow anyone completing this form, with the free Adobe Reader, to save their filled-in form locally.'
      I have no idea whether that's really true.

      After Compline,
      Zaxo

Re: Filling out PDF forms with data from DBI?
by traveler (Parson) on Jun 21, 2006 at 21:56 UTC
    I am guessing fillpdffields.pl (part of CAM::PDF) is what you need. You may want to modify the script or use CAM::PDF directly.

    HTH, --traveler

Re: Filling out PDF forms with data from DBI?
by Anonymous Monk on Jun 22, 2006 at 00:35 UTC
    They want me to write Perl code to process the form,
    Not to troll, but do they really care that it's Perl code? I mean, if you wrote it in, say, C#, would you be in breach of the contract?

        Randal,

        Just curious - did you come up with a solution for your DBI/PDF form problem?

        -MC

Re: Filling out PDF forms with data from DBI?
by saberworks (Curate) on Jun 22, 2006 at 04:21 UTC
    We tried CAM::PDF but had problems with the formatting and alignment of the filled-in values. For example, there were money-formatted fields that were right-aligned and prefixed with a dollar sign, and other fields were centered. When we filled them in, the resulting PDF had no money formatting and everything was left-aligned. I emailed the author of CAM::PDF and did some digging; it turns out the formatting is handled internally by JavaScript or something similar, and the PDF that CAM::PDF writes isn't that great. I, too, was disappointed with the lack of CPAN modules for this task. Proprietary file formats suck :(
Re: Filling out PDF forms with data from DBI?
by Thelonius (Curate) on Jun 22, 2006 at 04:44 UTC
    You can easily create an FDF file with the data for the fields. For example, using your example file, you can create fw9a_data.fdf:
    %FDF-1.2
    1 0 obj<</FDF<< /Fields[
    <</T(c1-1)/V/Yes>>
    <</T(c1-2)/V/Off>>
    <</T(c1-3)/V/Off>>
    <</T(c1-4)/V/Off>>
    <</T(c1-5)/V/Off>>
    <</T(f1-1)/V(An individual name)>>
    <</T(f1-2)/V(Here is where the business name would go)>>
    <</T(f1-3)/V(Something something)>>
    <</T(f1-4)/V(111 somewhere st)>>
    <</T(f1-5)/V(Gotham City, MA)>>
    <</T(f1-6)/V(Requestor's name\r111 somewhere St.\rGotham City, MA)>>
    <</T(f1-7)/V(List_account_numbers)>>
    <</T(f1-8)/V(3)>>
    <</T(f1-9)/V(1)>>
    <</T(f1-10)/V(4)>>
    <</T(f1-11)/V(1)>>
    <</T(f1-12)/V(5)>>
    <</T(f1-13)/V(9)>>
    <</T(f1-14)/V(2)>>
    <</T(f1-15)/V(6)>>
    <</T(f1-16)/V(5)>>
    <</T(f1-17)/V(9)>>
    <</T(f1-18)/V(8)>>
    <</T(f1-19)/V(7)>>
    <</T(f1-20)/V(6)>>
    <</T(f1-21)/V(5)>>
    <</T(f1-22)/V(4)>>
    <</T(f1-23)/V(3)>>
    <</T(f1-24)/V(2)>>
    <</T(f1-26)/V(1)>>
    ] /F(fw9a.pdf)>>>>
    /ID[<B97089A55B4A6C612CDC01693E405875> <5E6E5A68A50C834B953FA17874436B87>]
    endobj
    trailer
    <</Root 1 0 R>>
    %%EOF
    Using the free Acrobat Reader, I copied fw9.pdf to fw9a.pdf, filled in a few fields, then used the menu command "Document|Forms|Export Data from Form..." to create "fw9a_data.fdf". I have thrown in a few line breaks to make it readable.

    If you give the user the two files "fw9a_data.fdf" and "fw9a.pdf", they can then open this in Acrobat or Acrobat Reader. (E.g., by double-clicking fw9a_data.fdf)
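Because an FDF like the one above is plain text, it can be generated directly from database values rather than exported by hand. The sketch below assumes ASCII values (non-ASCII text needs PDFDocEncoding or UTF-16BE, which it does not handle), takes the checkbox "on" state to be /Yes as in this export (the on-state name is form-specific), and omits the optional /ID entry. The sub names make_fdf and pdf_string are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote a Perl string as a PDF literal string "(...)".
# Backslash and both parentheses must be escaped; newlines become the
# \r escape, as in the Acrobat export above (ASCII input assumed).
sub pdf_string {
    my ($s) = @_;
    $s =~ s/([\\()])/\\$1/g;
    $s =~ s/\r?\n/\\r/g;
    return "($s)";
}

# Build a minimal FDF. $fields: text field name => value.
# $checkboxes: checkbox name => true/false, written as /Yes or /Off
# (assumed on-state name). $pdf_file is the form referenced via /F.
sub make_fdf {
    my ($pdf_file, $fields, $checkboxes) = @_;
    my @entries;
    for my $name (sort keys %$checkboxes) {
        my $state = $checkboxes->{$name} ? 'Yes' : 'Off';
        push @entries, '<</T' . pdf_string($name) . "/V/$state>>";
    }
    for my $name (sort keys %$fields) {
        push @entries,
            '<</T' . pdf_string($name) . '/V' . pdf_string($fields->{$name}) . '>>';
    }
    return join "\n",
        '%FDF-1.2',
        '1 0 obj<</FDF<< /Fields[',
        @entries,
        '] /F' . pdf_string($pdf_file) . '>>>>',
        'endobj',
        'trailer',
        '<</Root 1 0 R>>',
        '%%EOF',
        '';
}

# Example values that would normally come from a database row.
binmode STDOUT;
print make_fdf(
    'fw9a.pdf',
    { 'f1-1' => 'An individual name',
      'f1-6' => "Requestor's name\n111 somewhere St." },
    { 'c1-1' => 1, 'c1-2' => 0 },
);
```

Redirect the output to a file (e.g. perl make_fdf.pl > fw9a_data.fdf) and hand it out alongside fw9a.pdf as described above.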

    If you want to just create a printable PDF, you could use the command-line pdftk utility.

    pdftk fw9a.pdf fill_form fw9a_data.fdf output fw9print.pdf
    It's possible that CAM::PDF may be able to do this, but I haven't gotten it to install properly. In the merged file, the form data appears twice: once as form data and once as drawing instructions (the field's appearance stream). For example, for the name/address field, pdftk creates this object, which tells a PDF renderer how to print the data:
    10 0 obj
    <</Matrix [1 0 0 1 0 0] /Subtype /Form /Length 150
      /Resources <</Font <</HeBo 2 0 R >> /ProcSet [/PDF /Text /ImageB /ImageC /ImageI] >>
      /FormType 1 /BBox [0 0 179.66 37] /Type /XObject >>
    stream
    /Tx BMC
    q
    1 1 177.66 35 re W n
    0 0 0.50196 rg
    BT
    /HeBo 9 Tf
    10.71 TL
    2 27.34 Td
    (Requestor's name)Tj
    (111 somewhere St.)'
    (Gotham City, MA)'
    ET
    Q
    EMC
    endstream
    endobj

    The output from pdftk is a PDF file that is printable, but you can't edit and resave the data in the free Adobe Reader. (You might be able to edit it in the full Adobe Acrobat.) The IRS forms are editable in Adobe Reader because the IRS paid for an expensive Adobe product that signs the file. (See http://seclists.org/lists/vulnwatch/2003/Jan-Mar/0103.html and http://weblog.infoworld.com/udell/2003/09/04.html.)
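The pdftk command above can be driven from Perl. A sketch, with hypothetical sub names pdftk_fill_cmd and pdftk_fill: passing system() a list rather than a single string skips the shell, so file names need no quoting. The optional flatten keyword is a pdftk output option that merges the field values into the page content, so the result is no longer an editable form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the pdftk argument list for merging an FDF into a PDF form.
sub pdftk_fill_cmd {
    my (%opt) = @_;
    my @cmd = ('pdftk', $opt{form}, 'fill_form', $opt{fdf}, 'output', $opt{out});
    push @cmd, 'flatten' if $opt{flatten};   # bake the values into the page
    return @cmd;
}

# Run pdftk; list-form system() bypasses the shell entirely.
sub pdftk_fill {
    my @cmd = pdftk_fill_cmd(@_);
    system(@cmd) == 0
        or die "pdftk failed (status $?): @cmd\n";
}

# Same operation as the shell command above, run only if the inputs exist.
pdftk_fill(form => 'fw9a.pdf', fdf => 'fw9a_data.fdf', out => 'fw9print.pdf')
    if -e 'fw9a.pdf' && -e 'fw9a_data.fdf';
```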

Re: Filling out PDF forms with data from DBI?
by Moron (Curate) on Jun 22, 2006 at 09:37 UTC
    An alternative approach might be to pick an intermediary format like HTML, for which there is much more comprehensive CPAN support for reading, editing and rewriting. There are plenty of open source HTML <-> PDF conversion tools out there, like pdftohtml, to handle the resulting back-end requirement, which then no longer depends on what is available on CPAN.

    -M

    Free your mind

Re: Filling out PDF forms with data from DBI?
by ghenry (Vicar) on Jun 22, 2006 at 17:37 UTC

    UPDATE: I forgot prField, prDocForm and prForm.

    I set up a Google Group for PDF::Reuse. Try joining and posting there; Lars is very responsive.

    There's also more info in the Tutorial, and there was some discussion in the Group last week about mod_perl usage too.

    HTH.

    Walking the road to enlightenment... I found a penguin and a camel on the way.....
    Fancy a yourname@perl.me.uk? Just ask!!!
Re: Filling out PDF forms with data from DBI?
by hesco (Deacon) on Jun 23, 2006 at 03:55 UTC
    I asked essentially the same question a week or three ago. Bart suggested, in his response at this node, that this was likely possible with PDF::Reuse. I spent a night or so playing with it, but didn't get anywhere. As this was research for a later project and not the next deadline, I left it at that. If you figure this out, please post your recipe here; I hope to get to that back-burner project before the fall.

    -- Hugh

    if( $lal && $lol ) { $life++; }
Re: Filling out PDF forms with data from DBI?
by Limbic~Region (Chancellor) on Jun 23, 2006 at 16:08 UTC
    merlyn,
    Despite the warning that CAM::PDF might make formatting a problem, it is the only module I could figure out how to make work. Here is the most basic example:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use CAM::PDF;

    my $pdf = CAM::PDF->new('fw9.pdf') or die "$CAM::PDF::errstr\n";

    # Use $pdf->getFormFieldList() to get list of field names
    # Unfortunately, fw9.pdf doesn't have descriptive names
    $pdf->fillFormFields('f1-1' => 'Randal Schwartz');
    $pdf->cleanoutput('RSchwartz_w9.pdf');
    As noted elsewhere in this thread, if all you are modifying is field values, then the listpdffields.pl and fillpdffields.pl utilities that come with CAM::PDF may come in handy.

    Note: I did have some troubles installing the module and its dependencies but assume it was due to local environment. Let me know if you have problems so I can share workarounds.

    Cheers - L~R

      It looks like L~R is right on the money with CAM::PDF. I've spent a lot of time with PDF::API2, but I think that CAM::PDF is better suited for this task. In the case of the aforementioned W-9 form, with its nondescript field names, I wrote an inelegant-but-useful piece of code that helps to identify the actual field locations:

      #!/usr/bin/perl
      # pdf-filler-test.pl
      use strict;
      use warnings;
      use CAM::PDF;

      my $infile  = 'fw9.pdf';
      my $outfile = 'modified_fw9.pdf';
      my $pdf = CAM::PDF->new($infile) or die "$CAM::PDF::errstr\n";
      my @FIELDS = $pdf->getFormFieldList();

      # Fill each field with its own number (f1-7 gets "7") so the
      # output shows where every field sits on the page.
      foreach my $field (@FIELDS) {
          (my $fieldnum = $field) =~ s/^f1-//;
          $pdf->fillFormFields($field => $fieldnum);
      }
      $pdf->cleanoutput($outfile);

      This revealed that each digit in the SSN and tax ID number fields is actually a form field. I would assume that Randal would need to analyze each form so as to know which fields are what, and then create the appropriate mapping from DBI fields to PDF form fields.
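The mapping step could look like the sketch below. Everything in it is an assumption: the database column names are invented, f1-1 and f1-2 are taken from the FDF export earlier in the thread, and f1-8 through f1-16 are assumed to be the nine SSN digit boxes because those fields held the nine SSN digits in that export. The row is a hard-coded hashref standing in for what DBI's fetchrow_hashref would return, and row_to_fields is a hypothetical name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# HYPOTHETICAL mapping: database column => PDF field name.
my %FIELD_MAP = (
    name          => 'f1-1',
    business_name => 'f1-2',
);

# Fields assumed to hold one SSN digit each, left to right.
my @SSN_FIELDS = map { "f1-$_" } 8 .. 16;

# Turn one database row (a hashref) into a flat list of
# field => value pairs, splitting the SSN across its digit boxes.
sub row_to_fields {
    my ($row) = @_;
    my @pairs;
    for my $col (sort keys %FIELD_MAP) {
        next unless defined $row->{$col};
        push @pairs, $FIELD_MAP{$col} => $row->{$col};
    }
    if (defined $row->{ssn}) {
        (my $digits = $row->{ssn}) =~ s/\D//g;   # drop dashes and spaces
        die "SSN must have 9 digits\n" unless length($digits) == 9;
        my @d = split //, $digits;
        push @pairs, map { ($SSN_FIELDS[$_] => $d[$_]) } 0 .. $#d;
    }
    return @pairs;
}

# A hard-coded row standing in for a DBI result.
my @pairs = row_to_fields({ name => 'Randal Schwartz', ssn => '314-15-9265' });
while (my ($field, $value) = splice @pairs, 0, 2) {
    print "$field => $value\n";
}
```

With CAM::PDF, the resulting pairs could be passed straight to the call used above: $pdf->fillFormFields(row_to_fields($row));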

      On this particular form I noticed that the check boxes were not form fields. Randal, does your client require a check mark on the document, e.g. for "sole proprietor" or "Corporation"? I ask only because I don't see an easy way to add a check mark using CAM::PDF. PDF::API2 or PDF::Reuse can do that easily; however, they don't have the easy interface into forms that CAM::PDF does. It might be a bit of a kludge, but running the file through CAM::PDF to fill in the forms and then through PDF::API2 to add any check marks (or any kind of glyphs or graphics) would be pretty easy. To add a check mark on "Corporation" you could just do this:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use PDF::API2;

      # location of "Corporation" check box is:
      # 173 pts from left, 660 pts from bottom
      my $infile  = 'modified_fw9.pdf';
      my $outfile = 'modified_check_corp_box_fw9.pdf';

      my $pdf  = PDF::API2->open($infile);
      my $page = $pdf->openpage('1');
      my $text = $page->text();
      my $font = $pdf->corefont('Helvetica');

      # prepare text object
      $text->font($font, 11);        # Set font to Helvetica, 11pt
      $text->fillcolor("#000000");   # This is black
      $text->translate(173, 660);    # Text start location for Corp chk box
      $text->text("X");              # Print "X" at 173,660

      $pdf->saveas($outfile);
      $pdf->end;
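The 173/660 coordinates above are PDF points (1/72 inch) measured from the bottom-left corner of the page. If you find a box's position by measuring a printout with a ruler from the top-left, a small conversion helps. A sketch with a hypothetical sub ruler_to_pdf, assuming a US Letter page 792 points tall unless you pass the page's real height (for example from its MediaBox):

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant PT_PER_INCH   => 72;    # PDF user-space unit is 1/72 inch
use constant LETTER_HEIGHT => 792;   # 11 in * 72, US Letter (assumed)

# Convert a ruler measurement (inches from the LEFT and from the TOP
# edge of a printout) into PDF coordinates, whose origin is the
# BOTTOM-left corner of the page.
sub ruler_to_pdf {
    my ($in_from_left, $in_from_top, $page_height) = @_;
    $page_height = LETTER_HEIGHT unless defined $page_height;
    my $x = $in_from_left * PT_PER_INCH;
    my $y = $page_height - $in_from_top * PT_PER_INCH;
    return ($x, $y);
}

# A mark 2.5 in from the left and 1.75 in from the top of a Letter page.
my ($x, $y) = ruler_to_pdf(2.5, 1.75);
printf "translate(%g, %g)\n", $x, $y;   # prints translate(180, 666)
```

The two numbers can then go straight into $text->translate($x, $y) in the script above.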

      These are just some thoughts. Please let us know what you eventually come up with. I'm curious to see how this looks in a production environment.

      -MC

Re: Filling out PDF forms with data from DBI?
by hesco (Deacon) on Jun 24, 2006 at 16:39 UTC
    mercutio_viz:

    You are a godsend. That is exactly what I've been looking for. I tried Limbic~Region's code, and it worked for the IRS PDF form that brought merlyn to SoPW, but not on my government PDF, which lacks the PDF form fields.

    Your code, however, showed me the way to nirvana. My project involves a thirteen-page form with hundreds of fields, and I suspect it may take a week or two to work out all the details. But my tinkering to determine whether it was possible to overwrite a PDF document has now come to an end. I can now move on to other aspects of that project when I get to it. Thank you so much for that quick sample.

    -- Hugh

    if( $lal && $lol ) { $life++; }

      Glad I could help. I think I'll print out your post so that I can show my wife that I'm not just totally wasting my time with "that Camel thingy" - her words, not mine!

      -MC
