Seeking Comments & Feedback on Word/PDF project

by jedikaiti (Hermit)
on Mar 17, 2010 at 19:19 UTC
jedikaiti has asked for the wisdom of the Perl Monks concerning the following question:

So I have this project for work, and I am trying to decide which approach might be better. Any comments or feedback you can offer are much appreciated.

At work, there are some existing Perl scripts which dump a ton of data out of a database, add in some pre-written text, compile it all into a handbook (850+ pages) in LaTeX, and spit out a PDF. The goal here is to eliminate LaTeX from the process.

I had originally planned to make my modifications spit out a Word 2007 .docx file, which could then be easily modified as needed and saved to a PDF if desired. I've been playing around with the Template Toolkit module, and have confirmed that this is something I should be able to do without too many headaches. Also, the organization here has a standard Word template for engineering docs like this. I'm sure I could re-create it in a PDF, but I already have it in Word.

The other option is to remove the middleman and just go straight to PDF. To this end, I've been researching PDF modules on CPAN, and so far PDF::Create is looking like a good option. PDF::API2 was looking good (or at least powerful), but the lack of any real documentation is quite the turnoff. If I go this route, whatever PDF module I use needs to support bookmarks - in a document this big (850+ pages), there's a ton of them.

So my question to you, dear Monks: if this project were deposited on your desk, which path would you choose? What do you see as being the pros/cons of each? Anything else I haven't mentioned that you think I should not overlook?

Thanks a million!

Replies are listed 'Best First'.
Re: Seeking Comments & Feedback on Word/PDF project
by CountZero (Bishop) on Mar 17, 2010 at 19:52 UTC
    The goal here is to eliminate LaTeX from the process.
    Don't do it. LaTeX takes all the pain out of layout.

    If you go the Word way you will have to do the layout yourself, and you will be fighting Word's "good intentions" to do things its own way, every inch of the road.

    "Printing" straight to PDF is a total nightmare: you will not only have to invent your own layout, but you will also have to place every PDF element pixel-perfect on each page, calculate the size of all paragraphs, do your own page-setting, split paragraphs so they fit on the page, ...
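    To give a feel for the bookkeeping involved, here is a minimal sketch of just the line-breaking and page-splitting part. It assumes a monospaced font so that width can be counted in characters; a real PDF would need per-glyph font metrics, and the column and lines-per-page figures are made-up values. The paginate helper is hypothetical, not part of any PDF module.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Break paragraphs into lines, then lines into pages.
# Assumption: monospaced text, so width is measured in characters.
sub paginate {
    my ($paragraphs, $columns, $lines_per_page) = @_;

    # Text::Wrap keeps every wrapped line shorter than $columns.
    local $Text::Wrap::columns = $columns;

    my @lines;
    for my $para (@$paragraphs) {
        push @lines, split /\n/, wrap('', '', $para);
        push @lines, '';    # blank line between paragraphs
    }
    pop @lines if @lines;   # drop the trailing blank line

    # Cut the flat list of lines into fixed-size pages.
    my @pages;
    while (@lines) {
        push @pages, [ splice @lines, 0, $lines_per_page ];
    }
    return @pages;
}

my @pages = paginate(
    [ 'Lorem ipsum ' x 40, 'Second paragraph. ' x 25 ],
    60,    # hypothetical line width in characters
    10,    # hypothetical number of lines per page
);
printf "%d pages\n", scalar @pages;
```

    Headers, footers, tables, widow/orphan control and exact coordinates for each line would all have to be layered on top of this by hand.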

    Invest some time in learning a modern LaTeX layout package, such as memoir (link to the memoir manual in PDF).

    I have been using this for several years now and it works like a charm. The data goes from the database into a Perl program, which invokes a Template Toolkit template to write the LaTeX file. That file is then handed over to latexmk, which runs LaTeX the required number of times to resolve all references and turn it into a beautiful PDF.

    Believe me, all my previous "solutions" never worked as smoothly as this and I have high level control at all stages of the process.

    Update: Almost forgot, bookmarks to all chapters, sections, subsections, references, tables, figures, footnotes, bibliographic items, ... come for free if you use LaTeX and the hyperref package.
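    A minimal sketch of that pipeline, under some loud assumptions: the rows are hard-coded stand-ins for the database query, plain sprintf stands in for the Template Toolkit template so the script needs only core Perl, and the names latex_escape, render_latex and handbook.tex are made up for illustration. latexmk is only invoked when the script is run with --pdf.

```perl
use strict;
use warnings;

# Characters LaTeX treats specially in running text, and their escapes.
my %LATEX_ESC = (
    '\\' => '\\textbackslash{}',
    '~'  => '\\textasciitilde{}',
    '^'  => '\\textasciicircum{}',
);
$LATEX_ESC{$_} = "\\$_" for ('&', '%', '$', '#', '_', '{', '}');

# Escape in a single pass so replacement text is never escaped twice.
sub latex_escape {
    my ($text) = @_;
    $text =~ s/([\\~^&%#_{}\$])/$LATEX_ESC{$1}/g;
    return $text;
}

# Build the whole document; each section becomes a memoir chapter.
sub render_latex {
    my ($sections) = @_;
    my $tex = <<'PREAMBLE';
\documentclass{memoir}
\usepackage{hyperref} % turns the headings into PDF bookmarks
\begin{document}
\tableofcontents
PREAMBLE
    for my $sec (@$sections) {
        $tex .= sprintf "\\chapter{%s}\n%s\n\n",
            latex_escape($sec->{heading}), latex_escape($sec->{body});
    }
    $tex .= "\\end{document}\n";
    return $tex;
}

# Hypothetical rows standing in for the database query results.
my @sections = (
    { heading => 'Pumps & Valves', body => 'Flow rate is 95% of nominal.' },
    { heading => 'Wiring',         body => 'See drawing rev_B.' },
);

my $tex  = render_latex(\@sections);
my $file = 'handbook.tex';
open my $fh, '>', $file or die "Cannot write $file: $!";
print {$fh} $tex;
close $fh or die "Cannot close $file: $!";

# Run LaTeX only on request, so the script also works without latexmk installed.
if (@ARGV && $ARGV[0] eq '--pdf') {
    system('latexmk', '-pdf', $file) == 0 or die "latexmk failed: $?";
}
```

    Escaping the data before it reaches LaTeX is the step most likely to bite with database content, which is why it is done in one place.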


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      When I talk about using Word, I mean using Template Toolkit to generate a few XML files that get zipped into a Word .docx file.
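      Roughly, the packaging step looks like the sketch below. It is only an illustration under assumptions: the XML is built with plain string concatenation where the real thing would use Template Toolkit templates, the zip is written with the core IO::Compress::Zip module, and the helper names and sample.docx are made up. The three parts shown are, as far as I know, the minimum a .docx package needs; the organization's template would add styles, headers and so on.

```perl
use strict;
use warnings;
use Encode qw(encode);
use IO::Compress::Zip qw($ZipError);

my $CONTENT_TYPES = <<'XML';
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
XML

my $RELS = <<'XML';
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
</Relationships>
XML

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# One <w:p> per paragraph; a template would normally produce this part.
sub document_xml {
    my (@paragraphs) = @_;
    my $body = '';
    for my $para (@paragraphs) {
        $body .= '<w:p><w:r><w:t>' . xml_escape($para) . '</w:t></w:r></w:p>';
    }
    return qq{<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n}
         . qq{<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">}
         . qq{<w:body>$body</w:body></w:document>\n};
}

# Zip the parts together; a .docx is just a zip archive with this layout.
sub write_docx {
    my ($path, @paragraphs) = @_;
    my @parts = (
        [ '[Content_Types].xml', $CONTENT_TYPES ],
        [ '_rels/.rels',         $RELS ],
        [ 'word/document.xml',   document_xml(@paragraphs) ],
    );
    my $zip;
    for my $part (@parts) {
        my ($name, $content) = @$part;
        if ($zip) {
            $zip->newStream(Name => $name);    # start the next archive member
        }
        else {
            $zip = IO::Compress::Zip->new($path, Name => $name)
                or die "Cannot create $path: $ZipError";
        }
        $zip->print(encode('UTF-8', $content));
    }
    $zip->close;
}

write_docx('sample.docx', 'Handbook draft', 'R&D section: flow < 10 l/min');
```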

      If I kept LaTeX, I would still have to fight it to make it look like the existing Word template (the existing scripts use an outdated format).

      I also don't have the option of keeping LaTeX - nobody on my team wants anything to do with it, and being the temp, even if I did want it, I couldn't keep it.


        It appears these folks like Word and have a standard template that sets margins, paragraph formats, fonts and all that stuff. I would be thinking along the lines of using Win32::OLE to control WinWord from Perl and insert your data into their template. The result will be a standard WinWord doc that you can direct Word to print as .pdf, XML or whatever. Trying to create a WinWord doc yourself is a nightmare, as will be creating a .pdf that has the same look as their document format.

        I don't know how complex this data/project is, but in the past I had one project where I wrote a Word macro that imported my data file, whopped on it and produced a report. The pre-processing did just enough for this Word macro to do its job. So it wasn't necessary for me to write Perl to control Word, just to make a file simple enough that my Word macro could use. Just an idea to consider...

Re: Seeking Comments & Feedback on Word/PDF project
by almut (Canon) on Mar 17, 2010 at 19:27 UTC
    which could then be easily modified as needed

    Just be aware that this would no longer be a real option if you create the PDF directly. It's a format intended primarily for viewing/printing (kind of the "end product" of a document processing tool chain).

      Very true. Fortunately, I don't think that's a major concern, just a minor one. Need to chat with the boss-critters about that further.

      Kaiti
