http://www.perlmonks.org?node_id=430761

mlh2003 has asked for the wisdom of the Perl Monks concerning the following question:

Hello exalted monks,

I have a problem whose answer I cannot seem to find within the monastery nor 'out there in Google-land'.

I have been looking at converting Word documents to PDF format using Perl, and soon realised that I had hit my programming and time limits, so I thought of converting the Word .doc to .rtf first. Then I could use a couple of modules available from CPAN: one to convert from RTF to HTML (RTF::HTML::Converter), and another to take the HTML file and put it through an HTML to PDF conversion (PDF::FromHTML). Kind of like putting a cow through two black boxes and getting barbecue steak at the end...

The only problem is that the first black box (RTF::HTML::Converter) seems to hang the server and returns nothing unless the .rtf file is very simple. My code that attempts the conversion follows; it uses the modules in a way that closely resembles the usage shown in their respective documentation:

#!/usr/bin/perl
use strict;
use warnings;
use CGI;
use CGI::Carp qw(fatalsToBrowser);
my $q=CGI->new;
use RTF::HTML::Converter;
use PDF::FromHTML;

print $q->header;
print $q->start_html;

my $base_directory = '.';
my $base_filename = 'text_only1';
my $rtf_file = "$base_directory/$base_filename" . '.rtf';
my $html_file = "$base_directory/$base_filename" . '.html';
my $pdf_file = "$base_directory/$base_filename" . '.pdf';

open (RTF_FILE, "< $rtf_file") || die "Couldn't open RTF file: $!";
open (HTML_FILE, "> $html_file") || die "Couldn't open HTML file: $!";

# Convert the rtf file to HTML format
my $file = RTF::HTML::Converter->new(output => \*HTML_FILE);
$file->parse_stream( \*RTF_FILE ) || die "Error converting RTF to HTML: $!";
close RTF_FILE;
close HTML_FILE;
print "Converted RTF to HTML.<br />\n";

# Convert the HTML file to PDF format
my $pdf = PDF::FromHTML->new( encoding => 'utf-8' );
$pdf->load_file($html_file);
$pdf->convert(
    Font => '/path/to/font.ttf',
    LineHeight => 10,
    Landscape => 0,
);
$pdf->write_file($pdf_file);
print "Converted HTML to PDF.<br />\n";

print $q->end_html;

Have any monks here experienced this behaviour with that module, or even walked a different path to start with RTF and arrive at HTML (or better yet, PDF)?
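
While the cause of the hang is unknown, one way to keep a runaway conversion from tying up the server is to put a time limit around it. The following is only a sketch, not a fix for the converter itself: run_with_timeout is a hypothetical helper name, the 30-second limit in the usage comment is an arbitrary choice, and it assumes a Unix-like system where alarm() is available.

```perl
use strict;
use warnings;

# Hypothetical helper: run a code reference, giving up after $seconds.
# Returns (1, $result) on success, or (0, $error) if the code died or timed out.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;

    # The handler turns the alarm signal into an exception that eval can catch.
    local $SIG{ALRM} = sub { die "timeout\n" };

    my $ok = eval {
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $error = $@;
    alarm 0;    # clear any alarm still pending if the code died early

    return $ok ? (1, $result) : (0, $error || "unknown error\n");
}

# Possible use with the handles from the script above (30 seconds is arbitrary):
# my ($ok, $err) = run_with_timeout(30, sub { $file->parse_stream(\*RTF_FILE) });
# die "RTF conversion did not finish: $err" unless $ok;
```

This does not explain why the module stalls on complex files, but it lets the CGI script report a failure instead of hanging.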

Any help would be greatly appreciated.

mlh2003

Replies are listed 'Best First'.
Re: Converting RTF documents to PDF format
by holli (Abbot) on Feb 14, 2005 at 13:43 UTC
    You could set up a RedMon printer and convert your file to PostScript. From there, use ps2pdf to convert the PostScript to PDF.
    Alternatively, check out this site.

    These are just a few of the possibilities you find when you ask Google.

    Happy searching!


    holli, /regexed monk/
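
The second half of that suggestion, PostScript to PDF, could be driven from Perl roughly as follows. This is a minimal sketch assuming Ghostscript's ps2pdf is installed and on the PATH; the function names and the file names in the usage comment are made up for illustration, and producing the PostScript file in the first place (the RedMon step) is not covered.

```perl
use strict;
use warnings;

# Build the argument list for ps2pdf: input PostScript file, output PDF file.
sub ps2pdf_command {
    my ($ps_file, $pdf_file) = @_;
    return ('ps2pdf', $ps_file, $pdf_file);
}

# Run the conversion. The list form of system() bypasses the shell,
# so spaces or shell metacharacters in file names are passed through as-is.
sub convert_ps_to_pdf {
    my ($ps_file, $pdf_file) = @_;
    my $status = system(ps2pdf_command($ps_file, $pdf_file));
    die "Could not run ps2pdf: $!\n" if $status == -1;
    die "ps2pdf failed (wait status $status)\n" if $status != 0;
    return $pdf_file;
}

# Hypothetical usage:
# convert_ps_to_pdf('text_only1.ps', 'text_only1.pdf');
```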
Re: Converting RTF documents to PDF format
by Brutha (Friar) on Feb 14, 2005 at 14:05 UTC
    I know there is something like rtf2latex. LaTeX has no problem producing PDF and other formats in high quality. This should be easily scriptable, but I do not know whether it is within your performance constraints.

    And it came to pass that in time the Great God Om spake unto Brutha, the Chosen One: "Psst!"
    (Terry Pratchett, Small Gods)

Re: Converting RTF documents to PDF format
by jplindstrom (Monsignor) on Feb 14, 2005 at 12:50 UTC
    I would try to automate saving/printing Word documents from Word to PDF directly. Adobe's Acrobat Writer or possibly http://www.primopdf.com/ (haven't tried it, just Googled it) does that for you.

    /J

      Thank you jplindstrom.

      I was thinking more along the lines of being able to set up an application where a user can click on a document (Word format for instance) on their computer to upload it to the server. After uploading, convert it to PDF on the server (in an automated fashion so I don't have to manually go through the uploaded files each day and convert them to PDF).

        In theory, the OpenOffice.org suite of programs can do that, and in theory, these programs are also automatable. I say in theory because these programs need to be scripted either in their own, weird StarBASIC programming language or through an object model that is weird and unusable even for the standards of Java Object Model Designers. It would be a project producing many thanks if somebody made the filters of OOo into stand-alone converter programs, but the OOo build process is rather unapproachable.

        Although OpenOffice.org is hard to work with, if you're putting together a static task, then I think you should look at that option. Then, if you're really nice, you could wrap that task in a Perl module and upload it to CPAN. :-)

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
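
For the upload-then-convert workflow discussed above, a server-side call to an office suite might look like the sketch below. It rests on an assumption the thread does not make: that a soffice binary accepting --headless, --convert-to and --outdir is installed, as LibreOffice (a descendant of OpenOffice.org) provides. The function names and paths are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# ASSUMPTION: 'soffice' is on the PATH and supports these options
# (true of LibreOffice; not claimed for the OpenOffice.org of this thread).
sub soffice_command {
    my ($input_file, $output_dir) = @_;
    return ('soffice', '--headless', '--convert-to', 'pdf',
            '--outdir', $output_dir, $input_file);
}

# The converter keeps the input's base name and swaps the extension,
# so the resulting path can be predicted up front.
sub expected_pdf_path {
    my ($input_file, $output_dir) = @_;
    my ($name) = fileparse($input_file, qr/\.[^.]*/);
    return File::Spec->catfile($output_dir, "$name.pdf");
}

# Convert one uploaded document and return the path of the PDF.
sub convert_upload_to_pdf {
    my ($input_file, $output_dir) = @_;
    my $status = system(soffice_command($input_file, $output_dir));
    die "Could not run soffice: $!\n" if $status == -1;
    die "soffice failed (wait status $status)\n" if $status != 0;
    my $pdf = expected_pdf_path($input_file, $output_dir);
    die "Conversion reported success but $pdf is missing\n" unless -e $pdf;
    return $pdf;
}

# Hypothetical usage from an upload handler:
# my $pdf = convert_upload_to_pdf('/tmp/uploads/report.doc', '/tmp/pdf');
```

Checking for the output file after the call is a deliberate extra step, since an exit status of zero alone does not prove that a PDF was written.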