Retrieve the PDF file description

by Punitha (Priest)
on Apr 29, 2006 at 05:22 UTC
Punitha has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I need to read the description (metadata) of a PDF file and validate it. I tried this:

use strict;
use warnings;
use PDF;

my $pdf = PDF->new("0001.pdf");
my $version = $pdf->Version;
my $title = $pdf->GetInfo("Title");
my $author = $pdf->GetInfo("Author");
print "$version\n$title\n$author\n";

For some PDF files it works fine. For many others it produces these errors:

Premature end of file reached at valid.pl line 5
Bad object reference '' at valid.pl line 5
Bad object reference '' at valid.pl line 5
Bad object reference '' at valid.pl line 5
Use of uninitialized value in concatenation (.) or string at valid.pl line 9.
Use of uninitialized value in concatenation (.) or string at valid.pl line 9.
1.6

I don't know why this is happening or how to fix it. Can anyone help me?

Thanks in advance.

Punitha

Re: Retrieve the PDF file description
by marto (Chancellor) on Apr 29, 2006 at 11:13 UTC
    Hi Punitha,

    I first tested this code with a PDF file that I knew contained the information you are looking for, and your script ran properly, printing the expected results. I then created a document in OpenOffice Writer, exported it to PDF (in the 'PDF Options' tab I unchecked the 'Tagged PDF' and 'Export Notes' options), and got the errors you mention:

    Use of uninitialized value in concatenation (.) or string at pdf.pl line 9.
    Use of uninitialized value in concatenation (.) or string at pdf.pl line 9.
    1.4

    I then opened the PDF and displayed the 'Document Properties'. Only the PDF version and Producer tags have values, which is why only 1.4 is printed. I think you can assume that your problem occurs when these values are not populated, either in the source document before conversion or later in Acrobat.
    If you want to catch these errors, take a look at this basic example:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use PDF;

    my $pdf = PDF->new("003.pdf");
    my $version = $pdf->Version;
    my $title = $pdf->GetInfo("Title");
    my $author = $pdf->GetInfo("Author");

    if ($version){
        print "\nVersion: $version";
    }else{
        print "\nVersion: Undefined";
    }

    if ($title){
        print "\nTitle: $title";
    }else{
        print "\nTitle: Undefined";
    }

    if ($author){
        print "\nAuthor: $author\n";
    }else{
        print "\nAuthor: Undefined\n";
    }
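
A variation on the check above, offered only as a sketch: a plain truth test such as `if ($title)` also treats the legitimate values "0" and "" as missing, so testing with `defined` and `length` can be a little safer. The helper name `field_or_placeholder` and the sample `%info` hash are made up for illustration; they stand in for whatever `GetInfo` returns and are not part of the PDF module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the PDF module): return the value itself,
# or a placeholder when the value is undef or an empty string.
sub field_or_placeholder {
    my ($value, $placeholder) = @_;
    $placeholder = 'Undefined' unless defined $placeholder;
    return (defined $value && length $value) ? $value : $placeholder;
}

# Sample data standing in for the values read from a PDF (assumption).
my %info = (Version => '1.4', Title => undef, Author => '0');

for my $field (qw(Version Title Author)) {
    print "$field: ", field_or_placeholder($info{$field}), "\n";
}
```

With the sample data, the undefined Title is reported as "Undefined" while the Author value "0" is kept rather than being treated as missing.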


    Hope this helps.

    Martin

      Thank you for your comment.

      But in my PDF file all the Description fields (Title, Author, Description, etc.) have values. The PDF Producer, however, is Acrobat Distiller 7.0, not Acrobat writer.

      I am not sure whether that's the problem. I also want to retrieve the page size, so I added these lines and tried them:

      my ($startx, $starty, $endx, $endy) = $pdf->PageSize(1);
      print "$startx\t$starty\t$endx\t$endy\n";

      These lines also fail for the same PDF files. For some PDF files, though, they print:

      0 0 612 792

      But the page size value in the description is

      8.50 x 11.00 in

      I want the same value as output (in inches). Can anyone tell me whether I am going in the right direction, or suggest another way?

      Thank you once again and in advance

      Punitha

        Punitha,

        Firstly, I don't fully understand what you are trying to say. Have you opened the resulting PDF in Acrobat Reader and checked the properties to see whether these fields are populated?

        Secondly, the code you added to find the page size is working for you. You need to divide the sizes by 72 (72 points = 1 inch) to convert the values to inches.
        So 612 / 72 = 8.5 and 792 / 72 = 11.
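
The division above can be sketched as follows. This is only an illustration: the helper name `box_to_inches` is made up, and the four numbers are the ones printed earlier in the thread, assumed to be in points in the order start x, start y, end x, end y.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant POINTS_PER_INCH => 72;

# Hypothetical helper: convert a page box given in points
# (start x, start y, end x, end y) to a width and height in inches.
sub box_to_inches {
    my ($startx, $starty, $endx, $endy) = @_;
    my $width  = ($endx - $startx) / POINTS_PER_INCH;
    my $height = ($endy - $starty) / POINTS_PER_INCH;
    return ($width, $height);
}

# The values reported for the working PDF files in this thread.
my ($width, $height) = box_to_inches(0, 0, 612, 792);
printf "%.2f x %.2f in\n", $width, $height;   # prints "8.50 x 11.00 in"
```

Subtracting the start coordinates first keeps the result correct for pages whose box does not begin at 0 0.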

        Hope this helps.

        Martin
Re: Retrieve the PDF file description
by bowei_99 (Friar) on Apr 29, 2006 at 16:45 UTC
    Have you tried checking, in Acrobat Distiller 7, for the equivalent of the options marto mentioned, i.e. the 'Tagged PDF' and 'Export Notes' options in the 'PDF Options' tab? There should be something in the help menu about tagging the PDF and exporting notes. My guess is that Acrobat Distiller 7 may not do one or both by default, so you may have to enable the option(s).

    In addition, you might try the following:

    1. Look at the contents of the PDF object created:

      use Data::Dumper;
      my $pdf = PDF->new("0001.pdf");
      print Dumper($pdf);

    2. Set the verbosity to 1 to see any other messages about what the script is doing:
      $PDF::Verbose = 1;

    -- Burvil

Re: Retrieve the PDF file description
by Anonymous Monk on Dec 23, 2012 at 19:41 UTC

    None of the replies to this question have addressed the error:

        Premature end of file reached at valid.pl line 5

    If line 5, viz.

        my $pdf = PDF->new("0001.pdf");

    evokes an error, it seems unlikely the rest of the code will work anyway, so I'd like to know why line 5 fails because I'm seeing the same problem.
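
One way to start narrowing this down, offered as a sketch rather than a diagnosis: a PDF file normally begins with a `%PDF-x.y` header and ends with a `startxref` entry followed by `%%EOF`, so the raw bytes of a failing file can be checked for those markers before blaming the parser. The helper name `sniff_pdf` and the 1024-byte window are assumptions for illustration, and nothing in this thread confirms which of these markers the PDF module actually checks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Hypothetical helper (not part of the PDF module): inspect the raw bytes of
# a file and report the header version and whether the end-of-file markers
# are present in the last 1024 bytes.
sub sniff_pdf {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";

    # The "%PDF-x.y" header is expected near the start of the file.
    my $head = '';
    read($fh, $head, 1024);
    my ($version) = $head =~ /%PDF-(\d+\.\d+)/;

    # "startxref" and "%%EOF" are expected near the end of the file.
    my $size     = -s $fh;
    my $tail_len = $size < 1024 ? $size : 1024;
    my $tail     = '';
    if ($tail_len) {
        seek($fh, -$tail_len, SEEK_END);
        read($fh, $tail, $tail_len);
    }
    close $fh;

    return {
        version       => $version,
        has_startxref => ($tail =~ /startxref/ ? 1 : 0),
        has_eof       => ($tail =~ /%%EOF/     ? 1 : 0),
    };
}

# Report on every file named on the command line.
for my $file (@ARGV) {
    my $report = sniff_pdf($file);
    printf "%s: version=%s startxref=%d EOF=%d\n",
        $file,
        defined $report->{version} ? $report->{version} : 'unknown',
        $report->{has_startxref},
        $report->{has_eof};
}
```

Running it on one PDF that works and one that fails shows whether the two files differ in version or are missing a trailer marker, which would point to a truncated file.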

      marto answered it: the PDF lacks the information.

      That could be due to the fact that PDF doesn't seem to be a widely used module. If you really want to use that module, then after reading the thread, I'd suggest dumping a PDF that works and comparing it to one that doesn't. Perhaps the problem is that one has a different encoding for the tags or some such.

      If you don't mind switching to a different module, here's one that gets the information using PDF::API2:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use PDF::API2;
      use Data::Dumper;

      my $FName = shift // die "Missing filename!";
      my $pdf = PDF::API2->open($FName) or die "Can't open PDF file $FName: $!";
      my %pdfinfo = $pdf->info;
      print "Author is: ", $pdfinfo{Author}, "\n";
      print "Title is: ", $pdfinfo{Title}, "\n";
      print "\n\nAll info tags:\n", Dumper(\%pdfinfo);

      If I run it on the first handy PDF file on my desktop, I get:

      Author is: Texas Instruments, Incorporated [SNAS033,D ]
      Title is: LM4873   Dual 2.1W Audio Amplifier Plus Stereo Headphone Function (Rev. D)


      All info tags:
      $VAR1 = {
                'ModDate' => 'D:20121201221441-06\'00\'',
                'Subject' => 'Data Sheet',
                'Creator' => 'TopLeaf 7.6.028',
                'Title' => 'LM4873   Dual 2.1W Audio Amplifier Plus Stereo Headphone Function (Rev. D)',
                'Keywords' => ', SNAS033,SNAS033D',
                'CreationDate' => 'D:20121201221441-06\'00\'',
                'Producer' => 'iText 2.1.7 by 1T3XT',
                'Author' => 'Texas Instruments, Incorporated [SNAS033,D ]'
              };

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

      That module has been untouched since 13 Feb 2000. Switch to CAM::PDF; it has many, many more features.
