
With the help of Alan Fry I managed to put together a fast solution:

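use strict;
use warnings;
use Fcntl qw(O_RDONLY);

sub gettitle {
    my $file = shift;

    # Slurp the whole file in binary mode (PDFs are binary data).
    sysopen( my $in, $file, O_RDONLY )
      or die "cannot open '$file': $!\n";
    binmode $in;
    my $str;
    read( $in, $str, -s $in );
    close $in;

    # The /Info entry points at the Info dictionary: "/Info <n> 0 R".
    my ($info_block) = ( $str =~ /\/Info\s+(\d+)\s+0\s+R/ )
      or die "cannot get /Info paragraph\n";

    # Find "<n> 0 obj" at the start of a line, so that looking for
    # "12 0 obj" does not match inside "112 0 obj".
    my $searchpos = -1;
    my $info_start;
    while (1) {
        $info_start = index( $str, "$info_block 0 obj", $searchpos + 1 );
        die "cannot get position of '$info_block 0 obj'\n"
          if $info_start < 0;
        last
          if $info_start == 0
          || substr( $str, $info_start - 1, 1 ) =~ /[\015\012]/;
        $searchpos = $info_start;
    }

    # Cut out the object up to the first closing ">>".
    my $info_end = index( $str, ">>", $info_start );
    die "cannot find end of Info dictionary\n" if $info_end < 0;
    my $info_obj = substr( $str, $info_start, $info_end - $info_start + 2 );

    # The title is a literal string on one line; backslash escapes such
    # as \) are allowed, nested unescaped parentheses are not handled.
    my ($title) =
      ( $info_obj =~ /\/Title\s*\( ((?:\\.|[^\\)\015\012])*) \)/x )
      or return 'undefined';
    return $title;
}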
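The trickiest part of gettitle is the loop that locates "<n> 0 obj": a plain index() for "12 0 obj" would also hit "112 0 obj". Here is a stand-alone sketch of that line-start check on an in-memory string. The helper find_obj_start and the sample text (not a valid PDF) are hypothetical, made up only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset of "<num> 0 obj" that starts
# a line, or -1 if there is none. Mirrors the while loop in gettitle.
sub find_obj_start {
    my ( $str, $num ) = @_;
    my $needle = "$num 0 obj";
    my $pos    = index( $str, $needle );
    while ( $pos >= 0 ) {
        # Accept a match at offset 0 or one preceded by CR or LF.
        return $pos
          if $pos == 0 || substr( $str, $pos - 1, 1 ) =~ /[\015\012]/;
        $pos = index( $str, $needle, $pos + 1 );
    }
    return -1;
}

# Made-up sample: object 112 comes before object 12.
my $pdf = "%PDF-1.4\n112 0 obj\n<< >>\nendobj\n"
        . "12 0 obj\n<< /Title (Demo) >>\nendobj\n";

my $pos = find_obj_start( $pdf, 12 );
print substr( $pdf, $pos, 8 ), "\n";    # prints "12 0 obj"
```

The first raw hit for "12 0 obj" is at offset 10, inside "112 0 obj", and is skipped because it is preceded by "1" rather than a line break.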

I also compared the performance of the above solution with Text::PDF and PDF-111 from CPAN. The test set consisted of 36 PDF files totalling 3.8 MB.

The runtime ratios (index solution from above : Text::PDF methods : PDF-111) were

1 : 6 : 12

i.e. Text::PDF took about six times and PDF-111 about twelve times as long.
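Each ratio is one implementation's total runtime divided by the fastest one's. A minimal sketch of that arithmetic using the core Time::HiRes module follows. runtime_ratios is a hypothetical helper, and the two dummy workloads only stand in for the real code refs, which in the comparison above would wrap gettitle, Text::PDF and PDF-111 calls (not shown here).

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run each code ref over the same inputs, then
# express every total runtime relative to the fastest one.
sub runtime_ratios {
    my ( $impls, $inputs ) = @_;    # { name => sub { ... } }, [ inputs ]
    my %elapsed;
    for my $name ( keys %$impls ) {
        my $t0 = time;
        $impls->{$name}->($_) for @$inputs;
        $elapsed{$name} = time - $t0;
    }
    my ($fastest) = sort { $a <=> $b } values %elapsed;
    die "timings too small to compare\n" unless $fastest > 0;
    my %ratio = map { $_ => $elapsed{$_} / $fastest } keys %elapsed;
    return \%ratio;
}

# Dummy workloads standing in for the real title extractors.
my $r = runtime_ratios(
    {
        fast => sub { my $s = 0; $s += $_ for 1 .. 10_000; },
        slow => sub { my $s = 0; $s += $_ for 1 .. 100_000; },
    },
    [ 1 .. 5 ],
);
printf "%-5s %.1f\n", $_, $r->{$_} for sort keys %$r;
```

The fastest entry always comes out as exactly 1, so the printed numbers read the same way as "1 : 6 : 12"; wall-clock timings vary from run to run, so a larger test set gives steadier ratios.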

PDF-111 from CPAN has other flaws too, and the author didn't respond to my questions. IMHO it should be dumped; it has far too prominent a place in the module hierarchy.

In reply to Re: Re: PDF GetInfo( by axelrose
in thread PDF GetInfo( by axelrose
