With the help of Alan Fry, I managed to put together a fast solution:
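    use strict;
    use warnings;
    use Fcntl qw(O_RDONLY);

    sub gettitle {
        my $file = shift;

        # Read the whole file in one go; PDFs are binary, so no layers.
        sysopen( my $in, $file, O_RDONLY )
            or die "cannot open '$file' for reading: $!\n";
        binmode $in;
        defined( read( $in, my $str, -s $in ) )
            or die "while reading '$file': $!\n";
        close $in;

        # The trailer points at the document information dictionary,
        # e.g. "/Info 12 0 R" means object 12, generation 0.
        my ($info_block) = ( $str =~ /\/Info\s(\d+)\s0\sR/ )
            or die "cannot get /Info paragraph\n";

        # Accept "N 0 obj" only at the start of a line, so that
        # "12 0 obj" is not matched inside "112 0 obj".
        my $searchpos = -1;
        my $info_start;
        while (1) {
            $info_start = index( $str, "$info_block 0 obj", $searchpos + 1 );
            die "cannot get position of '$info_block 0 obj'\n"
                if $info_start < 0;
            last if $info_start == 0
                 || substr( $str, $info_start - 1, 1 ) =~ /[\015\012]/;
            $searchpos = $info_start;
        }

        # The dictionary runs up to the first ">>" after the header.
        my $info_end = index( $str, '>>', $info_start );
        die "cannot find end of object $info_block\n" if $info_end < 0;
        my $info_obj = substr( $str, $info_start, $info_end - $info_start + 2 );

        # /Title (...): escaped characters, or anything except a
        # backslash, ")" or a line break.
        my ($title) =
            ( $info_obj =~ /\/Title\s*\(((?:\\.|[^\\)\015\012])*)\)/ )
            or return 'undefined';
        return $title;
    }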
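To try the lookup without a PDF on disk, here is a minimal sketch that applies the same index()/substr() approach as gettitle() to an in-memory string. The helper name title_from_pdf_string and the sample text are hypothetical: the sample is not a valid PDF, only enough structure for the lookup to work on. Unlike gettitle(), this sketch returns undef instead of dying, and it assumes the title is a literal string on a single line.

```perl
use strict;
use warnings;

# Hypothetical helper: same lookup as gettitle(), but it takes the PDF
# bytes as a string so it can be tried without a file.
sub title_from_pdf_string {
    my ($str) = @_;

    # Object number of the information dictionary, e.g. "/Info 12 0 R".
    my ($info) = $str =~ m{/Info\s(\d+)\s0\sR} or return;
    my $needle = "$info 0 obj";

    # Only accept a hit at the start of a line, so "12 0 obj" is not
    # matched inside "112 0 obj".
    my $pos = -1;
    while (1) {
        $pos = index( $str, $needle, $pos + 1 );
        return if $pos < 0;
        last if $pos == 0 || substr( $str, $pos - 1, 1 ) =~ /[\015\012]/;
    }

    # The dictionary ends at the first ">>" after the object header.
    my $end = index( $str, '>>', $pos );
    return if $end < 0;
    my $obj = substr( $str, $pos, $end - $pos + 2 );

    # Literal string: escaped characters or anything except "\", ")", EOL.
    my ($title) = $obj =~ m{/Title\s*\(((?:\\.|[^\\)\015\012])*)\)}
        or return;
    return $title;
}

# Hypothetical sample: object 112 comes first to exercise the line-start check.
my $sample = join "\n",
    '%PDF-1.4',
    '112 0 obj << /Title (Wrong object) >> endobj',
    '12 0 obj << /Title (Right object) /Author (Me) >> endobj',
    'trailer << /Root 1 0 R /Info 12 0 R >>',
    '%%EOF';

print title_from_pdf_string($sample), "\n";    # prints "Right object"
```

Without the line-start check, the first index() hit would be inside "112 0 obj" and the wrong title would be returned.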
I also compared the performance of the solution above with Text::PDF and PDF-111 from CPAN. The test set consisted of 36 PDF files totalling 3.8 MB. The relative runtimes were:

    Implementation            Relative runtime
    index solution above      1
    Text::PDF methods         6
    PDF-111                   12
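The post does not say how the runtimes were measured. One way to get ratios like these is to time each implementation over the same file list with Time::HiRes and divide by the baseline's time. The sketch below assumes that approach; relative_runtimes is a hypothetical helper, and other_gettitle in the example comment is a placeholder for a Text::PDF- or PDF-111-based version.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run each implementation once over the same files and
# return its total wall-clock time divided by the baseline's time.
# Note: a single run per implementation is sensitive to file-cache warmup.
sub relative_runtimes {
    my ( $impls, $baseline, @files ) = @_;
    my %elapsed;
    for my $name ( sort keys %$impls ) {
        my $t0 = time;
        $impls->{$name}->($_) for @files;
        $elapsed{$name} = time - $t0;
    }
    my $base = $elapsed{$baseline};
    die "unknown baseline '$baseline'\n" unless defined $base;
    die "baseline finished too quickly to time\n" unless $base > 0;
    return { map { $_ => $elapsed{$_} / $base } keys %elapsed };
}

# Example (hypothetical; gettitle() is the sub above, other_gettitle() a stand-in):
#   my $r = relative_runtimes( { index => \&gettitle, other => \&other_gettitle },
#                              'index', glob '*.pdf' );
#   printf "%-8s %.1f\n", $_, $r->{$_} for sort keys %$r;
```

Perl's core Benchmark module (cmpthese) is an alternative: it repeats each candidate and prints a comparison table, which smooths out the noise of a single timed run.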
PDF-111 from CPAN has other flaws too, and its author did not respond to my questions.
IMHO it should be dumped: it has a far too prominent place in the module hierarchy.