Your code assumes getPageText() returns an empty string when there are no text blocks in the PDF. That assumption is probably wrong. In general, a function like this could return an error value (such as -1, which Perl treats as true, not false), undef, or a string containing only whitespace (tabs, carriage returns, etc.). Try this:
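# Assumes $doc is the PDF object from your code and that it provides
# numPages() (as CAM::PDF does); adjust the loop to match your module.
for (1 .. $doc->numPages()) {
    my $foo = $doc->getPageText($_);
    print "$_\n" unless (defined $foo &&              # returned something, and
                         $foo =~ m/[[:alnum:]]+/ms);  # actually returned text
}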
Sorry, I didn't actually test this.
update: fixed that dratted ~=/=~
update: fixed regex, tested now.
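If it helps to see why both the `defined` test and the regex are needed, here is a minimal standalone sketch. It does not touch any PDF module: `has_text` is a hypothetical helper that applies the same check (defined, plus at least one `[[:alnum:]]` character) to a few sample values standing in for what a text-extraction call might return. What your module actually returns for an image-only page is an assumption you should verify.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any PDF module): true only when the
# value is defined and contains at least one letter or digit.
sub has_text {
    my ($text) = @_;
    return defined $text && $text =~ m/[[:alnum:]]/;
}

# Sample values standing in for possible getPageText() results.
my %samples = (
    'undef'           => undef,
    'empty string'    => '',
    'whitespace only' => " \t\r\n",
    'real text'       => "Chapter 1\n",
);

# Report how each sample is classified.
for my $label (sort keys %samples) {
    printf "%-16s %s\n", $label,
        has_text($samples{$label}) ? 'has text' : 'no text';
}
```

Only "real text" passes; the short-circuit on `defined` also keeps `use warnings` quiet when the value is undef.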