Find blank pages in PDF

Samy_rio
Hi monks, I need to find the blank pages in a PDF. I did a Super Search, but I didn't find any links about this.

I tried CAM::PDF. In the given PDF file a page can be blank and still contain the header information that appears on every page. That is,

03371_FM_i-xvi.qxd 6/28/06 7:31 PM Page i

The following code is not displaying blank pages.

use CAM::PDF;

my $doc = CAM::PDF->new($ARGV[0]) || die "$CAM::PDF::errstr\n";
my $pages = $doc->numPages();
print $pages;
for (1..$pages) {
    print $_ if ($doc->getPageText($_) eq '');
}

Please suggest how to find the blank pages in a PDF.

Velusamy R.

Re: Find blank pages in PDF
marto

    Check out CAM::PDF::PageText ("Turn a page content tree into a string"), which may be what you need to determine if the page has anything on it. Sadly I cannot test this for you at the moment because I am at work :(

    Hope this helps

Re: Find blank pages in PDF
starbolin

    Your code assumes getPageText() returns an empty string when there are no text blocks on the page. This is probably an incorrect assumption. In general, a function could be returning a false value (0 or ''), undef, or a string containing only whitespace (tab, CR, etc.). Try this:

    {
        my $foo = $doc->getPageText($_);
        print $_
            unless (defined $foo &&              # Returned something and,
                    $foo =~ m/[[:alnum:]]+/ms ); # actually returned text
    }

    Sorry, I didn't actually test this.

    update: fixed that dratted ~=/=~
    update: fixed regex, tested now.
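
The check above still treats a page as non-blank when it carries only the running header from the original question. A minimal sketch of one way to handle that, assuming the header can be described by a regex: strip it first, then test what is left. The helper name is_blank_text and the pattern $header_re are hypothetical, and the pattern is only a guess based on the single sample header line; if the extracted text contains stray spaces, it would need to be loosened.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CAM::PDF): decide whether the text
# extracted from a page counts as blank once the running header is removed.
sub is_blank_text {
    my ($text, $header_re) = @_;
    return 1 unless defined $text;                   # nothing was extracted
    $text =~ s/$header_re//g if defined $header_re;  # drop the running header
    return $text =~ /[[:alnum:]]/ ? 0 : 1;           # blank if no letters or digits remain
}

# Assumed header shape, guessed from the sample line in the question:
# "<name>.qxd <date> <time> AM/PM Page <number>"
my $header_re = qr{\S+\.qxd\s+\d+/\d+/\d+\s+\d+:\d+\s+[AP]M\s+Page\s+\w+};

# Possible use with the $doc object from the question:
# for my $n (1 .. $doc->numPages()) {
#     print "$n\n" if is_blank_text($doc->getPageText($n), $header_re);
# }
```

Keeping the test in a plain function makes it easy to try against sample strings before running it over a whole document.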

