Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Multipage TIFFs

by oligmd (Sexton)
on Nov 01, 2007 at 17:37 UTC ( #648523=perlquestion: print w/replies, xml ) Need Help??

oligmd has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on a script which looks for gaps a in numbered series of TIFF files. In order for this to work, I need to be able to identify how many pages there are in any given TIFF. This is the code I'm using:
sub pageCount { my $FILE = shift; my $IMAGE = Image::Magick -> new(); $IMAGE -> Read( $FILE ); my $PAGECOUNT = $#$IMAGE+1; undef $IMAGE; return( $PAGECOUNT ); }
My problem is that this method forces me to read each TIFF file. These TIFF files are large enough that this subroutine is too slow to be useful. Is anyone aware of a different, i.e. faster, way to obtain a page count on a multipage image?

Replies are listed 'Best First'.
Re: Multipage TIFFs
by KurtSchwind (Chaplain) on Nov 01, 2007 at 19:27 UTC
    Instead of reading the entire file, you should read just the TIFF header in each file. I'm not certain off hand if that's supported in Image::Magick, but you can get head start on reading the header here. In essense, you need to scan for the tags in the header which indicate the image boundries. Reading just the header should be VERY fast.
    I used to drive a Heisenbergmobile, but everyone I looked at the speedometer, I got lost.
      There is an ImageMagick command called "identify" which will read the image just deeply enough to give answers. They call this "pinging" the file, as opposed to reading it completely. I am pretty sure the number of pages or sub-images is reported by this process.

      [ e d @ h a l l e y . c c ]

      I played around with Image::ExifTool and had a realization. When I printed all of the tags from a TIFF using Image::ExifTool, it didn't return any tags which explicitly contained the number of pages in the TIFF. It did, however, return tags from each of the pages past page 1 in a "$tag ($page-1)" format.

      I wrote this ugly piece of code which does the job perfectly:
      use Image::ExifTool; sub pageCount { my $FILE = shift; my $EXIFTOOL = new Image::ExifTool; my $INFO = $EXIFTOOL -> ImageInfo( $FILE ); my $TAG; my $DONE = 0; my $PAGECOUNT = "1"; while ( $DONE == 0 ) { $TAG = "ImageWidth ($PAGECOUNT)"; my $VAL = $EXIFTOOL -> GetValue( $TAG ); if ( !$VAL ) { $DONE = 1; } else { $PAGECOUNT++; } } return( $PAGECOUNT ); }
      In essence, I keep looking for pages until GetValue returns undefined. It's far from a great solution, but it works.

      Thanks for pointing me in the right direction!

        I see you solved the problem, but since I already wrote this I'll post it for anyone who can use it. The program traverses the tiff file's image directory chain. Testing on an old 900MHz windoz box, it ripped through 1200 tiff files, over 2 GB, in 23 seconds.

        use strict; my @files = glob "D:/pics/tiff/*.tif"; for (@files){ my $images = imagecounter($_); printf "%40s : %3d \n", $_, $images; } sub imagecounter{ my $tiff = shift; my @endian; my $offset = 0; my $index; my $count = 0; open (my $fh, "<", $tiff); binmode $fh; seek ($fh, $offset, 0); read ($fh, $endian[0], 2); $endian[0] = unpack "a2", $endian[0]; if ($endian[0] eq 'II'){ $endian[1] = 'v'; $endian[2] = 'V'; }elsif ($endian[0] eq 'MM'){ $endian[1] = 'n'; $endian[2] = 'N'; }else{ warn "Unknown byte order in $tiff, possible bad file\n"; return 0; } seek ($fh, 4, 0); read ($fh, $offset, 4); $offset = unpack "$endian[2]", $offset; while ($offset){ $count++; my $length; seek ($fh, $offset, 0); read ($fh, $length, 2); $length = unpack "$endian[1]", $length; $index = ($length * 12); seek ($fh, $index, 1); read ($fh, $offset, 4); $offset = unpack "$endian[2]", $offset; if ($count > 100){ warn "Count too high in $tiff, possible bad file\n"; return 0; } } close $fh; return $count; }
Re: Multipage TIFFs
by hangon (Deacon) on Nov 01, 2007 at 19:35 UTC

    Unless someone knows of a more suitable module, you can try counting the IFD's (image file directories) directly. Here's some pseudocode to get you started. Sorry, I dont have time for more right now. You can google the TIFF spec for more info.

    read bytes 0,1 # Byte ordering: 4949 = intel, 4D4D = motorola $imagecount = 0 $offset = read bytes 4-7 # first image directory while $offset != 0 $ifdlength = read first 2 bytes at $offset and multiply by 12 $imagecount++ $index = $offset + $ifdlength + 2 $offset = read 4 bytes at $index

    Update: math correction

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://648523]
Approved by andreas1234567
Front-paged by Argel
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (8)
As of 2021-06-24 16:37 GMT
Find Nodes?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)

    Results (130 votes). Check out past polls.