PerlMonks  

Resizing large TIF files

by grantm (Parson)
on Nov 20, 2007 at 09:12 UTC (id://651886)

grantm has asked for the wisdom of the Perl Monks concerning the following question:

I have a bunch of scanned images that people are accessing via a web app. The original images are monochrome TIF format and around 2200x1600 pixels. I need to convert them to a format that browsers can display and resize them to a range of smaller sizes. We have a working solution that converts the images on the fly and caches the results but it's a little slow so I'm looking for advice on alternative tools.

The current code calls the 'convert' program from the ImageMagick suite via the system function. It takes approximately 1.5 seconds to resize an image to a 1024x768 PNG. The originals are quite small (40KB) since they're 1 bit per pixel. The PNG is truecolour and closer to 300KB. So far I haven't found the magic combination of command-line arguments to the 'convert' program to make it produce an indexed color PNG. The large file sizes add network delays on top of the file conversion delay.
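For the indexed-colour problem, here is a minimal sketch of the shell-out approach. It assumes ImageMagick's `convert` is on the PATH and uses the list form of `system`, so no shell is involved. The `-colors 8` option and the `PNG8:` output prefix are standard ImageMagick ways to ask for a palette PNG. This sketch does not check whether 8 colours keeps these scans legible, and `convert_args`/`resize_with_convert` are hypothetical names.

```perl
use strict;
use warnings;

# Build the argument list for ImageMagick's `convert` (hypothetical helper).
#   -resize WxH  fits the image inside the box, keeping the aspect ratio
#   -colors N    quantizes to at most N colours
#   PNG8:        output prefix asking for an 8-bit palette (indexed) PNG
sub convert_args {
    my ($in, $out, $geometry, $colors) = @_;
    $geometry //= '1024x768';
    $colors   //= 8;
    return ('convert', $in, '-resize', $geometry, '-colors', $colors, "PNG8:$out");
}

# Run the conversion with the list form of system(), so no shell is
# involved and unusual characters in file names need no quoting.
sub resize_with_convert {
    my @cmd = convert_args(@_);
    system(@cmd) == 0
        or die "convert failed (exit status $?): @cmd\n";
    return 1;
}
```

Usage would be something like `resize_with_convert('scan.tif', 'scan-1024.png');`. Building the argument list in a separate function makes the exact command easy to log and to test.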

The first thing I tried was using the Image::Magick module rather than shelling out to the 'convert' command. As far as I can tell I'm using equivalent options, but the run time increases to 2.5 seconds per image and the resulting files are nearly 400KB. I've also been unable to get a conversion to indexed colour working with the module.

Next I tried the Imager module. It is able to produce indexed colour images and with a palette of 8 colours the file size is down to 70KB. Unfortunately the run time is up to 2.5 seconds and the images are rather blurry. Adding an 'unsharp' transform fixes the blur but adds another second of run time.

Finally, I tried the GD module. Unfortunately GD doesn't support TIF format so I read the source files in via a tifftopnm $in_file | pnmtopng pipeline. At 8 colour output the file size is down to a respectable 60KB and the image quality is fine but the run time is around 2 seconds per image.

The 'solution' is likely to involve pre-caching the resized images so that users don't see the delay. However, given that new images are coming in at a rate of thousands per day, reducing the conversion run time would still be useful. Can anyone suggest any other Perl modules or image conversion tools worth checking out?

Re: Resizing large TIF files
by dk (Chaplain) on Nov 20, 2007 at 11:00 UTC
    Out of interest, I tried to reproduce your setup, and the results are indeed interesting. The only difference is that the 2200x1600x1 TIF file I created was rather more than 400K, but that's it. ImageMagick's convert command, $ convert -resize 1024x768 -depth 1 1.tif 1.png, took 0.8 seconds on my machine.

    I don't know Imager, but I tried Prima with this script, and the same conversion took 0.07 seconds. Here's the code:

    use strict;
    use Prima::noX11;   # for cgi environment
    use Prima;

    my $i = Prima::Image->load('1.tif') or die "Cannot load 1.tif: $@";
    $i->size(1024, 768);
    $i->save('1.png') or die "Cannot save 1.png: $@";

    The output is also a 1-bit PNG.
Re: Resizing large TIF files
by igelkott (Priest) on Nov 20, 2007 at 10:20 UTC
    The large output files are probably the result of a direct conversion from your (presumably compressed) TIFFs. Try reducing the resolution and size, if that's appropriate for your application. Images for the web probably don't need more than 100 dpi.

    If you wish to consider offline batch processing of all images, I've used IrfanView on Windows. It can do batch conversions from the command line (no GUI appears). I've never needed to do this on Unix/Linux (or a modern Mac), but Gimp or the stand-alone ImageMagick can probably help.

Re: Resizing large TIF files
by hangon (Deacon) on Nov 20, 2007 at 22:48 UTC

    Hmm, 2200 x 1600 pixels, 1 bit depth, sounds like you're dealing with faxes. My guess is that you're receiving fax images for viewing via http. Under that assumption, you do not want truecolor at all. If you use PNG, go with greyscale or a two color palette. You might also consider GIF (again with a minimal palette) and see which gives you smaller file sizes.

    To serve fax images, I would convert the image format upon receipt without resizing it. If you can keep the bit depth down the image files should not be very large. Let your visitors' web browsers do the resizing by using the height and width attributes in the IMG tags. Your server can reset these a lot faster than it can resize the images. You could also use these attributes to have a resizing function on the client side with a small bit of javascript.
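A sketch of the browser-side resizing idea, assuming the server knows each image's pixel size (for faxes it is usually fixed). It computes IMG width and height attributes that fit a display box without changing the aspect ratio. `fit_box`, `img_tag` and the 1024x768 box are illustrative choices, and how well a browser scales a 1-bit image varies from browser to browser.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Scale ($w, $h) to fit inside ($max_w, $max_h), keeping the aspect ratio.
# Never scales up (the factor is capped at 1).
sub fit_box {
    my ($w, $h, $max_w, $max_h) = @_;
    my $scale = min(1, $max_w / $w, $max_h / $h);
    return (int($w * $scale + 0.5), int($h * $scale + 0.5));
}

# Emit an IMG tag that leaves the downscaling to the browser.
# $src is a hypothetical URL; HTML-escape it if it can contain special characters.
sub img_tag {
    my ($src, $w, $h, $max_w, $max_h) = @_;
    my ($dw, $dh) = fit_box($w, $h, $max_w, $max_h);
    return qq{<img src="$src" width="$dw" height="$dh" alt="">};
}
```

For a 2200x1600 original and a 1024x768 box this gives width 1024 and height 745; the server only does arithmetic, and the full image is sent once.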

    Update: Converting some TIFF fax images (avg 80k) to PNG, I get a smaller PNG even in truecolor (avg 76k). However, if I save the PNG files uncompressed in truecolor, they are several times larger than the original TIFF (avg 460k). So my guess is that your converted files are not being compressed.

    Original 2-color TIFF:                 80 KB
    Truecolor PNG, compressed:             76 KB
    Truecolor PNG, uncompressed:          460 KB
    Greyscale PNG, compressed:             58 KB
    2-color PNG, compressed:               48 KB

Re: Resizing large TIF files
by sundialsvc4 (Abbot) on Nov 20, 2007 at 16:47 UTC

    When "images are coming in at the rate of thousands per day," how many of those images are being viewed? And of those images, how many images are being viewed more than once? Will the demand for images continue to remain high, or do the odds-of-viewing drop as the image becomes older?

    One simple strategy is to check whether a reduced version of the image already exists (look in a "thumbnails" subdirectory, say). If it does, display it. Otherwise, create the reduced image first and then display it. Future requests for the same image will only need to display it.
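The look-in-a-thumbnails-directory strategy might be sketched like this. It is a minimal version under stated assumptions: reduced images live in a `thumbnails` subdirectory next to the original, are named after it with a .png extension, and the actual resizing is passed in as a code reference (`$shrink`), so any of the tools discussed in this thread could be plugged in. `reduced_path` is a hypothetical name.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Path qw(make_path);
use File::Spec;

# Return the path of the reduced image for $orig, creating it on first request.
# $shrink is a caller-supplied code ref ($in, $out) doing the real resize
# (e.g. a wrapper around convert, Imager or GD); it is an assumption here.
sub reduced_path {
    my ($orig, $shrink, $subdir) = @_;
    $subdir //= 'thumbnails';
    my ($base, $dir) = fileparse($orig, qr/\.[^.]*/);
    my $cache_dir = File::Spec->catdir($dir, $subdir);
    my $cached    = File::Spec->catfile($cache_dir, "$base.png");

    # Cache hit: reuse the file unless the original is newer than it.
    return $cached if -e $cached && -M $cached <= -M $orig;

    make_path($cache_dir);
    # Write to a temporary name (keeping the .png extension so tools can
    # infer the format), then rename it into place.
    my $tmp = File::Spec->catfile($cache_dir, "$base.tmp$$.png");
    $shrink->($orig, $tmp);
    rename $tmp, $cached or die "rename $tmp -> $cached: $!";
    return $cached;
}
```

Writing to a temporary name and renaming means a concurrent request never sees a half-written file, at least on filesystems where rename is atomic.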

    If you expect that most of the images will be viewed eventually, you could have a Perl program running continuously in the background, crawling through the directories looking for images that have not yet been reduced and reducing them. This process runs all day and all night, happily slurping up excess CPU cycles at times when the computer has nothing better to do (e.g. "nice background_shrinker.pl &").
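For the background crawler, the core piece is finding originals that have no reduced copy yet. A sketch using the core File::Find module, assuming .tif/.tiff originals and reduced copies stored as `thumbnails/NAME.png` next to each original (a hypothetical layout); `pending_images` and the `shrink()` mentioned in the comment are hypothetical names.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;
use File::Basename qw(fileparse);

# List TIFF files under $root that have no reduced copy yet in a
# sibling "thumbnails" directory. Returns a sorted list.
sub pending_images {
    my ($root, $subdir) = @_;
    $subdir //= 'thumbnails';
    my @todo;
    find({
        no_chdir => 1,    # $_ holds the full path
        wanted   => sub {
            # Don't descend into the cache directories themselves.
            if (-d $_ && (fileparse($_))[0] eq $subdir) {
                $File::Find::prune = 1;
                return;
            }
            return unless -f $_ && /\.tiff?\z/i;
            my ($base, $dir) = fileparse($_, qr/\.[^.]*/);
            my $reduced = File::Spec->catfile($dir, $subdir, "$base.png");
            push @todo, $_ unless -e $reduced;
        },
    }, $root);
    return sort @todo;
}

# A long-running loop (started under `nice`) could then be roughly:
#   while (1) { shrink($_) for pending_images('/data/scans'); sleep 60 }
# where shrink() is a hypothetical resize routine.
```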

Re: Resizing large TIF files
by mwah (Hermit) on Nov 20, 2007 at 22:54 UTC
    tiff 'convert' ... ImageMagick ... 1.5 seconds ... to a 1024x768 PNG. The originals are ... 40KB ... The PNG is truecolour and closer to 300KB ...

    I'm not sure to have understood what you did and why it didn't work. These times and sizes are not what I got here.

    For testing purposes, I screen-copied this PerlMonks web page with your question, pasted it into Photoshop, resized it to 2200x1600, converted it to a bitmap and blurred it until its monochrome TIFF (LZW) was about 48KB. This file was named 'originalmono.tiff'.

    Then I started Cygwin and ran

    $> for i in `seq -w 1 100`; do cp originalmono.tiff originalmono${i}.tiff; done

    which gave me 100 of these 48K images (originalmono001.tiff etc). Next, a snippet using Imager (0.61, which is what I use most of the time) was written:

    use strict;
    use warnings;
    use Imager;

    my $fn    = 'originalmono';
    my $img   = Imager->new();
    my $tdiff = time();

    opendir my $dirh, '.' or die "Can't read: $!";
    while ( my $name = readdir $dirh ) {
        next unless $name =~ /^\Q$fn\E\d+\.tiff$/;
        $img->read(file => $name) or die $img->errstr();
        # *** qtype=>'preview' / qtype=>'normal'
        my $omg = $img->scale(xpixels => 1024, ypixels => 768,
                              type => 'min', qtype => 'preview')
            or die $img->errstr();
        # *** file=>"$name.png" / file=>"$name.gif"
        $omg->write(file => "$name.png") or die "Can't write: ", $omg->errstr;
    }
    closedir $dirh;
    $tdiff = time() - $tdiff;
    print "$tdiff seconds\n";

    In 'preview' scaling mode, the 100 PNGs are done after 21 sec (.png size: 21K); in 'normal' scaling mode (the quality mode), the 100 files are done after 98 sec (.png size: 48K). If you change the output format to GIF, the 100 GIFs ('normal' mode) take about 100 sec, and each .gif file is 33K. BTW: this is an Athlon64/3200 under Win-XP. I re-checked the run times under Linux/VMware; they differ only by a second or two.

    Regards

    mwa

      If his original TIFF images are fax compressed, it may take a bit longer to decompress them than to decompress LZW, which might explain some of the performance difference. I know libtiff's Group 3 compression code was fairly slow way back when I dealt regularly with fax.

      You might want to try the 'mixing' scaling mode (qtype => 'mixing', Imager 0.54+) for decent performance, and I'd expect decent quality when scaling monochrome images too.

        Ah, OK. It looks like I'm using a fairly old version of Imager that doesn't support mixing mode, so an upgrade is obviously in order.

        By the way Tony, thanks a lot for all the work you've put into Imager. It really is an excellent module, and I've put it to good use on a number of projects.

      I'm not sure to have understood what you did and why it didn't work. These times and sizes are not what I got here.

      Actually, your results are entirely consistent with mine. The problem is that your code uses 'preview' quality rather than the default 'normal' (high quality) scaling. If I use 'preview' quality, I can also resize an image in about 0.3 seconds.

      The images I'm working with are scans of paper forms. Part of each image is the text and lines printed on the form and the rest (the more interesting bit) is the handwritten text that someone wrote. In the original files, lines are 1-2 pixels thick, say 1.5 pixels on average. If I resize the images in 'preview' mode, the resulting files use 1-bit depth, so the file size is good, but the image quality is poor: sections of lines and letters disappear altogether and legibility is significantly reduced. If I use 'normal' quality, however, there is barely any degradation in legibility. The resulting files essentially end up anti-aliased, which necessarily uses more shades of grey, so the file size increases. The extra quality also comes at a significant CPU cost (about a factor of 10 over preview quality).
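The difference between 'preview' and 'normal' quality can be illustrated with a toy model of a single scan line. This is not Imager's actual filter code: it contrasts nearest-neighbour sampling with simple box (area) averaging, which is enough to show why a thin line can vanish in one case and turn grey in the other. `shrink_nearest` and `shrink_average` are hypothetical names.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Toy model of one scan line: 255 = white paper, 0 = black ink.
# Nearest-neighbour downscaling keeps one source pixel out of every $f,
# so a 1-pixel line can be skipped entirely.
sub shrink_nearest {
    my ($f, @row) = @_;
    return map { $row[$_ * $f] } 0 .. int(@row / $f) - 1;
}

# Box (area-average) downscaling averages each group of $f pixels,
# so a thin line survives as a grey pixel instead of disappearing.
sub shrink_average {
    my ($f, @row) = @_;
    return map {
        my @block = @row[$_ * $f .. $_ * $f + $f - 1];
        int(sum(@block) / @block + 0.5);
    } 0 .. int(@row / $f) - 1;
}
```

With a 1-pixel line at index 3 of (255, 255, 255, 0, 255, 255, 255, 255) and a reduction factor of 2, nearest neighbour returns 255 255 255 255 (the line is gone), while averaging returns 255 128 255 255. Those intermediate greys are what make the higher-quality output larger.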

      So the bit that I obviously failed to make clear was that I'm not trying to produce thumbnails or previews. I want to reduce the images so that they display on screen at about the same size as the physical form and retain enough legibility that users never need to refer to the full-size originals. The originals appear too big and 'zoomed' for practical purposes. All the resizing techniques that I've tried meet these legibility requirements (with the exception of Imager's preview mode). I've also determined that reducing from 24-bit RGB to 8-colour (3-bit) indexed colour barely affects legibility at all. 2-bit colour is borderline (it might be OK if I could preselect the palette) and 1-bit colour is not good enough. It's the processing overhead and file bloat that I'm keen to reduce if possible.
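The relationship between palette size and bit depth, and why mid-greys suffer at 1 bit, can be sketched with a fixed, evenly spaced grey palette. Real quantizers usually build an adaptive palette from the image instead, so treat this only as an illustration; `quantize_grey` and `bits_for` are hypothetical helpers.

```perl
use strict;
use warnings;

# Map an 8-bit grey value (0..255) onto one of $levels evenly spaced greys.
# A fixed, evenly spaced palette is an assumption; an adaptive palette
# (e.g. median cut) would pick its levels from the image itself.
sub quantize_grey {
    my ($v, $levels) = @_;
    my $step  = 255 / ($levels - 1);
    my $index = int($v / $step + 0.5);    # nearest palette entry
    return int($index * $step + 0.5);     # that entry's grey value
}

# Bits per pixel needed for an indexed image with $n colours:
# 8 colours -> 3 bits, 4 -> 2 bits, 2 -> 1 bit.
sub bits_for {
    my ($n) = @_;
    my $b = 0;
    $b++ while 2**$b < $n;
    return $b;
}
```

With 8 levels a mid-grey of 128 maps to 146; with 2 levels it snaps to pure white (255), which is how the anti-aliasing is lost at 1 bit.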

      Thanks for taking the time to try it out and reply.

        So the bit that I obviously failed to make clear was that I'm not trying to produce thumbnails or previews

        Yes, you didn't, and that changes the problem entirely. There are a number of pixel-interpolating algorithms, and the fastest is "nearest neighbor", which is Imager's 'preview' qtype in your terms. A good visual balance for image downscaling is usually produced by the bicubic algorithm, but Image::Magick also offers qw(Quadratic Triangle Hermite Hanning Hamming Blackman Gaussian Catrom Mitchell Lanczos Bessel Sinc) (at least those I know), and you might want to try them to see which works best for you.
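One way to try the filters is to time each one on a sample scan and compare output size and legibility. The sketch below uses the `convert` command-line tool rather than the Image::Magick module (the module's Resize method also accepts a filter parameter). Filter names vary between ImageMagick versions, so the list here is a guess to check against `convert -list filter`. In ImageMagick, `Point` is the nearest-neighbour filter. `filter_args` and `compare_filters` are hypothetical names.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Filters to try; adjust to whatever `convert -list filter` reports.
my @FILTERS = qw(Point Box Triangle Catrom Mitchell Lanczos);

# Build the convert command for one filter (list form, no shell).
sub filter_args {
    my ($in, $out, $filter, $geometry) = @_;
    $geometry //= '1024x768';
    return ('convert', $in, '-filter', $filter, '-resize', $geometry,
            '-colors', 8, "PNG8:$out");
}

# Time each filter on one sample scan and report the output size,
# so speed, size and (by eye) legibility can be compared side by side.
sub compare_filters {
    my ($in) = @_;
    for my $filter (@FILTERS) {
        my $out = "sample-$filter.png";
        my $t0  = time();
        system(filter_args($in, $out, $filter)) == 0
            or die "convert failed for $filter\n";
        my $elapsed = time() - $t0;
        printf "%-10s %6.2fs %8d bytes\n", $filter, $elapsed, -s $out;
    }
}
```

Calling `compare_filters('sample.tif')` writes one sample-FILTER.png per filter into the current directory for visual comparison.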

Node Type: perlquestion [id://651886]
Approved by Corion
Front-paged by Corion