PerlMonks  

Fetching an Image from HTTP

by Anonymous Monk
on Jun 07, 2002 at 14:25 UTC ( #172531=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Good Day Monks,

For some scripts I'm working on, I need to know how to grab an image from an HTTP site and write it to a file. Most of the images I'm working with are GIFs, but some might end up being JPEG or possibly (it's a stretch) PNG. I'd appreciate any advice you have to offer!

Replies are listed 'Best First'.
Re: Fetching an Image from HTTP
by Molt (Chaplain) on Jun 07, 2002 at 14:29 UTC

    Have a look at LWP::Simple. It lets you get the contents of any URL quickly and with minimal fuss; from there you simply open a file and write the contents to it.

    There, that wasn't too painful..
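Molt's suggestion (get the contents, then open a file and write them) might look roughly like the sketch below. The sub names save_bytes and fetch_to_file, the script name fetch.pl, and the example URL are made up for illustration. LWP::Simple is loaded inside the sub only so that the file-writing helper can be tried without the module installed; a plain "use LWP::Simple;" at the top works just as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write raw bytes to a file. binmode matters on Windows, where text mode
# would translate line endings and corrupt the image.
sub save_bytes {
    my ($filename, $data) = @_;
    open(my $fh, '>', $filename) or die "Cannot open $filename: $!";
    binmode $fh;
    print {$fh} $data;
    close($fh) or die "Cannot close $filename: $!";
    return length $data;
}

# Fetch a URL with LWP::Simple's get() and save the body to a file.
sub fetch_to_file {
    my ($url, $filename) = @_;
    require LWP::Simple;    # loaded at call time (see note above)
    my $data = LWP::Simple::get($url);
    die "Could not fetch $url\n" unless defined $data;    # get() returns undef on failure
    return save_bytes($filename, $data);
}

# Hypothetical usage: perl fetch.pl http://www.example.com/someimage.gif image.gif
if (@ARGV == 2) {
    my $bytes = fetch_to_file(@ARGV);
    print "Wrote $bytes bytes to $ARGV[1]\n";
}
```

The same code handles GIF, JPEG, and PNG alike, since it never looks at the bytes it writes.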

      If you're just going to be fetching images, I would use Image::Grab instead of LWP::Simple. Of course, TMTOWTDI, and this way would be mine. =)

      Here's some example code. There's more in the POD documentation, of course.
      use Image::Grab;

      my $pic = Image::Grab->new;
      $pic->url('http://www.example.com/someimage.jpg');
      $pic->grab;
      open(IMAGE, ">image.jpg") || die "image.jpg: $!";
      binmode IMAGE;    # for MSDOS derivations.
      print IMAGE $pic->image;
      close IMAGE;

      It also supports a regex feature, which would be handy if you are unsure of the file extension of the image you're grabbing.

      You can instruct it to search a particular document on a website, and it will go through all IMG tags to find an image matching your regex. It will then request it using the document's URL as its referrer.

      Something like the following would look for all .png images, but of course you can change the regex to match a filename whose extension you don't know. That could be handy for documents that change the types of images they use, for some bizarre reason. =)

      $pic = Image::Grab->new(SEARCH_URL=>'http://localhost/gallery.html', REGEXP =>'.*\.png');

      - wil

      That was my first tack, and I thought I was on the right track, but what I ended up with using getstore() was files that kind of thought they were JPEGs and kind of thought they were HTML docs. Here's the script I used:

      #!/usr/bin/perl -w
      use strict;
      use LWP::Simple;

      open FILE, "text1.txt" or die $!;
      my $url;
      my $text;
      while (<FILE>) {
          $text = $_;
          $url = 'http://www.nobeliefs.com/nazis/' . $text;
          $text =~ s#images/##;
          print "$url\n";
          print "$text\n";
          getstore($url, $text) or die "Can't download: $@\n";
      }

      An ls command shows question marks:

      $ ls
      ...
      prayingHitler.jpg?
      PraysingCelebration.jpg?
      priests-salute.jpg?
      received.jpg
      reichchurch.gif?
      ...

      And when I open up a jpg it looks like this:

      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
      <html>
      <head>
      <meta http-equiv="Content-type" content="text/html; charset=utf-8">
      <title>Website Moved</title>
      <style type="text/css">
      .statusBox { width: 80px; }
      .fb { width:43%; float:left; text-align:center; margin:5px 20px 5px 20px; padding:20px 0 20px 0px; background:#eef8fd; height:110px; border:solid 1px #dff4fe; }
      .fb2 { width:43%; float:right; text-align:center; margin:5px 20px 5px 20px; padding:20px 0 20px 0px; background:#eef8fd; height:110px; border:solid 1px #dff4fe;
      ...

      I think the trick might be to find a way to define $params such that this works, but I haven't been able to do that yet. (I only get errors)

      my $data = LWP::Simple::get $params{URL};
      my $filename = "image.jpg";
      open (FH, ">$filename");
      binmode (FH);
      print FH $data;
      close (FH);

        Since you're reading your URLs from a text file, each one has a newline on the end of it. There may be other problems with them. So you're requesting bad URLs from the server, and it's sending back an information page to tell you that, hence the "Website Moved" title of the HTML page you're getting back. Load the page you get back in a web browser (you might want to rename it to something.html first) to see what it's trying to tell you. (The same newline issue will cause weirdness with the local filenames you're saving to as well.)

        Inspect the actual URL you're requesting, right before requesting it, with a line like the following, and you should see the problem:

        print qq[ '$url' ];

        Aaron B.
        My Woefully Neglected Blog, where I occasionally mention Perl.
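A corrected loop, following the diagnosis above, might look like the sketch below. It assumes (as the script above implies) that each line of the list file looks like "images/name.jpg"; the sub names parse_line and download_list are made up for illustration. Note that getstore() returns the HTTP status code rather than true/false, so "getstore(...) or die" never fires; is_success() is the check LWP::Simple provides for that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $BASE = 'http://www.nobeliefs.com/nazis/';    # base URL from the script above

# Turn one line of the list file into a (URL, local filename) pair.
# Returns an empty list for blank lines.
sub parse_line {
    my ($line) = @_;
    $line =~ s/\s+\z//;    # strips "\n", "\r\n" and trailing spaces (stricter than chomp)
    $line =~ s/\A\s+//;
    return if $line eq '';
    my $url = $BASE . $line;
    (my $file = $line) =~ s{^images/}{};
    return ($url, $file);
}

sub download_list {
    my ($list_file) = @_;
    require LWP::Simple;    # loaded at call time so parse_line() can be tried on its own
    open(my $fh, '<', $list_file) or die "Cannot open $list_file: $!";
    while (my $line = <$fh>) {
        my ($url, $file) = parse_line($line);
        next unless defined $url;
        print "'$url' -> '$file'\n";    # quotes make stray whitespace visible
        my $status = LWP::Simple::getstore($url, $file);
        warn "Failed ($status): $url\n"
            unless LWP::Simple::is_success($status);
    }
    close $fh;
}

# Usage: pass the list file (text1.txt in the script above) as the only argument.
download_list($ARGV[0]) if @ARGV;
```

With the trailing newline gone, the question marks in the ls output should disappear too, since they are how ls displays the newline embedded in each filename.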

Re: Fetching an Image from HTTP
by Dog and Pony (Priest) on Jun 07, 2002 at 14:32 UTC
    > perl -MLWP::Simple -e "getstore 'http://site.com/image.gif', 'image.gif'"
    Check out LWP::Simple for this, and for more check out my own Getting more out of LWP::Simple (shameless plug).
    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
Re: Fetching an Image from HTTP
by silent11 (Vicar) on Jun 07, 2002 at 14:41 UTC
    This is a very simple example; hopefully it is enough to get you going in the right direction.

    use LWP::Simple;

    my $fileIWantToDownload = 'http://perlmonks.com/images/blueperlmonkssm.gif';
    my $fileIWantToSaveAs = 'monk_image.gif';
    getstore($fileIWantToDownload, $fileIWantToSaveAs);
    -Silent11
Re: Fetching an Image from HTTP
by cfreak (Chaplain) on Jun 07, 2002 at 19:23 UTC

    Several people have mentioned LWP::Simple for getting the image, which is perfect. Once you have the image, a great way to find out its type is Image::Size. It will give you the type and dimensions of the image. It's really easy to use and it supports a ton of image types.

    Hope that helps
    Chris

    Some clever or funny quote here.
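As a dependency-free complement to Image::Size, the sketch below checks a downloaded file's leading "magic" bytes. To be clear, this is not Image::Size and it does not report dimensions; it is a hand-rolled check that only covers the three formats the question mentions, plus HTML, which helps catch the case elsewhere in this thread where an error page was saved under a .jpg name. The sub name sniff_type is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Identify a file by its first few bytes.
# Returns 'GIF', 'JPG', 'PNG', 'HTML', or undef if nothing matches.
sub sniff_type {
    my ($filename) = @_;
    open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
    binmode $fh;
    my $head = '';
    read($fh, $head, 64);    # the signatures all fit in the first 64 bytes
    close $fh;

    return 'GIF'  if $head =~ /\AGIF8[79]a/;                  # GIF87a or GIF89a
    return 'JPG'  if $head =~ /\A\xFF\xD8\xFF/;               # JPEG start-of-image marker
    return 'PNG'  if $head =~ /\A\x89PNG\x0D\x0A\x1A\x0A/;    # 8-byte PNG signature
    return 'HTML' if $head =~ /\A\s*<(?:!DOCTYPE|html)\b/i;   # probably an error page
    return;
}

# Report the detected type of every file named on the command line.
for my $file (@ARGV) {
    my $type = sniff_type($file);
    print "$file: ", (defined $type ? $type : 'unknown'), "\n";
}
```

Running it over a directory of downloads quickly shows which files are real images and which are HTML in disguise.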
Re: Fetching an Image from HTTP
by Popcorn Dave (Abbot) on Jun 07, 2002 at 18:37 UTC
    LWP::Simple is definitely an easy way to go. I am using that in a project to grab news off foreign web sites that don't have RSS set up.

    As far as grabbing pictures, I have found that LWP::Simple will get them as it apparently ( and someone please correct me if I'm wrong here ) grabs the entire page, code and graphics.

    If you're doing this on a Windows machine, the pictures you want may already be in your browser's temporary directories, and you might consider getting them out of there as another option. I don't know whether *nix does the same thing, but it may be worth a look.

    Btw, how are you deciding which ones to get?

    You might also check out the latest issue of 2600 magazine. There was an article on grabbing pictures from websites that have the right click disabled - written in Perl.

    Hope that helps!

    Some people fall from grace. I prefer a running start...

      As far as grabbing pictures, I have found that LWP::Simple will get them as it apparently ( and someone please correct me if I'm wrong here ) grabs the entire page, code and graphics.
      I will correct you on this one. :)

      LWP::Simple does *not* in itself grab it all. If you point it at an HTML page, it will grab the HTML. If you point it directly at an image, it will grab the image. The HTML may contain image tags (something like: <img src="image.gif" />), and browsers will automatically follow the URLs given there and fetch the images too. LWP::Simple will not. If that is the way you want it, you will need to parse the HTML for those URLs and continue to fetch. Or you can use an already invented wheel, like Image::Grab suggested by wil above. :)


      You have moved into a dark place.
      It is pitch black. You are likely to be eaten by a grue.
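The "parse the HTML for those URLs and continue to fetch" route described in the correction above could be sketched as follows. This version assumes HTML::TokeParser and URI are installed alongside LWP (neither is named in this thread); the sub names image_urls and grab_page_images are made up for illustration, and the modules are loaded at call time only so the script compiles where they are missing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the absolute URL of every <img src="..."> in $html, resolved
# against $base_url (the address the page was fetched from).
sub image_urls {
    my ($html, $base_url) = @_;
    require HTML::TokeParser;
    require URI;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML\n";
    my @urls;
    while (my $tag = $parser->get_tag('img')) {
        my $src = $tag->[1]{src};    # $tag->[1] is the attribute hash
        next unless defined $src && length $src;
        push @urls, URI->new_abs($src, $base_url)->as_string;
    }
    return @urls;
}

# Fetch a page, then store each image it references in the current directory.
sub grab_page_images {
    my ($page_url) = @_;
    require LWP::Simple;
    my $html = LWP::Simple::get($page_url);
    die "Could not fetch $page_url\n" unless defined $html;
    for my $url (image_urls($html, $page_url)) {
        my $file = (URI->new($url)->path_segments)[-1];    # last path component
        next unless defined $file && length $file;
        my $status = LWP::Simple::getstore($url, $file);
        warn "Failed ($status): $url\n"
            unless LWP::Simple::is_success($status);
    }
}

# Usage: pass the page URL as the only argument.
grab_page_images($ARGV[0]) if @ARGV;
```

Relative src values such as image.gif only make sense relative to the page they came from, which is why the page URL is passed along as the base.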