
Fetching an Image from HTTP

by Anonymous Monk
on Jun 07, 2002 at 14:25 UTC ( #172531=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Good Day Monks,

For some scripts I'm working on, I need to know how to grab an image from an HTTP site and write it to a file. Most of the images I'm working with are GIFs, but some might end up being JPEG or possibly (it's a stretch) PNG. I'd appreciate any advice you have to offer!

Replies are listed 'Best First'.
Re: Fetching an Image from HTTP
by Molt (Chaplain) on Jun 07, 2002 at 14:29 UTC

    Have a look at LWP::Simple. This'll let you get the contents of any URL quickly and with minimal fuss, and from there you simply open a file and write the data out.

    There, that wasn't too painful...
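    As a minimal sketch of that approach (the URL and filename here are placeholders, not from the thread): fetch the raw bytes with get(), then write them out under binmode so the binary data isn't mangled by line-ending translation.

```perl
use strict;
use warnings;

# Write raw bytes to a file. binmode matters for image data on
# platforms that would otherwise translate line endings.
sub save_binary {
    my ($filename, $data) = @_;
    open my $fh, '>', $filename or die "$filename: $!";
    binmode $fh;
    print {$fh} $data;
    close $fh or die "$filename: $!";
}

# Hypothetical usage (needs LWP::Simple installed; placeholder URL):
# use LWP::Simple;
# my $data = get('http://example.com/some.gif');
# save_binary('some.gif', $data) if defined $data;
```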

      If you're just going to be fetching images, I would use Image::Grab instead of LWP::Simple. Of course, TMTOWTDI, and this way would be mine. =)

      Here's some example code. There's more in the POD documentation, of course.
      use Image::Grab;

      my $pic = Image::Grab->new;
      $pic->url('');                 # URL elided in the original post
      $pic->grab;

      open(IMAGE, ">image.jpg") || die "image.jpg: $!";
      binmode IMAGE;                 # for MSDOS derivatives
      print IMAGE $pic->image;
      close IMAGE;

      It also supports a regex feature, which would be handy if you are unsure of the file extension of the image you're grabbing.

      You can instruct it to search a particular document on a website, and it will go through all IMG tags to find an image matching your regex. It will then request it using the document's URL as its referrer.

      Something like the following would look for all .png images, but of course you can change this to match a filename whose extension you don't know. Could be handy for documents that change the types of images they use, for some bizarre reason. =)

      my $pic = Image::Grab->new(
          SEARCH_URL => 'http://localhost/gallery.html',
          REGEXP     => '.*\.png',
      );

      - wil

      that was my first tack, and I thought I was on the right track, but what I ended up with using getstore() was files that kind of thought they were jpg's and kind of thought they were html docs. Here's the script I used:

      #!/usr/bin/perl -w
      use strict;
      use LWP::Simple;

      open FILE, "text1.txt" or die $!;
      my $url;
      my $text;
      while (<FILE>) {
          $text = $_;
          $url  = '' . $text;    # base URL elided in the original post
          $text =~ s#images/##;
          print "$url\n";
          print "$text\n";
          getstore($url, $text) or die "Can't download: $@\n";
      }

      an ls command shows question marks:

      $ ls ... prayingHitler.jpg? PraysingCelebration.jpg? priests-salute.jpg? received.jpg reichchurch.gif? ...

      and when I open up a jpg it looks like this:

      <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://">
      <html>
      <head>
      <meta http-equiv="Content-type" content="text/html; charset=utf-8">
      <title>Website Moved</title>
      <style type="text/css">
      ...

      I think the trick might be to find a way to define $params such that this works, but I haven't been able to do that yet. (I only get errors)

        my $data     = LWP::Simple::get $params{URL};
        my $filename = "image.jpg";
        open (FH, ">$filename");
        binmode (FH);
        print FH $data;
        close (FH);

        Since you're reading your URLs from a text file, each one has a newline on the end of it. There may be other problems with them. So you're requesting bad URLs from the server, and it's sending back an information page to tell you that, hence the "Website Moved" title of the HTML page you're getting back. Load the page you get back in a web browser (you might want to rename it to something.html first) to see what it's trying to tell you. (The same newline issue will cause weirdness with the local filenames you're saving to as well.)

        Inspect the actual URL you're requesting, right before requesting it, with a line like the following, and you should see the problem:

        print qq[ '$url' ];
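        A core-Perl sketch of the fix described above (the sample strings are made up): chomp each line and trim whitespace before using it as a URL or a local filename.

```perl
use strict;
use warnings;

# Clean one line read from a URL list: drop the trailing newline and
# any surrounding whitespace that would corrupt the request or filename.
sub clean_url_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/^\s+|\s+$//g;
    return $line;
}

print clean_url_line("images/received.jpg\n"), "\n";  # prints "images/received.jpg"
```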

        Aaron B.
        My Woefully Neglected Blog, where I occasionally mention Perl.

Re: Fetching an Image from HTTP
by Dog and Pony (Priest) on Jun 07, 2002 at 14:32 UTC
    > perl -MLWP::Simple -e "getstore '', 'image.gif'"
    Check out LWP::Simple for this, and for more check out my own Getting more out of LWP::Simple (shameless plug).
    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
Re: Fetching an Image from HTTP
by silent11 (Vicar) on Jun 07, 2002 at 14:41 UTC
    This is a very simple example; hopefully it is enough to get you going in the right direction.
    use LWP::Simple;

    my $fileIWantToDownload = '';   # URL elided in the original; it ended in .gif
    my $fileIWantToSaveAs   = 'monk_image.gif';
    getstore($fileIWantToDownload, $fileIWantToSaveAs);
Re: Fetching an Image from HTTP
by cfreak (Chaplain) on Jun 07, 2002 at 19:23 UTC

    Several people have mentioned LWP::Simple for fetching the image, which is perfect. Once you have the image, a great way to tell its type is Image::Size. It will give you the type and dimensions of the image. It's really easy to use, and it supports a ton of image types.
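    Image::Size is not in core, so as a rough stand-in (my own sketch, not what cfreak posted), you can at least identify the common types by their leading magic bytes:

```perl
use strict;
use warnings;

# Guess an image type from its first few (magic) bytes.
# Covers only GIF, JPEG, and PNG; Image::Size handles many more.
sub image_type {
    my ($data) = @_;
    return 'GIF'  if $data =~ /\AGIF8[79]a/;
    return 'JPEG' if $data =~ /\A\xFF\xD8\xFF/;
    return 'PNG'  if $data =~ /\A\x89PNG\r\n\x1A\n/;
    return undef;
}

print image_type("GIF89a") // 'unknown', "\n";  # prints "GIF"
```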

    Hope that helps

    Some clever or funny quote here.
Re: Fetching an Image from HTTP
by Popcorn Dave (Abbot) on Jun 07, 2002 at 18:37 UTC
    LWP::Simple is definitely an easy way to go. I am using that in a project to grab news off foreign web sites that don't have RSS set up.

    As far as grabbing pictures, I have found that LWP::Simple will get them as it apparently ( and someone please correct me if I'm wrong here ) grabs the entire page, code and graphics.

    If you're doing this on a Windows machine, the pictures you want may already be in your browser's cache directories, so pulling them out of there is another option. I don't know whether *nix browsers cache the same way, but it may be worth a look.

    Btw, how are you deciding which ones to get?

    You might also check out the latest issue of 2600 magazine. There was an article on grabbing pictures from websites that have the right click disabled - written in Perl.

    Hope that helps!

    Some people fall from grace. I prefer a running start...

      As far as grabbing pictures, I have found that LWP::Simple will get them as it apparently ( and someone please correct me if I'm wrong here ) grabs the entire page, code and graphics.
      I will correct you on this one. :)

      LWP::Simple does *not* in itself grab it all. If you point it at an HTML page, it will grab the HTML. If you point it directly at an image, it will grab the image. The HTML may contain image tags (something like: <img src="image.gif" />), and browsers will automatically follow the URLs given there and fetch the images too. LWP::Simple will not - if that is the way you want it, you will need to parse the HTML for those URLs and continue fetching. Or you can use an already invented wheel, like Image::Grab, suggested by wil above. :)
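      A rough illustration of the "parse the HTML yourself" route (a naive regex of my own; real code should use HTML::Parser, or just Image::Grab as suggested above):

```perl
use strict;
use warnings;

# Naively extract src attributes from <img> tags. Handles only
# simple, well-formed tags with quoted attribute values.
sub img_srcs {
    my ($html) = @_;
    my @srcs;
    while ($html =~ /<img[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi) {
        push @srcs, $1;
    }
    return @srcs;
}

print "$_\n" for img_srcs(q{<p><img src="a.gif" /><img alt="x" src='b.png'></p>});
```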

      You have moved into a dark place.
      It is pitch black. You are likely to be eaten by a grue.
Re: Fetching an Image from HTTP
by Abigail-II (Bishop) on Jun 07, 2002 at 14:39 UTC
    {Deleted because people don't seem to like it}



      ...oh, don't do that Abigail, (delete I mean) because the presence of information (even if it's bad news) still contributes to the whole body of knowledge.

      The anti-motivational poster says, "It could be that the purpose of your life is merely to serve as a warning to others."

      I'm not actually suggesting that in this case, but it is instructive to see nodes that were unpopular. This serves as a marker at one end of the continuum, just as highly popular nodes mark the other end. And it might be that over time, others who drop by will like what you wrote and upvote it.

      My best guess is that we shouldn't worry too much about how our nodes get voted; we should just move on and write the next thing. It all averages itself out in the end. (I believe that, but people sometimes call me an optimist.)

      I should point out, though, that deleting the content of an unpopular node may make it less likely that the node continues to collect downvotes. So factor that into your thoughts as well.

      No matter what though, keep posting. As someone pointed out recently, "It's all good, except when it sucks."


        Ah, but oddly enough, my note has not only lost its negative score; it now has a positive score twice as high as the negative one it had before.

        I've always thought that the way people vote was incomprehensible, but I hadn't expected this weirdness. I guess we either have a lot of zombie or bot voters (who don't read the post), or people just like to see {Deleted}.

        Abigail (I'm just pleasing the crowds! ;-))

Node Type: perlquestion [id://172531]
Approved by samtregar
Front-paged by wil