http://www.perlmonks.org?node_id=691014


in reply to Downloading first X bytes of a file

Another way: use IO::Socket::INET.
This has the advantage of being more flexible.

I don't know what you are looking for in the first 512 bytes, but depending on what you get, you could parse the response and fetch more data if needed.

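#!/usr/bin/perl -w
use strict;
use IO::Socket::INET;

my $sock = IO::Socket::INET->new(
    PeerAddr => 'www.google.de',
    PeerPort => 'http(80)',
    Proto    => 'tcp',
);
die "connect failed: $@\n" if !$sock;

print $sock "GET http://www.google.de/index.html\n";

my $len  = 0;
my $file = '';
# Read one character at a time until 512 bytes have arrived or the server closes.
# defined() keeps a literal "0" character from ending the loop early.
while ( $len < 512 && defined( my $c = $sock->getc() ) ) {
    print $c, "\n\n";    # debug output: echo each character as it arrives
    $len += length $c;
    $file .= $c;
}
print $file;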

michael

Re^2: Downloading first X bytes of a file
by moritz (Cardinal) on Jun 09, 2008 at 12:09 UTC
    print $sock "GET http://www.google.de/index.html\n";

    Since you don't send a Host header, you should at least send the HTTP/1.0 version string. And don't you need two newlines at the end, and perhaps a few carriage returns as well?
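A minimal sketch of what a complete request could look like, assuming an HTTP/1.0 request with a Host header; `build_request` is a hypothetical helper name, not something from the original script. Each line ends in CRLF, and an empty line terminates the header block:

```perl
use strict;
use warnings;

# Hypothetical helper: build a minimal HTTP/1.0 GET request.
# Lines are joined with CRLF; the two trailing empty strings produce
# the final CRLF of the last header plus the blank line that ends the headers.
sub build_request {
    my ($host, $path) = @_;
    return join "\r\n",
        "GET $path HTTP/1.0",
        "Host: $host",
        '',
        '';
}
```

With the socket from the script above, this would be sent as `print $sock build_request('www.google.de', '/index.html');`.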

    Also note that there is more to getting web pages than sending one line of HTTP header. For one thing, your example won't follow redirects, and it doesn't have any error handling. There's a reason we use modules to abstract that stuff away.

    Besides, IMHO it's not very friendly to request a full page (no range header present) and then only read a part of the reply.
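As a rough illustration of the Range header mentioned here, the following sketch asks the server for only the first bytes of a page. It uses the core module HTTP::Tiny rather than a raw socket, which is a substitution and not what the original script does; `range_for_first` and `fetch_first_bytes` are hypothetical helper names. Servers are free to ignore a Range header, so the result is trimmed locally as well:

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: Range header value covering the first $n bytes.
sub range_for_first {
    my ($n) = @_;
    return 'bytes=0-' . ($n - 1);
}

# Hypothetical helper: fetch at most the first $n bytes of $url.
sub fetch_first_bytes {
    my ($url, $n) = @_;
    my $res = HTTP::Tiny->new->get(
        $url, { headers => { Range => range_for_first($n) } },
    );
    die "request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    # 206 means the server honoured the range; a plain 200 means it ignored
    # the header and sent the whole body, so trim locally in either case.
    return substr($res->{content}, 0, $n);
}
```

A call such as `print fetch_first_bytes('http://www.google.de/', 512);` would then print no more than 512 bytes.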

      Yes, you are right about the HTTP protocol.

      It was just a quick hack to show how it would work.
      Once the script had run successfully, I didn't bother to look up the HTTP references...
      Lazy as I am, I debug my CGI scripts with telnet this way from time to time.

      I do seem to remember, though, that if you don't send the HTTP version, the web server has to assume you are an HTTP/1.0 client.

      There is, however, a reason I suggested going this way:
      you can parse the data you get and request more if needed.
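One way this "read a bit, look at it, read more if needed" idea could be sketched is a loop that reads fixed-size chunks and asks a callback whether it has seen enough. `read_until` and its parameters are hypothetical names chosen for illustration, and the stopping rule is an assumption, since the thread never says what the first 512 bytes are needed for:

```perl
use strict;
use warnings;

# Hypothetical helper: read from $fh in chunks of $chunk bytes until
# $done->($buffer) returns true, EOF is reached, or $max bytes were read.
sub read_until {
    my ($fh, $done, $chunk, $max) = @_;
    my $buf = '';
    while (length($buf) < $max) {
        # Append the next chunk at the end of the buffer.
        my $got = read($fh, $buf, $chunk, length $buf);
        die "read error: $!\n" unless defined $got;
        last if $got == 0;       # EOF: the server closed the connection
        last if $done->($buf);   # caller has seen enough
    }
    return $buf;
}
```

With a connected socket, `read_until($sock, sub { $_[0] =~ /\r\n\r\n/ }, 512, 4096)` would stop as soon as the end of the response headers has been seen. On a socket, a single read may return fewer bytes than requested, so the loop may run more often than the chunk size suggests.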

      It's also hard to say which way I would go without knowing about the purpose of reading the first 512 bytes of a page.

      I also believe it's very often better to write your own module:
      you'll learn something, and sometimes existing modules simply don't do what you expect because of their complexity.
      Your own modules are also easier to customize.

      I believe closing the socket SHOULDN'T have any effect on the remote server, since a connection can always break.

      But I agree the script is not very friendly, and I would do some further work before using it.
Re^2: Downloading first X bytes of a file
by tachyon-II (Chaplain) on Jun 09, 2008 at 15:27 UTC

    All you need is the relative part of the path. You also generally need the HTTP protocol version on the end or the server will complain.

    print $sock "GET / HTTP/1.0\r\n\r\n";
      I still haven't taken a closer look at the HTTP specs, though...

      This seems interesting:
      telnet www.google.de 80
      GET http://www.google.de/ HTTP/1.0[RET]
      returns the page with headers,

      while
      GET http://www.google.de/ [RET]
      doesn't show any HTTP headers;
      instead it just returns the file.
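The difference observed here matches what RFC 1945 calls a Simple-Request: a request line without a version is treated as HTTP/0.9, and the reply is just the body, with no status line or headers. A small sketch for telling the two kinds of reply apart, assuming the first bytes of the response are already in a string; `has_status_line` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $data begins with an HTTP status line
# such as "HTTP/1.0 200 OK", and 0 if it looks like a bare HTTP/0.9 body.
sub has_status_line {
    my ($data) = @_;
    return $data =~ m{\AHTTP/\d+\.\d+ \d{3}} ? 1 : 0;
}
```

A script reading only the first 512 bytes could use such a check to decide whether it still has to skip past headers before it reaches the page content.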

      I shouldn't hit stop while posting...