PerlMonks  

Re: Downloading first X bytes of a file

by misc (Pilgrim)
on Jun 09, 2008 at 11:51 UTC (#691014)


in reply to Downloading first X bytes of a file

Another way: use IO::Socket::INET.
This has the advantage of being more flexible.

I don't know what you are looking for in the first 512 bytes, but depending on what you get, you could parse the response and fetch more data if needed.

#!/usr/bin/perl -w
use strict;
use IO::Socket::INET;

my $sock = IO::Socket::INET->new(
    PeerAddr => 'www.google.de',
    PeerPort => 'http(80)',
    Proto    => 'tcp',
);
die "connect failed: $@" unless $sock;

# Minimal request line (no version, no headers)
print $sock "GET http://www.google.de/index.html\n";

my $len  = 0;
my $file = '';
while ( $len < 512 ) {
    my $c = $sock->getc();
    last unless defined $c;    # undef means EOF or error; a literal "0" is still data
    print $c, "\n\n";          # debug: echo each character as it arrives
    $len  += length $c;
    $file .= $c;
}
print $file;

michael
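
A loop that tests the return value of getc() for truth stops at a literal "0" character, so here is a minimal sketch that caps the download with read() and an explicit byte count instead. The helper name read_up_to is made up for illustration; it assumes any readable filehandle, and a connected IO::Socket::INET object should qualify.

```perl
use strict;
use warnings;

# Hypothetical helper: read at most $limit bytes from $fh.
# Returns what was read, which is shorter if the stream ends early.
sub read_up_to {
    my ($fh, $limit) = @_;
    my $buf = '';
    while (length($buf) < $limit) {
        # Append to $buf at its current end, never asking for more than is left.
        my $n = read($fh, $buf, $limit - length($buf), length($buf));
        die "read failed: $!" unless defined $n;
        last if $n == 0;    # end of stream
    }
    return $buf;
}
```

With the socket from the script above, this would be called as read_up_to($sock, 512).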


Re^2: Downloading first X bytes of a file
by moritz (Cardinal) on Jun 09, 2008 at 12:09 UTC
    print $sock "GET http://www.google.de/index.html\n";

    Since you don't send a Host header, you should send at least the HTTP/1.0 version string. And don't you need two newlines at the end, and perhaps a few carriage returns as well?

    Also note that there is more to getting web pages than sending one line of HTTP header. Your example won't follow redirects, for one thing, and it has no error handling. There's a reason we use modules to abstract that stuff away.

    Besides, IMHO it's not very friendly to request a full page (no range header present) and then only read a part of the reply.
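
The points above (version string, Host header, CRLF line endings, a blank line to end the headers, and a Range header) could be combined roughly like this. It is only a sketch: build_request is a hypothetical helper, the host and path in the usage line are placeholders, and a server is free to ignore Range and send the whole page with a 200 status instead of a 206.

```perl
use strict;
use warnings;

# Hypothetical helper: build a request for the first $max_bytes bytes of $path.
sub build_request {
    my ($host, $path, $max_bytes) = @_;
    my $last = $max_bytes - 1;    # byte ranges are inclusive and start at 0
    return join "\r\n",
        "GET $path HTTP/1.0",
        "Host: $host",
        "Range: bytes=0-$last",
        "", "";                   # the empty line terminates the header block
}
```

On a connected socket this would be sent with something like print $sock build_request('www.example.com', '/index.html', 512);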

      Yes, you are right about the HTTP protocol.

      It's just a quick hack to show how it would work.
      After I tested the script successfully, I didn't bother to look up the HTTP references...
      Lazy as I am, I debug my CGI scripts with telnet this way from time to time.

      Although I seem to remember that if you don't send the HTTP version, the web server has to assume you are an HTTP/1.0 client.

      There is, however, a reason I suggested going this way:
      it's possible to parse the data you get and to request more if needed.

      It's also hard to say which way I would go without knowing the purpose of reading the first 512 bytes of a page.

      I do believe, though, that it's very often better to write your own module:
      you'll learn something, and sometimes existing modules simply don't do what you expect because of their complexity.
      It's also easier to customize your own modules.

      I believe closing the socket SHOULDN'T have any effect on the remote server, since it's always possible that the connection breaks.

      But I agree the script is not very friendly, and I would also do some further work before using it.
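
The "parse what you get, then decide whether to fetch more" idea could look something like the following. This is a rough sketch only: parse_head is a hypothetical name, it assumes the response starts with a normal status line, and it ignores details such as folded or repeated headers.

```perl
use strict;
use warnings;

# Hypothetical helper: split a partially read response into status code,
# headers and the body bytes received so far.
# Returns an empty list if the header block is incomplete or unrecognised.
sub parse_head {
    my ($buf) = @_;

    # Headers end at the first blank line; everything after it is body.
    my ($head, $body) = $buf =~ /\A(.*?)\r?\n\r?\n(.*)\z/s
        or return;

    my ($status_line, @lines) = split /\r?\n/, $head;
    my ($code) = $status_line =~ m{^HTTP/\d+\.\d+\s+(\d{3})};
    return unless defined $code;

    my %header;
    for my $line (@lines) {
        my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/ or next;
        $header{ lc $name } = $value;    # header names are case-insensitive
    }
    return ($code, \%header, $body);
}
```

A caller could then look at the status code or at $header->{'content-type'} before deciding whether to keep reading from the socket.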
Re^2: Downloading first X bytes of a file
by tachyon-II (Hermit) on Jun 09, 2008 at 15:27 UTC

    All you need is the path relative to the host. You also generally need the HTTP protocol version on the end, followed by a blank line, or the server will complain.

    print $sock "GET / HTTP/1.0\r\n\r\n";
      I shouldn't hit stop while posting...
      And I still haven't taken a closer look at the HTTP specs.

      This seems interesting:
      telnet www.google.de 80
      GET http://www.google.de/ HTTP/1.0[RET]
      returns the response with headers,

      while
      GET http://www.google.de/ [RET]
      doesn't show any HTTP headers;
      instead it just returns the file.
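
That difference matches what the HTTP/1.0 specification (RFC 1945) describes as a "simple request": a request line without a version is treated as HTTP/0.9, and the reply is the bare body with no status line or headers. A client that wants to cope with both could check how the reply starts. A tiny sketch, with response_kind as a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: 'full' if the reply begins with an HTTP status line,
# 'simple' if it looks like a headerless HTTP/0.9-style body.
sub response_kind {
    my ($buf) = @_;
    return $buf =~ m{\AHTTP/\d+\.\d+ \d{3}} ? 'full' : 'simple';
}
```

This only inspects the first bytes received, so a body that happens to begin with text shaped like a status line would be misclassified.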
