PerlMonks  

Re^6: Split file, first 30 lines only (HTTP Ranges and :read_size_hint)

by Discipulus (Canon)
on Mar 02, 2017 at 10:28 UTC ([id://1183386])


in reply to Re^5: Split file, first 30 lines only (HTTP Ranges)
in thread Split file, first 30 lines only

Isn't that what :read_size_hint => $bytes of LWP::UserAgent is for?

Or, in other words: is :read_size_hint the implementation of the HTTP ranges you are talking about?
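For comparison, a range request in the HTTP sense is expressed by the client with a Range request header, and a server that supports it answers 206 Partial Content with only the requested bytes. A minimal sketch of building such a header value follows. Here range_header is a hypothetical helper, not something LWP provides, and whether a given server honours ranges is an assumption to be checked from the status code:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP): build the value of an HTTP Range
# header asking for $length bytes starting at byte $offset.
sub range_header {
    my ($offset, $length) = @_;
    die "length must be positive\n" unless $length > 0;
    my $last = $offset + $length - 1;    # byte positions are inclusive
    return "bytes=$offset-$last";
}

print range_header(0, 2048), "\n";       # bytes=0-2047

# Assumed usage with LWP::UserAgent, whose get() accepts header fields as
# key/value pairs after the URL:
#   my $res = $ua->get($url, Range => range_header(0, 2048));
#   # 206 means the server honoured the range; 200 means it sent everything.
#   print $res->code, "\n";
```

A server is free to ignore the header and send the whole body with a 200, so the status code should be checked before the result is treated as partial.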

If I remember correctly, the word hint is there because there is no guarantee that each chunk retrieved will be exactly $bytes long: it is merely a hint, which LWP may disregard.

Despite that caveat, which I remember reading somewhere, the following example seems to show that data is retrieved in chunks of exactly the requested length, even for bizarre values of $bytes.

Obviously the last chunk will be of arbitrary length.

use strict;
use warnings;
use LWP::UserAgent;

my @pages = ('http://www.perlmonks.org','http://perldoc.org');
my $ua = LWP::UserAgent->new;
my $chunk;

foreach my $url (@pages){
    $chunk = 1;
    print +("=" x 70),"\n","\t\t$url\n",("=" x 70),"\n";
    my $response = $ua->get($url, ':content_cb'=>\&head_only,':read_size_hint' => $ARGV[0] || 1024);
}

sub head_only{
    my ($data,$resp,$protocol) = @_;
    print "chunk number ", $chunk++,"\t",length $data," bytes received\n";
}

perl webinchunks04hint.pl 2048
======================================================================
                http://www.perlmonks.org
======================================================================
chunk number 1  2048 bytes received
chunk number 2  2048 bytes received
chunk number 3  2048 bytes received
chunk number 4  2048 bytes received
chunk number 5  2048 bytes received
chunk number 6  2048 bytes received
chunk number 7  2048 bytes received
chunk number 8  2048 bytes received
chunk number 9  2048 bytes received
chunk number 10 2048 bytes received
chunk number 11 2048 bytes received
chunk number 12 2048 bytes received
chunk number 13 2048 bytes received
chunk number 14 2048 bytes received
chunk number 15 2048 bytes received
chunk number 16 2048 bytes received
chunk number 17 2048 bytes received
chunk number 18 2048 bytes received
chunk number 19 2048 bytes received
chunk number 20 2048 bytes received
chunk number 21 2048 bytes received
chunk number 22 2048 bytes received
chunk number 23 2048 bytes received
chunk number 24 2048 bytes received
chunk number 25 2048 bytes received
chunk number 26 2048 bytes received
chunk number 27 2048 bytes received
chunk number 28 2048 bytes received
chunk number 29 2048 bytes received
chunk number 30 2048 bytes received
chunk number 31 2048 bytes received
chunk number 32 2048 bytes received
chunk number 33 2048 bytes received
chunk number 34 2048 bytes received
chunk number 35 2048 bytes received
chunk number 36 2048 bytes received
chunk number 37 1066 bytes received
======================================================================
                http://perldoc.org
======================================================================
chunk number 1  2048 bytes received
chunk number 2  2048 bytes received
chunk number 3  2048 bytes received
chunk number 4  2048 bytes received
chunk number 5  2048 bytes received
chunk number 6  2048 bytes received
chunk number 7  2048 bytes received
chunk number 8  2048 bytes received
chunk number 9  361 bytes received
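Since the size is only a hint, code that really needs fixed-size blocks could re-slice whatever arrives instead of trusting the chunk length. A minimal sketch, assuming a hypothetical helper make_rechunker (not part of LWP) and simulating uneven chunks locally rather than fetching anything:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a callback that buffers incoming data and
# hands complete blocks of exactly $size bytes to $on_block, plus a flush
# routine for whatever is left when the transfer ends.
sub make_rechunker {
    my ($size, $on_block) = @_;
    my $buffer = '';
    my $feed = sub {
        my ($data) = @_;    # LWP also passes the response and protocol objects
        $buffer .= $data;
        # 4-argument substr removes and returns the leading block
        $on_block->(substr($buffer, 0, $size, ''))
            while length($buffer) >= $size;
        return;
    };
    my $flush = sub {
        $on_block->($buffer) if length $buffer;
        $buffer = '';
        return;
    };
    return ($feed, $flush);
}

# Simulate a transfer that ignores the hint and delivers uneven chunks.
my @blocks;
my ($feed, $flush) = make_rechunker(4, sub { push @blocks, $_[0] });
$feed->($_) for 'ab', 'cdefg', 'h', 'ij';
$flush->();
print join('|', @blocks), "\n";    # abcd|efgh|ij
```

With LWP the $feed closure would be passed as ':content_cb', and $flush called once get() returns, so that the final short block is not lost.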

thanks

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

Replies are listed 'Best First'.
Re^7: Split file, first 30 lines only (HTTP Ranges and :read_size_hint)
by hippo (Bishop) on Aug 23, 2017 at 10:46 UTC
    is :read_size_hint the implementation of the HTTP ranges you are talking about?

    It is one implementation of it, but it requires careful use of the callback. As you can see from your code, it downloads all the content, just in chunks of your specified size. Since the object here is to download only the minimum amount of data from the server, the callback must die to stop the subsequent chunks being retrieved. E.g.:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use utf8;
    use LWP::UserAgent;

    # Modify these three variables only to suit
    my $url       = 'http://www.gutenberg.org/ebooks/1533.txt.utf-8'; # MacBeth
    my $wantlines = 30;   # Retrieve this number of lines
    my $bytes     = 256;  # Chunk size to download

    my $firstndata;
    my $linecount  = 0;
    my $chunkcount = 0;

    sub add_chunk {
        my ($chunk, $res, $proto) = @_;
        $firstndata .= $chunk;
        $linecount += () = $chunk =~ /\n/g;
        $chunkcount++;
        die if $linecount >= $wantlines;
    }

    my $ua  = LWP::UserAgent->new;
    my $res = $ua->get ($url, ':content_cb' => \&add_chunk, ':read_size_hint' => $bytes);
    print "Retrieved $linecount lines in $chunkcount chunks from $url:\n\n$firstndata\n";

    If you run this, you will see that it retrieves slightly more than the 30 lines required, but substantially less than the full text. This seems like a reasonable compromise and is, of course, tunable by the user to the specific task at hand by varying $wantlines and $bytes.
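    If exactly the requested number of lines is wanted rather than slightly more, the buffered text can be trimmed after the callback has stopped the transfer. A minimal sketch, assuming a hypothetical helper make_line_collector (not part of LWP) and feeding it simulated chunks instead of a real download:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a chunk callback and an accessor. The callback
# dies with a recognisable message once $want newlines have been seen; the
# accessor returns only the first $want lines of what was buffered.
sub make_line_collector {
    my ($want) = @_;
    my ($buffer, $seen) = ('', 0);
    my $callback = sub {
        my ($chunk) = @_;
        $buffer .= $chunk;
        $seen += () = $chunk =~ /\n/g;    # count newlines in this chunk
        die "enough lines\n" if $seen >= $want;
        return;
    };
    my $lines = sub {
        my @lines = split /^/, $buffer;   # split at line starts, keeping "\n"
        splice @lines, $want if @lines > $want;
        return join '', @lines;
    };
    return ($callback, $lines);
}

# Simulated transfer: stop feeding as soon as the callback dies.
my ($cb, $lines) = make_line_collector(2);
for my $chunk ("one\ntw", "o\nthree\nfo", "ur\n") {
    last unless eval { $cb->($chunk); 1 };
}
print $lines->();    # prints "one" and "two", one per line
```

    Dying with a message rather than a bare die has a side benefit: as far as I recall from the LWP::UserAgent documentation, the message is reported in the X-Died header of the response, so a deliberate stop can be told apart from a genuine failure.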

Re^7: Split file, first 30 lines only (HTTP Ranges and :read_size_hint)
by wrkrbeee (Scribe) on Mar 02, 2017 at 16:07 UTC
    thank you Discipulus!
