Xenofur has asked for the wisdom of the Perl Monks concerning the following question:
Hi monks,
After two days of frustrating experimenting, I'm hoping that one of you can help me with this. One thing up front: I'm not looking for suggestions. Please only post if you have a solution you know works, as any suggestion you might give I've likely tried already.
As such, here's the question: How would I go about downloading multiple files in parallel
- under Win32 AND Linux
- without having it crash due to too many leaked scalars
- without using a GB of RAM
- without being incredibly slow when downloading?
---
Solutions so far:
Combined from the input of ikegami and Corion, a solution that uses IPC::Open2 and an external wget executable. Runs very fast and does not require much RAM.
use IPC::Open2;

for my $id (@ids) {
    $wgets++;
    push @pids, open2( undef, undef, 'wget', $url.$id, '-q', '-O', $dir.$id );
    while ( @pids >= 10 ) {
        waitpid( shift @pids, 0 );
    }
}
while ( @pids ) {
    waitpid( shift @pids, 0 );
}
From BrowserUk, a solution that uses threads and Thread::Queue, thus eliminating the need for an external executable. It does, however, use more RAM when running at speeds comparable to the previous solution.
sub fetch_xml_data {
    my ($ids) = @_;
    my $dir          = 'quicklook/';
    my $url          = 'http://api.eve-central.com/api/quicklook?typeid=';
    my $thread_count = 20;
    my $Q = new Thread::Queue;
    my @threads;
    for my $id ( @{ $ids } ) {
        $Q->enqueue( $id );
    }
    for ( 1 .. $thread_count ) {
        push @threads, threads->create(
            sub {
                require LWP::Simple;
                while ( my $id = $Q->dequeue ) {
                    say "Downloading XML file for id $id.<br>";
                    LWP::Simple::getstore( $url.$id, $dir.$id );
                }
            }
        );
        $Q->enqueue( undef );
    }
    $_->join for @threads;
}
Re: Parallel downloading under Win32?
by Corion (Patriarch) on Apr 29, 2009 at 11:58 UTC
I've written a downloader that spawned wget.exe as an external process to do the actual downloading. As long as no or little feedback is needed, that's all you need:
system( 1, 'wget.exe', $url, '-O', $target_filename ) == 0
    or warn "Couldn't launch wget: $!/$?";
You only need a loop to launch the processes and possibly check from time to time how many instances you've launched.
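A minimal sketch of such a launch-and-throttle loop, with a few assumptions: the job list here uses trivial `perl -e` commands as stand-ins for the real wget invocations, the limit of 3 is arbitrary, and the non-Windows branch falls back to fork+exec since `system(1, LIST)` is a Win32-specific async spawn.

```perl
# Launch up to $MAX child processes at once, reaping one with waitpid
# whenever the limit is reached. The @jobs commands are placeholders.
use strict;
use warnings;

my $MAX  = 3;                                        # concurrent children
my @jobs = map { [ $^X, '-e', 'exit 0' ] } 1 .. 6;   # stand-ins for wget
my @pids;

for my $cmd (@jobs) {
    if ( $^O eq 'MSWin32' ) {
        push @pids, system( 1, @$cmd );              # async spawn, returns PID
    }
    else {
        defined( my $pid = fork() ) or die "fork failed: $!";
        if ( $pid == 0 ) { exec @$cmd or die "exec failed: $!" }
        push @pids, $pid;
    }
    # Throttle: once $MAX children are in flight, reap one before continuing.
    waitpid( shift @pids, 0 ) while @pids >= $MAX;
}
waitpid( shift @pids, 0 ) while @pids;               # reap the stragglers
print "all done\n";
```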
I don't understand system very well. The manual says that system makes the parent process wait for completion, which sounds to me like what backticks do. If that's the case, that's not exactly parallel.
If I'm misunderstanding that and it does run in parallel, I fail to see how I could check for feedback, as perldoc doesn't mention any returned handle or other way to find out whether the task is still running. How would I know when it's done and what its success was?
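For reference, one hedged sketch of how a parent can check on children without blocking: waitpid with the WNOHANG flag from POSIX returns immediately, giving 0 for "still running" or the PID once the child has exited (with the exit status then in `$?`). The sleeping children here are stand-ins for downloads.

```perl
# Poll children non-blockingly with waitpid + WNOHANG.
use strict;
use warnings;
use POSIX ':sys_wait_h';

my %running;
for my $n ( 1 .. 3 ) {
    defined( my $pid = fork() ) or die "fork failed: $!";
    if ( $pid == 0 ) { sleep 1; exit 0 }    # child: stand-in for a download
    $running{$pid} = 1;
}

while (%running) {
    for my $pid ( keys %running ) {
        my $reaped = waitpid( $pid, WNOHANG );
        if ( $reaped == $pid ) {            # finished; exit code is in $?
            printf "pid %d exited with %d\n", $pid, $? >> 8;
            delete $running{$pid};
        }
    }
    select undef, undef, undef, 0.1;        # brief pause, don't busy-spin
}
```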
How would I know when it's done and what its success was?
Sounds like you want to launch these processes with Win32::Process's Create() function instead of system().
Cheers, Rob
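A sketch of what that might look like, with assumptions flagged: the wget path is a placeholder, and the whole thing is Windows-only, so the sketch simply notes that and does nothing elsewhere. Win32::Process::Create hands back a process object you can poll with Wait() and query with GetExitCode(), which answers both "is it done?" and "did it succeed?".

```perl
# Spawn downloads via Win32::Process and query their exit codes.
use strict;
use warnings;

my $status;
if ( $^O ne 'MSWin32' ) {
    print "Win32::Process is Windows-only; nothing to do here.\n";
    $status = 'skipped';
}
else {
    require Win32::Process;
    Win32::Process->import;

    my @procs;
    for my $url (@ARGV) {
        Win32::Process::Create(
            my $proc,
            'C:\\tools\\wget.exe',          # placeholder path to wget
            qq{wget.exe "$url" -q},         # the command line, one string
            0,
            NORMAL_PRIORITY_CLASS(),
            '.',
        ) or die 'Create failed: '
            . Win32::FormatMessage( Win32::GetLastError() );
        push @procs, $proc;
    }

    for my $proc (@procs) {
        $proc->Wait( INFINITE() );          # block until this one exits
        $proc->GetExitCode( my $code );
        print 'pid ', $proc->GetProcessID(), " exited with $code\n";
    }
    $status = 'done';
}
```

Wait() also accepts a millisecond timeout instead of INFINITE(), so the same object supports non-blocking "is it still running?" checks.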
Re: Parallel downloading under Win32?
by BrowserUk (Patriarch) on Apr 29, 2009 at 19:36 UTC
- under Win32 AND Linux
- without having it crash due to too many leaked scalars
- without using a GB of ram
- without being incredibly slow when downloading
- Should run anywhere Perl+threads do.
(I don't have Linux!)
- No scalars leaked on my system.
- Uses 50MB for 4 concurrent threads.
- 16 files - 46,617,229 bytes - 181 seconds - 257 KB/s.
(The maximum throughput of my connection: 2496 kbps.)
#! perl -sw
use 5.010;
use strict;
use threads ( stack_size => 0 );
use Thread::Queue;

sub thread {
    my $tid = threads->tid;
    require LWP::Simple;
    my( $Q, $dir ) = @_;
    while( my $url = $Q->dequeue ) {
        my( $file ) = $url =~ m[/([^/]+)$];
        my $status = LWP::Simple::getstore( $url, "$dir/$file" );
        printf STDERR "[$tid] $url => $dir/$file: $status\n";
    }
}

our $T   ||= 4;
our $DIR ||= '.';

say scalar localtime;

my $Q = new Thread::Queue;
my @threads = map threads->create( \&thread, $Q, $DIR ), 1 .. $T;

chomp, $Q->enqueue( $_ ) while <>;
$Q->enqueue( (undef) x $T );

$_->join for @threads;

say scalar localtime;
Console log from test session:
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I would like to test it, but for that I'd need to insert it into my module, and that endeavour is hampered by the fact that I just plain cannot tell what's going on after: my $Q = new Thread::Queue;
Seriously, it looks like you wrote that with the intent to make it as unreadable as possible.
It's not actually hard. The system has worker threads that are fed from a Thread::Queue: each thread takes a job from the queue, performs it, then takes the next one. The map just creates $T threads, and to tell each thread that the work is finished, $T undef elements are stuck at the end of the queue. Then the main thread waits for all threads to finish their work. That's all there is to it.
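The pattern described above, reduced to its skeleton with a trivial job (squaring numbers) standing in for the downloads, so the moving parts are easier to see: one shared queue, $T workers draining it, $T undefs as stop signals, then join. The second queue for collecting results is an addition for illustration, not part of the original.

```perl
# Worker-pool skeleton: queue in, $T threads, undef sentinels, join.
use strict;
use warnings;
use threads;
use Thread::Queue;

my $T       = 3;
my $Q       = Thread::Queue->new;
my $results = Thread::Queue->new;    # extra queue to collect answers

my @threads = map {
    threads->create( sub {
        while ( defined( my $n = $Q->dequeue ) ) {   # undef ends the loop
            $results->enqueue( $n * $n );            # "perform the job"
        }
    } );
} 1 .. $T;

$Q->enqueue( $_ ) for 1 .. 5;        # the work items
$Q->enqueue( (undef) x $T );         # one stop signal per worker
$_->join for @threads;

my $sum = 0;
$sum += $results->dequeue while $results->pending;
print "sum of squares: $sum\n";      # 1+4+9+16+25 = 55
```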
Seriously, it looks like you wrote that with the intent to make it as unreadable as possible.
Is that a request for clarification?
Suggestion: Run it standalone as posted first, to convince yourself that it actually works on your system.
Re: Parallel downloading under Win32?
by spx2 (Deacon) on Apr 29, 2009 at 13:51 UTC
If this had been approved by TPF, you would now have a consistent source of information on the subject; however, they decided not to approve the proposal.
If you want well-written articles to learn from about "Parallel downloading under Win32", send an e-mail to TPF and tell them you want this (hopefully you won't be the only one asking), and if they approve the proposal, I'll write the articles.