Check multiple URLs at the same time

by sabri (Initiate)
on Aug 19, 2012 at 00:53 UTC
sabri has asked for the wisdom of the Perl Monks concerning the following question:

Hello, please help! This code checks the URLs from the file "list.txt", but it checks them one by one, and I want it to check 40, or 100, or however many I choose, at the same time. Sorry for my bad English :) Thank you!
#!/usr/bin/perl
require LWP::UserAgent;
require HTTP::Request;

print "Content-type: text/html\n\n";

# URLs to check, one per line
open FILE, "list.txt" or die $!;
my @lines = <FILE>;

for my $url (@lines) {
    my $etat = check_url($url);
    if ($etat eq "okkkkkkkkkkkkk") {
        print "$url:$etat \n";
    }
    else {
        print "$url KO : $etat\n";
    }

    # append each result to a log file
    my $outfile = "output.txt";
    open(OUTFILE, ">> $outfile");
    print OUTFILE "$url = $etat\n";
}

sub check_url {
    my ($url) = @_;

    # create a user agent and set the browser signature
    my $ua = LWP::UserAgent->new;
    $ua->agent("LinkChecker ($url)");
    $ua->timeout(10);
    $ua->max_size(300);    # fetch at most 300 bytes of the body

    # perform the request
    my $request  = HTTP::Request->new(GET => $url);
    my $response = $ua->request($request);
    if ($response->is_success) {
        return "okkkkkkkkkkkkk";
    }
    else {
        return $response->code;
    }
}

Re: Check multiple URLs at the same time
by Anonymous Monk on Aug 19, 2012 at 01:20 UTC
Re: Check multiple URLs at the same time
by BrowserUk (Pope) on Aug 19, 2012 at 01:59 UTC

    See LWP::Parallel::UserAgent
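
    A minimal sketch of that approach, following the module's documented synopsis; the max_hosts(40) cap and the list.txt input are assumptions carried over from the original question, not module defaults:

    use strict;
    use warnings;
    use LWP::Parallel::UserAgent;
    use HTTP::Request;

    # read the URL list the same way the original script does
    open my $fh, '<', 'list.txt' or die $!;
    chomp(my @urls = <$fh>);

    my $pua = LWP::Parallel::UserAgent->new;
    $pua->max_hosts(40);    # how many servers to contact in parallel
    $pua->timeout(10);
    $pua->redirect(1);      # follow redirects

    # register every request up front; the agent schedules them itself
    for my $url (@urls) {
        if (my $res = $pua->register(HTTP::Request->new(GET => $url))) {
            print STDERR $res->error_as_HTML;    # registration failed
        }
    }

    # block until every request has been answered or timed out
    my $entries = $pua->wait;
    for (keys %$entries) {
        my $res = $entries->{$_}->response;
        print $res->request->url, " : ", $res->code, "\n";
    }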



Re: Check multiple URLs at the same time
by zentara (Archbishop) on Aug 19, 2012 at 09:36 UTC
    Hi, here is an untested piece of code (not written by me) to show you how to use Parallel::ForkManager to run things concurrently. You could also use threads, but forking is simpler. You could also google for "Parallel ForkManager LWP" and get links like parallel web get with lwp::simple.
    #!/usr/bin/perl
    use Parallel::ForkManager;
    use LWP::UserAgent;
    use HTTP::Request;

    my %urls = (
        'drudge' => 'http://www.drudgereport.com',
        'rush'   => 'http://www.rushlimbaugh.com/home/today.guest.html',
        'yahoo'  => 'http://www.yahoo.com',
        'cds'    => 'http://www.cdsllc.com/',
    );

    # allow at most 30 children to run at once
    my $pm    = Parallel::ForkManager->new(30);
    my $count = 0;

    for my $myURL (sort values %urls) {
        $count++;
        print "Count is $count\n";

        # everything between start() and finish() runs in a child process
        $pm->start and next;
        print "Starting child process $count for $myURL\n";

        my $ua = LWP::UserAgent->new;
        $ua->agent("$0/0.1 " . $ua->agent);

        # save the response body to a numbered file
        my $req = HTTP::Request->new(GET => $myURL);
        $ua->request($req, "$count.html");

        print "Process $count complete\n";
        $pm->finish;
    }

    print "Waiting on children\n";
    $pm->wait_all_children;


      You could also use threads, but forking is simpler.

      Absolutely, that's why you'd use Parallel::ForkManager instead of forks :)
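
      For context, the forks module mentioned here is a drop-in replacement for the threads API implemented on top of fork(); a minimal sketch of the pattern it enables (the URLs are placeholders). Note that it spawns one child per URL with no cap on concurrency, which is exactly what Parallel::ForkManager's child limit adds:

      use forks;       # threads API, implemented with fork()
      use LWP::Simple;

      my @urls = ('http://www.yahoo.com', 'http://www.cdsllc.com/');

      # one child per URL; nothing here limits how many run at once
      my @kids  = map { threads->create(sub { get($_[0]) }, $_) } @urls;
      my @pages = map { $_->join } @kids;    # collect each child's result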

Re: Check multiple URLs at the same time
by thonar (Monk) on Aug 23, 2012 at 11:10 UTC
    I picked up zentara's idea of using Parallel::ForkManager and LWP::UserAgent; here is what works for me:
    #!/usr/bin/perl -w
    use strict;
    use warnings;
    use Parallel::ForkManager;
    use LWP::UserAgent;

    my $refUrlList = [];
    my $maxChild   = 3;
    my $maxRedir   = 5;
    my $maxSize    = 300;
    my $timeout    = 5;
    (my $logFile = $0) =~ s/\.(\w+)$/\.log/;

    # read one URL per line from every file named on the command line
    for (@ARGV) {
        if (-f) {
            open(IN, "<", $_) or die "Can't open $_: $!\n";
            while (<IN>) {
                push(@$refUrlList, $_);
            }
            close(IN) or die "Close failed: $_: $!\n";
        }
    }

    multiUrl($refUrlList);

    sub multiUrl {
        my $refUrls = shift;
        open(OUT, ">> $logFile") or die "Can't open $logFile: $!\n";
        my $pm = Parallel::ForkManager->new($maxChild);
        for (@$refUrls) {
            chomp(my $url = $_);
            $pm->start and next;    # fork; the parent moves on to the next URL
            my $ua = LWP::UserAgent->new;
            $ua->agent($0);
            $ua->max_redirect($maxRedir);
            $ua->max_size($maxSize);
            $ua->timeout($timeout);
            my $res = $ua->get($url);
            print OUT "$url | " . $res->code . " | " . $res->message . "\n";
            print "$url | " . $res->code . " | " . $res->message . "\n";
            $pm->finish;
        }
        $pm->wait_all_children;
        close(OUT) or die "Close failed: $logFile: $!\n";
    }
Re: Check multiple URLs at the same time
by Corion (Pope) on Aug 23, 2012 at 14:48 UTC

    Here is an example using AnyEvent:

    #!/usr/bin/perl
    use strict;
    use AnyEvent;
    use AnyEvent::HTTP;

    my %urls = (
        'drudge' => 'http://www.drudgereport.com',
        'rush'   => 'http://www.rushlimbaugh.com/home/today.guest.html',
        'yahoo'  => 'http://www.yahoo.com',
        'cds'    => 'http://www.cdsllc.com/',
    );

    my $done = AnyEvent->condvar;
    my @requests;
    my $count;

    for my $myURL (sort values %urls) {
        $done->begin();
        $count++;
        print "Count is $count\n";
        push @requests, http_get $myURL => sub {
            # $_[0] is the body, $_[1] is the header hash
            print "Retrieved '$myURL' $_[1]->{Status}\n";
            $done->end;
        };
    }

    print "Waiting on requests\n";
    $done->recv;
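
    Since the original question asked for a configurable limit on simultaneous checks: AnyEvent::HTTP throttles connections through package variables rather than a constructor argument. A one-line tweak, assuming the stock AnyEvent::HTTP interface (40 is the figure from the question):

    use AnyEvent::HTTP;

    # raise the per-host connection cap (the default is small)
    # before issuing any requests
    $AnyEvent::HTTP::MAX_PER_HOST = 40;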
Re: Check multiple URLs at the same time
by philiprbrenan (Monk) on Sep 01, 2012 at 16:39 UTC

    Threads work well (for this request) on Windows:

    use feature ":5.14";
    use warnings FATAL => qw(all);
    use strict;
    use threads;
    use LWP::Simple;

    my @u = qw(drudgereport.com rushlimbaugh.com/home/today.guest.html
               yahoo.com appaapps.com cdsllc.com gdfgfasgdfgfs.jkl);

    sub u($) { get('http://www.' . $_[0]) }    # web request

    my %t;
    $t{threads->create('u', $_)->tid}{url} = $_ for @u;    # start threads

    # retrieve completed requests, waiting at most 10 seconds in total
    for (1 .. 10) {
        $t{$_->tid}{data} = $_->join for threads->list(threads::joinable);
        last unless scalar threads->list(threads::running);
        sleep(1);
    }

    say $t{$_}{url}, "=", ($t{$_}{data} ? 1 : 0)
        for sort { $t{$a}{url} cmp $t{$b}{url} } keys %t;

    Produces

    appaapps.com=1
    cdsllc.com=1
    drudgereport.com=1
    gdfgfasgdfgfs.jkl=0
    rushlimbaugh.com/home/today.guest.html=1
    yahoo.com=1
