PerlMonks  

Can You Explain How to Check a Link for Deadness

by Anonymous Monk
on May 12, 2002 at 17:34 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm a newbie to Perl. I have a site with various external URLs in a flat-file database. If I have the URL in a variable in my program, what would I have to do to check the URL to see if it's dead or not? I read through other similar questions. I didn't get it. One, I'm not sure if I have LWP. Two, I just didn't get it. The best I found was this:
use strict;
use IO::Socket::INET;
for (@ARGV){
    s|http://||;
    m|([^/]+)(.*)|;
    my $s=IO::Socket::INET->new(PeerAddr=>$1,PeerPort=>80,Proto=>'tcp',Type=>SOCK_STREAM);
    print $s "GET ".($2||'/')." HTTP/1.0\nHost: $1 \n\n";
    print "Link $_ is validated\n" if <$s>=~/200 OK/;
    close $s;
}
However, I'm clueless as to why it's a for. Also, it looks like it's reading from a default variable and I don't know how to set that. Could anyone explain either what this code is doing (line by line) or give me a better snippet of code? I don't need something that checks many links, just one.

Replies are listed 'Best First'.
Re: Can You Explain How to Check a Link for Deadness
by choocroot (Friar) on May 12, 2002 at 19:39 UTC
    # The script takes the URL list on the command line.
    # So, you should call your script like this:
    #   perl myscript.pl http://site/f1.html http://site/f2.html ...
    # Command line arguments are stored in the ARGV list.

    # For each URL in ARGV, place the current URL in $_
    # ($_ is the "default" variable in Perl).
    for (@ARGV){
        # Remove the leading "http://" part of $_
        s|http://||;

        # Extract the server name in $1 and the file name in $2
        # from $_ (see "perlre" documentation for this)
        m|([^/]+)(.*)|;

        # Open a tcp socket connection to the server $1 on port 80
        my $s=IO::Socket::INET->new( PeerAddr=>$1, PeerPort=>80, Proto=>'tcp', Type=>SOCK_STREAM );

        # Send a simple HTTP GET request to the server for
        # file $2, or "/" if $2 is empty.
        print $s "GET ".($2||'/')." HTTP/1.0\nHost: $1 \n\n";

        # Read the first line of the answer (with <$s>) from the
        # server and print "Link xxx is validated" if the server
        # answered positively to the request (server answers
        # "HTTP 200 OK" when file is present)
        print "Link $_ is validated\n" if <$s>=~/200 OK/;

        # Close the connection
        close $s;

        # and treat the next URL
    }
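Since the question asks about a single URL that is already in a variable, here is a minimal sketch of the same two steps (strip the scheme, split host from path) written without the for loop or $_. The names split_url and build_request are made up for illustration and are not part of any module. The request text uses CRLF line endings, which is what HTTP specifies; the snippet above sends bare "\n", which many servers tolerate.

```perl
use strict;
use warnings;

# Hypothetical helper: split a URL held in an ordinary variable into
# host and path, using the same two patterns as the loop above but
# without touching $_.
sub split_url {
    my ($url) = @_;
    (my $rest = $url) =~ s|^http://||;           # drop the scheme
    my ($host, $path) = $rest =~ m|^([^/]+)(.*)$|
        or return;                               # no host part found
    return ($host, $path eq '' ? '/' : $path);
}

# Hypothetical helper: build the request text with CRLF line endings.
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n";
}

my $url = 'http://site/f1.html';
my ($host, $path) = split_url($url);
print build_request($host, $path);
```

This only covers plain http:// URLs on the default port; anything more involved is a good reason to reach for LWP.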

    You can use the LWP package (run perl -e 'use LWP' to check whether LWP is installed).
    With LWP this could be rewritten like this:

    use strict;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    foreach my $url (@ARGV) {
        my $request = HTTP::Request->new( GET => "$url" );
        my $response = $ua->request( $request );
        if( $response->is_success ) {
            print "link $url is ok\n"
        }
    }
    LWP provides a higher level of abstraction, so you don't need to handle the "low level" socket creation and communication yourself.
    Read the documentation for LWP and HTTP::Request for further details.
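For the "just one URL in a variable" case, the loop is not needed at all. A minimal sketch, assuming LWP is installed: check_link is a hypothetical helper name, and the 10 second timeout in the usage comment is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: $ua is assumed to be an LWP::UserAgent (or any
# object with a get() method that returns a response object).
# Returns a success flag plus the status line, e.g. "404 Not Found".
sub check_link {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return ($response->is_success ? 1 : 0, $response->status_line);
}

# Typical use (needs LWP installed and network access):
#   use LWP::UserAgent;
#   my $ua  = LWP::UserAgent->new(timeout => 10);
#   my $url = 'http://www.perlmonks.org/';
#   my ($ok, $status) = check_link($ua, $url);
#   print "$url: ", ($ok ? "ok" : "looks dead"), " ($status)\n";
```

Returning the status line along with the flag makes it easier to tell a 404 from a timeout when you look at the results later.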

    Good luck :)

      A pretty thorough explanation, ++. A tip, perl -MLWP is shorter and will do the same thing.

      --
      perl -pew "s/\b;([mnst])/'$1/g"

Re: Can You Explain How to Check a Link for Deadness
by DigitalKitty (Parson) on May 12, 2002 at 20:19 UTC
    I don't need something that checks many links, just one.

    Hi.

    One 'quick and dirty' solution is to use the LWP::Simple module and check the return value of get for the URL that was entered.

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    my $url;
    my $site;

    print "URL to check: ";
    chomp($url = <STDIN>);

    # get() returns undef on failure, so test for definedness
    # rather than truth (a page containing just "0" is still a page)
    $site = get($url);
    if(defined $site) {
        print "$url is good.\n";
    }
    else {
        print "$url appears to be broken.\n";
    }

    Sample run with output:

    C:\perl>perl linkcheck.pl
    URL to check: http://www.perlmonks.org
    http://www.perlmonks.org is good.

    C:\perl>perl linkcheck.pl
    URL to check: http://www.google.com
    http://www.google.com is good.

    C:\perl>perl linkcheck.pl
    URL to check: http://www.blahblahblah.com
    http://www.blahblahblah.com appears to be broken.

    C:\perl>


    Hope this helps,

    -DigitalKitty
      If you replace the call to LWP::Simple's get function with a call to head, you could save a good amount of time, since head only checks that the page is there instead of downloading all of it.
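One caveat with HEAD is that some servers answer it badly even though a GET of the same page works. A minimal sketch of a HEAD-first check with a single GET retry; link_alive is a hypothetical name, and retrying with GET is an assumption about what you want, not something LWP does for you.

```perl
use strict;
use warnings;

# Hypothetical helper: $ua is assumed to be an LWP::UserAgent (or any
# object with head() and get() methods returning response objects).
sub link_alive {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    # Assumption: a failed HEAD is worth one GET retry before giving up.
    $response = $ua->get($url) unless $response->is_success;
    return $response->is_success ? 1 : 0;
}

# Typical use (needs LWP installed and network access):
#   use LWP::UserAgent;
#   my $ua = LWP::UserAgent->new(timeout => 10);
#   print "alive\n" if link_alive($ua, 'http://www.perlmonks.org/');
```

The cheap request is tried first, so the full download only happens for links that would otherwise be reported as dead.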


      $|=$_="1g2i1u1l2i4e2n0k",map{print"\7",chop;select$,,$,,$,,$_/7}m{..}g

Re: Can You Explain How to Check a Link for Deadness
by tachyon (Chancellor) on May 13, 2002 at 01:18 UTC
    use LWP::Simple;
    my $page = 'http://www.perlmonks.org/';
    $headers = head($page);
    print $headers->{'_msg'}," ", $headers->{'_rc'}, "\n\n";
    # have a look at the info we get back for interest sake
    use Data::Dumper;
    print Dumper $headers;
    __DATA__
    OK 200
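The _msg and _rc keys are internal fields of the response object that head returns in scalar context. A minimal sketch that reads the same two values through the public code and message accessors of HTTP::Response instead; status_summary is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: $response is assumed to be an HTTP::Response (or
# anything offering the same code() and message() accessors).
sub status_summary {
    my ($response) = @_;
    return sprintf '%s %s', $response->message, $response->code;
}

# Typical use (needs LWP installed and network access):
#   use LWP::UserAgent;
#   my $response = LWP::UserAgent->new->head('http://www.perlmonks.org/');
#   print status_summary($response), "\n";
```

Going through the accessors keeps the code working even if the object's internal layout changes.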

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Can You Explain How to Check a Link for Deadness
by alien_life_form (Pilgrim) on May 12, 2002 at 20:51 UTC
    Greetings,

    In addition to all the good things that have been said about LWP, there are things that LWP gets right that the sample code does not: authentication and proxies, for instance (though I am not sure that LWP::Simple fits the bill completely).

    As for checking whether you have LWP:

    perl -MLWP -e 'print "Hello\n"'

    Cheers,
    alf
    You can't have everything: where would you put it?
Re: Can You Explain How to Check a Link for Deadness
by CharlesClarkson (Curate) on May 13, 2002 at 03:40 UTC

    Don't throw away a link that fails. Stick it in another file to be checked again later. Servers go down and sometimes dead links aren't dead.
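A minimal sketch of that idea, assuming a plain text file with one URL per line is good enough; defer_link and deferred_links are hypothetical names, and the file name in the usage comment is just an example.

```perl
use strict;
use warnings;

# Hypothetical helper: append a failed URL to a retry file
# (the file name is the caller's choice).
sub defer_link {
    my ($file, $url) = @_;
    open my $fh, '>>', $file or die "Cannot append to $file: $!";
    print {$fh} "$url\n";
    close $fh or die "Cannot close $file: $!";
    return;
}

# Hypothetical helper: read back the URLs queued for a later re-check,
# skipping blank lines. Returns an empty list if the file is missing.
sub deferred_links {
    my ($file) = @_;
    open my $fh, '<', $file or return;
    chomp(my @urls = <$fh>);
    close $fh;
    return grep { length } @urls;
}

# Typical use:
#   defer_link('recheck.txt', $url) unless $link_is_ok;
#   ... later ...
#   for my $url (deferred_links('recheck.txt')) { ... check again ... }
```

A link that still fails after a few passes spread over several days is a much safer candidate for removal than one that failed once.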


    HTH,
    Charles K. Clarkson
    Clarkson Energy Homes, Inc.
