PerlMonks  

checking a url to make sure it works

by Anonymous Monk
on Feb 20, 2004 at 06:43 UTC ( [id://330478]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am using LWP to parse a website from a URL collected via a form, and I want to know whether it's possible to check that the URL exists. I get weird situations: sometimes, if they just use the space bar, it runs as if they had input a URL; if they input http:// first (since I add it myself), it can't find the URL; and if they enter something like "324242342543" or "test.com", it tries to find it on my server and prints out "myserver.com". So I'm confused. I know defined doesn't check for URL validity; is there something else I can try? And if I have to go with modules, I really need one that's commonly installed rather than one I have to download.

I don't want to check whether the URL merely looks good. Since I am parsing the webpage they provide for text, I need to say "Okay, the page is found, let's work" or "Nope, page can't be found, so let's terminate".

my $geturl = "http://$url";
my $content = get("$geturl");
unless (defined $content) {
    print "<center>URL not found!</center><br><br>";
    exit;
}

Replies are listed 'Best First'.
Re: checking a url to make sure it works
by CountZero (Bishop) on Feb 20, 2004 at 07:31 UTC
    Unless you need to work with the content of the URL, you could perhaps just use a HEAD request and thus limit the amount of data that is transmitted.
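
A minimal sketch of the HEAD-request idea, assuming LWP::UserAgent is installed (it ships in the same distribution as LWP::Simple). The helper name url_responds and the 10-second timeout are illustrative choices, not something from the thread:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: returns 1 if a HEAD request to $url gets a
# successful (2xx) response, 0 otherwise.
sub url_responds {
    my ($url) = @_;

    # The timeout value is an assumption; tune it for your setup.
    my $ua = LWP::UserAgent->new(timeout => 10);

    # head() sends a HEAD request, so no response body is transferred.
    my $response = $ua->head($url);

    return $response->is_success ? 1 : 0;
}
```

Some servers reject or mishandle HEAD, so a failed check here may still deserve a fallback GET. If the page content is needed anyway, as in the original question, a single GET with an is_success test avoids the extra round trip.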

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: checking a url to make sure it works
by jweed (Chaplain) on Feb 20, 2004 at 07:00 UTC
    $content = get($url);
    $error = 1 unless (defined $content);
    should work, judging from a look at the source code, but if it doesn't, this roundabout method ought to:
    use HTTP::Request;
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    my $request = HTTP::Request->new(GET => $url);
    my $response = $ua->request($request);
    # is_success is true only for 2xx responses
    $error = 1 unless ($response->is_success);



    Code is (almost) always untested.
    http://www.justicepoetic.net/
Re: checking a url to make sure it works
by benn (Vicar) on Feb 20, 2004 at 12:37 UTC
    You'll probably need to do a few "sanity checks" as well on the entered URL. As you've noticed, if the user inputs "http://www.test.com", then you'll end up trying to fetch "http://http://www.test.com" if you simply add an "http://" to the beginning without checking it first. A simple way round this would be to strip any "http://" part from the entered string with a regex, but for a more solid approach, check out some of the CPAN modules available (such as CGI::Untaint::url) for this task.
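
The sanity checks described above could be sketched roughly as follows. This is a variation on the strip-and-prepend idea: it adds "http://" only when no scheme is present, and it also rejects blank or whitespace-containing input. The name normalize_url is hypothetical, and the checks are deliberately minimal, not full URL validation:

```perl
use strict;
use warnings;

# Hypothetical helper: tidy up a URL typed into a form.
# Returns the cleaned URL, or nothing (undef in scalar context)
# if the input is unusable.
sub normalize_url {
    my ($input) = @_;
    return unless defined $input;

    # Trim leading and trailing whitespace (catches "space bar only" input).
    $input =~ s/^\s+|\s+$//g;
    return if $input eq '';

    # Whitespace inside the string cannot be part of a valid URL.
    return if $input =~ /\s/;

    # A bare scheme with nothing after it is not usable either.
    return if $input =~ m{^https?://$}i;

    # Prepend "http://" only if the user did not already type a scheme,
    # which avoids building "http://http://www.test.com".
    $input = "http://$input" unless $input =~ m{^https?://}i;

    return $input;
}
```

With this in place, the fetch step would only run when normalize_url returns a defined value. Input such as "324242342543" still passes these checks, so the response from the actual request remains the final word on whether the page exists.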

    Cheers,
    Ben.

Node Type: perlquestion [id://330478]
Approved by jweed