PerlMonks

Using URI::URL

by mkurtis (Scribe)
on Feb 21, 2004 at 03:05 UTC ( #330743=perlquestion )

mkurtis has asked for the wisdom of the Perl Monks concerning the following question:

Does anyone have experience using URI::URL? Yes, I have been to CPAN and looked at the documentation, but it doesn't make any sense. I am trying to use it for this. Thanks

Replies are listed 'Best First'.
•Re: Using URI::URL
by merlyn (Sage) on Feb 21, 2004 at 12:13 UTC
Re: Using URI::URL
by matija (Priest) on Feb 21, 2004 at 10:27 UTC
    One obvious problem with your code is that you open LINKS for appending, and then you try to read from it.

    As far as I can see from a simple test:

    perl -e 'open(BLA,">>/tmp/bla"); while(<BLA>){print}'

    The test shows that this won't give you an error message, but it will exit the loop immediately.

    That is why your web crawler exits prematurely, without even reaching the calls to links(). You should read from a different file than the one you are writing to.
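
    A minimal, untested sketch of that split (the file names here are only placeholders):

    use strict;
    use warnings;

    # read the crawl queue from one file, append newly found links to a different one
    open(my $queue, '<',  'links_to_visit.txt') or die "can't read queue: $!";
    open(my $found, '>>', 'links_found.txt')    or die "can't append: $!";

    while (my $url = <$queue>) {
        chomp $url;
        my @new_links;    # ... fetch $url here and fill this with the links it contains ...
        print {$found} "$_\n" for @new_links;
    }

    close $queue;
    close $found;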

    (And you should step through it in the debugger to see whether what you think is happening is actually happening.)

Re: Using URI::URL
by jeffa (Bishop) on Feb 21, 2004 at 16:01 UTC
    Wrong question. You are trying to pull links from a webpage. You should most likely be using WWW::Mechanize instead:
    use strict;
    use warnings;
    use Data::Dumper;
    use WWW::Mechanize;

    my $a = WWW::Mechanize->new();
    $a->get( 'http://some.site.com' );
    print $_->url, "\n" for @{ $a->links };

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      Does WWW::Mechanize follow the rules set by LWP::RobotUA? I know that before, if I tried to "get" something that a robots.txt file didn't allow me to, my get came up empty (a good thing). Does WWW::Mechanize let me do the same thing? I have been to CPAN and looked at it, but I didn't see anything about obeying rules. Thanks
        Kudos to you for wanting polite bots. The problem with getting LWP::RobotUA to play nicely with WWW::Mechanize is that they are both subclasses of LWP::UserAgent. By itself, WWW::Mechanize does not consult the /robots.txt file, but you can use WWW::RobotRules instead. Here is a working example that tries to grab two files from my server (see the sketch below). There might be a better way though ... ahh, how about "WWW::Mechanize::Polite"? And if I didn't just reinvent a wheel, you might be seeing this on the CPAN. ;)
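
        A minimal sketch of that approach; the site, paths, and user-agent name below are placeholders:

        use strict;
        use warnings;

        use LWP::Simple qw(get);
        use WWW::Mechanize;
        use WWW::RobotRules;

        my $base  = 'http://www.example.com';   # placeholder site
        my $agent = 'PoliteBot/0.1';            # placeholder user-agent name

        # fetch and parse the site's robots.txt once
        my $rules      = WWW::RobotRules->new($agent);
        my $robots_url = "$base/robots.txt";
        my $robots_txt = get($robots_url);
        $rules->parse($robots_url, $robots_txt) if defined $robots_txt;

        my $mech = WWW::Mechanize->new( autocheck => 0 );
        $mech->agent($agent);

        # try two files, but only fetch the ones robots.txt allows
        for my $url ("$base/allowed.html", "$base/private/secret.html") {
            if ($rules->allowed($url)) {
                $mech->get($url);
                print "fetched $url\n" if $mech->success;
            }
            else {
                print "skipped $url (disallowed by robots.txt)\n";
            }
        }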

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
Re: Using URI::URL
by Anonymous Monk on Feb 21, 2004 at 03:27 UTC
    I have never used URI::URL but from the code you linked to, it appears that you would do something like this:
    $url = new URI::URL $links, $content->base();
    print LINKS $url->as_string(), "\n";
    I think that should work, but it hasn't been tested. You may be able to use print LINKS "$url\n"; but I can't tell from the documentation if it is the same thing.
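
    An untested sketch of that idea, using placeholder values, which resolves each relative link against the page's base URL and prints the absolute form:

    use URI::URL;

    my $base = 'http://www.example.com/dir/index.html';    # placeholder base URL
    for my $href ('page2.html', '../other.html') {         # placeholder relative links
        my $url = URI::URL->new($href, $base);
        print $url->abs, "\n";    # abs() resolves the link against $base
    }

    Calling abs() is probably what you want before printing to LINKS, since as_string() on its own would keep a relative link relative.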
