PerlMonks  

can't get www::mechanize to work on a web site

by smackdab (Pilgrim)
on Mar 01, 2004 at 05:17 UTC
smackdab has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to get the lyrics for a song...w/o success
    use WWW::Mechanize;
    use URI::URL;
    use strict;
    use warnings;

    my $artist = 'The Beatles';
    my $title  = 'Hey Jude';

    my $mech = WWW::Mechanize->new();

    my $search = join("+", split(/ /, $artist)) . "+" . join("+", split(/ +/, $title));
    print "search=$search\n";

    $mech->get("http://search.lyrics.astraweb.com/?word=$search");
    $mech->success() or die "Can't get the search page\n";
    #print $mech->content();

    #$mech->follow_link(url_regex=>qr/$title/i);
    $mech->follow_link(url_regex=>qr/hey jude/i);
    #$mech->follow_link(text=>"Hey Jude");
    $mech->success() or die "Can't find song page\n";

    print $mech->content();
I get the search results, but can't figure out how to get the "song link"...any help is appreciated!!!

-- update: changed title
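
A side note on the line that builds $search in the script above: it splits the artist on / / and the title on / +/, so a doubled space in the artist name would produce an empty word and a "++" in the URL. Below is a minimal sketch of one way to make that consistent. The helper name search_words is hypothetical, and the sketch assumes the words contain nothing that needs percent-encoding (no "&", "+", or non-ASCII characters).

```perl
use strict;
use warnings;

# Hypothetical helper: join the words of all arguments with "+".
# split ' ' (a string, not a regex) skips leading whitespace and
# treats any run of whitespace as a single separator.
sub search_words {
    return join '+', map { split(' ', $_) } @_;
}

my $search = search_words('The Beatles', 'Hey  Jude');
print "search=$search\n";    # prints "search=The+Beatles+Hey+Jude"
```

This only tidies the query string; it would not by itself explain a failure in follow_link.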

Re: suck on www::mechanize question
by Roger (Parson) on Mar 01, 2004 at 05:35 UTC
    When I clicked on the title of the song in the search results page, I got the following error message:

    Error InterScan HTTP Version 3.8-Build_1080 $Date: 01/31/2003 16:12:0037$ Connecting to display.lyrics.astraweb.com: Connection refused


    Maybe that explains why your robot can't follow the link?

      strange, this part works manually for me: (overall it still doesn't grab the lyrics though)...

      http://display.lyrics.astraweb.com:2000/display.cgi?beatles..beatles_1..hey_jude

      So you got the search results to be correct then?
        My company has a firewall running; maybe that has something to do with the forbidden access. I will try again later from home, on my own ISP, and see if I get the same error. I am guessing that you might be looking for a regex to extract the song lyrics? If so, could you post some of the HTML on your notepad and state which part you want extracted?

Re: suck on www::mechanize question
by leira (Monk) on Mar 01, 2004 at 06:24 UTC
    It doesn't look like you were far off in your script, and I think the "Connection refused" error that Roger mentioned might be your problem. You should make sure that doing those actions in a normal browser does what you expect it to do, before you start blaming your script for unexpected results.

    You could try using HTTP::Recorder to record your WWW::Mechanize script. I tried it, and it generated this script:

    $mech->get("http://search.lyrics.astraweb.com/?word=hey+jude");
    $mech->follow_link(text => "Hey Jude (lennon/mccartney)", n => "1");

    The "Hey Jude" link (without authors) produced a 500 (connection refused) error, so I chose another one for the example.

    Linda

      Thanks for trying it!
      The connection refused error might be the problem, but I can get it to work in the browser, not www::mechanize. I'll look into HTTP::Recorder and see what it does...

      Did you get it to work??????

      I tried switching the artist/song to others w/o change. I also tried the ->follow_link having the artist and song as part of the search string, but no difference.

      The original $mech->get() has artist and song, if you know of any artist/song that works, maybe that would give me a hint...(or maybe not...)
        OK, further investigation suggests that it's not one bad link, but that the server just sometimes returns a 500 error. Other times it succeeds. If I run my script several times, it will sometimes succeed and sometimes fail -- and I get the same results if I try to follow the link several times in my browser.

        I was able to get around it like this:

        my $maxtries = 10;
        my $i = 1;
        while ($i <= $maxtries) {
            $mech->follow_link(text => "Hey Jude (lennon/mccartney)", n => "1");
            last if $mech->success;
            $mech->back();
            $i++;
        }
        $mech->success() or die "Can't find song page\n";

        Of course, you can set $maxtries to whatever you think is prudent, and you can put in a sleep() in the loop if you think that might help.

        Linda
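
The retry loop and the sleep() suggestion above can be combined into one small helper. This is only a sketch: retry_with_delay is a hypothetical name, not part of WWW::Mechanize, and the flaky server is simulated with a counter so the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Mechanize): call $attempt up to
# $max_tries times, sleeping $delay seconds between failed tries.
# Returns the number of the try that succeeded, or 0 if every try failed.
sub retry_with_delay {
    my ($attempt, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        return $try if $attempt->($try);
        sleep $delay if $delay && $try < $max_tries;
    }
    return 0;
}

# Stand-in for a flaky server: fails twice, then succeeds.
my $calls = 0;
my $flaky = sub { return ++$calls >= 3 };

my $succeeded_on = retry_with_delay($flaky, 10, 0);
print $succeeded_on ? "succeeded on try $succeeded_on\n" : "gave up\n";
```

In the real script, the code reference would wrap the follow_link call, return true when $mech->success is true, and otherwise call $mech->back() and return false. Newer WWW::Mechanize releases turn on autocheck by default, which makes a failed request die instead of returning, so the loop as written probably needs autocheck => 0 in the constructor (or an eval around the request) to behave this way.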
