PerlMonks  

program "thinks" it is behind proxy

by comet85 (Novice)
on Apr 15, 2006 at 23:01 UTC ( #543594=perlquestion )

comet85 has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I wrote a program that connects to a website, extracts all the titles in the page source, and writes them to a text file. At my college it wouldn't connect to the website because I was behind a proxy, so I used the LWP::Simple module and set the environment variable HTTP_proxy to http://192.16.4:8080. That solved the problem at college.

When I got back home, where I don't have a proxy, I removed the environment variable and commented out the use LWP::Simple line in the code. But the program still thinks it is behind a proxy, and when I run it I get this error:

Error GETing http://www.rediff.com/rss/inrss.xml: Can't connect to 192.168.16.4: 8080 (connect: Unknown error) at C:\Perl\eg\test.pl line 13

Here is the code. Please help! Thanks
#!/usr/bin/perl

# Include the WWW::Mechanize module
use WWW::Mechanize;
#use LWP::Simple;   <--Notice this is a comment now

# What URL shall we retrieve?
$url = "http://www.rediff.com/rss/inrss.xml";

# Create a new instance of WWW::Mechanize
my $mechanize = WWW::Mechanize->new(autocheck => 1);

# Retrieve the page
$mechanize->get($url);

my $title =$mechanize->title;
print "$title";

# Place all of the titles in an array
my @title = $mechanize->title;

open(FH, ">rediff.txt");

# Loop through and output each title
foreach my $title (@title) {
    # Retrieve the link URL
    # my $href = $link->url;
    print FH $title;
    print FH "\n";
}

close(FH);

Replies are listed 'Best First'.
Re: program "thinks" it is behind proxy
by sgifford (Prior) on Apr 16, 2006 at 01:20 UTC
    My guess is that the environment variable is still set somewhere. Try printing it out from within your program to verify that it's unset.
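
The check suggested above could look something like the following minimal sketch. It lists every environment variable whose name ends in _proxy, matching case-insensitively so that spellings such as HTTP_proxy and http_proxy are both caught. The helper name proxy_env_vars is made up for illustration and is not part of any module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the sorted names of all entries in the
# given environment hash whose name ends in "_proxy" (any case).
sub proxy_env_vars {
    my ($env) = @_;
    return sort grep { /_proxy$/i } keys %$env;
}

# Print each proxy-related variable as seen by this Perl process.
my @found = proxy_env_vars(\%ENV);
if (@found) {
    print "$_=$ENV{$_}\n" for @found;
}
else {
    print "No *_proxy variables are set in this process.\n";
}
```

Run it from the same console window used to start the failing script, since that is the environment the script inherits.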
Re: program "thinks" it is behind proxy
by jasonk (Parson) on Apr 16, 2006 at 02:11 UTC

    If the $http_proxy environment variable is still set, then you are still using a proxy. Commenting out the use LWP::Simple line doesn't actually accomplish anything, as you aren't using it anywhere.


    We're not surrounded, we're in a target-rich environment!
Re: program "thinks" it is behind proxy
by bowei_99 (Friar) on Apr 16, 2006 at 02:03 UTC
    Hm, I downloaded your code and ran it; it ran, just didn't get the titles, as I think it found multiple title tags and wasn't sure what to do. I changed url to http://www.burvil.org, and it gets the title there OK. Plus, if you print the results of $mechanize, you'll see there's no title field.

    As to your original question, as I wasn't able to recreate your problem, I can't help much. However, some changes below might help give you more verbose info, so you might know how to troubleshoot.

    • Replacing the top of your code with the following will give you more info:
      #!/usr/bin/perl -w
      use strict;
      use warnings;

    • To print what's in $mechanize, include the following:
      use Data::Dumper;

      Then, after you've gotten the url, include the following:
      print Dumper($mechanize);

    When I did it for your original xml file, it didn't have a title field, so Perl complained later that the value was uninitialized (a warning you only see with -w or use warnings; enabled). When I did it for http://www.burvil.org, it worked OK and gave a title. That page is a normal HTML page, not xml.

    It strikes me as odd that it seems to have been working for you before. There is apparently something I'm missing, but not sure what it is.... In any case, hopefully this will point you in the right direction.

    Update: You might also try using XML::Parser to parse the xml itself. If you look at the Dumper output I mentioned earlier, it has the actual XML output; you can get your titles from that.

    -- Burvil
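
As a rough sketch of the XML::Parser suggestion above, the following collects the text of every title element from an XML string. The sample feed is made up and stands in for the fetched page content (for example $mechanize->content); the sub name extract_titles is hypothetical. It assumes XML::Parser is installed.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: return the text of every <title> element found
# in the XML string, in document order.
sub extract_titles {
    my ($xml) = @_;
    my @titles;
    my $in_title = 0;

    my $parser = XML::Parser->new(
        Handlers => {
            Start => sub {
                my (undef, $element) = @_;
                if ($element eq 'title') {
                    $in_title = 1;
                    push @titles, '';    # start a new, empty title
                }
            },
            # Char can fire several times per element, so append.
            Char => sub {
                my (undef, $text) = @_;
                $titles[-1] .= $text if $in_title;
            },
            End => sub {
                my (undef, $element) = @_;
                $in_title = 0 if $element eq 'title';
            },
        },
    );

    $parser->parse($xml);    # dies if the XML is not well-formed
    return @titles;
}

# Made-up sample feed used only for illustration.
my $sample = q{<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First story</title></item>
  <item><title>Second story</title></item>
</channel></rss>};

print "$_\n" for extract_titles($sample);
```

Unlike the title method, which reports a single HTML page title, this returns one entry per title element, which is the shape an RSS feed has.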

Re: program "thinks" it is behind proxy
by rjray (Chaplain) on Apr 16, 2006 at 06:37 UTC

    The LWP library automatically detects HTTP proxy information in environment variables. As others have said, you must still have that set, probably in your .profile or somewhere.

    --rjray
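
If the stray setting cannot be tracked down right away, one possible workaround is to drop the proxy variables from %ENV inside the program before any LWP-based object is created, since the environment is read when the user agent is set up. This is only a sketch under that assumption; the sub name clear_proxy_env is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every entry whose name ends in "_proxy"
# (any case) from the given environment hash and return the sorted
# names that were removed.
sub clear_proxy_env {
    my ($env) = @_;
    my @removed = sort grep { /_proxy$/i } keys %$env;
    delete @{$env}{@removed};
    return @removed;
}

# Affects only this process and its children, not the console or the
# system-wide settings.
my @removed = clear_proxy_env(\%ENV);
print "Cleared: @removed\n" if @removed;

# An LWP-based object created after this point (for example with
# WWW::Mechanize->new) no longer finds proxy settings in %ENV.
```

This hides the symptom rather than fixing it, so the leftover variable is still worth finding and removing at its source.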
