PerlMonks  

RE: Slashdot Headline Grabber for Win32

by marcos (Scribe)
on May 04, 2000 at 14:11 UTC


in reply to Slashdot Headline Grabber for Win32

I think the Slashdot Headline Grabber is a good idea. I only have one question: why not use LWP to get the /. XML page? I think it's better, and with LWP it is possible to set up a proxy (very useful if you are behind a firewall). I hacked your code a bit, and here are my suggestions:

use LWP;
use HTTP::Request::Common;

#
# ....
#

sub fetch_headlines {
    my $ua = LWP::UserAgent->new;
    return 0 unless ($ua);

    # set up your proxy here
    $ua->proxy('http', 'http://myproxy.mynet.org:8080');

    # named $feed_url so it does not clash with the per-story $url below
    my $feed_url = "http://www.slashdot.org/slashdot.xml";
    my $res = $ua->request(GET $feed_url);
    return 0 unless $res->is_success;

    # one XML element per line is assumed
    my @D = split /\n/, $res->content;

    # %stories and @keys are declared outside this sub
    my ($title, $url) = ("", "");
    for (@D) {
        $title = $1 if /<title>(.*)<\/title>/;
        $url   = $1 if /<url>(.*)<\/url>/;
        if (/<\/story>/) {
            # end of a story: record it and reset for the next one
            $stories{$url} = $title;
            push(@keys, $url);
            $title = "";
            $url   = "";
        }
    }
    return 1;
}
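The parsing loop in fetch_headlines can be tried without a network connection or a proxy. The following is only a sketch: the sample XML is made up (it is not real Slashdot data), parse_headlines is a hypothetical name, and it returns its results instead of filling the program's own %stories and @keys. It also uses non-greedy matches, which is a small change from the code above.

```perl
use strict;
use warnings;

# Hypothetical sample in the shape the loop expects: one element per line,
# each story closed by </story>. Not real Slashdot data.
my $sample_xml = <<'XML';
<backslash>
<story>
<title>First example headline</title>
<url>http://example.org/1</url>
</story>
<story>
<title>Second example headline</title>
<url>http://example.org/2</url>
</story>
</backslash>
XML

# Same line-by-line matching as fetch_headlines, but returning the results
# (hash ref of url => title, array ref of urls in document order).
sub parse_headlines {
    my ($content) = @_;
    my (%stories, @keys);
    my ($title, $url) = ("", "");
    for (split /\n/, $content) {
        $title = $1 if m{<title>(.*?)</title>};
        $url   = $1 if m{<url>(.*?)</url>};
        if (m{</story>}) {
            $stories{$url} = $title;
            push @keys, $url;
            ($title, $url) = ("", "");
        }
    }
    return (\%stories, \@keys);
}

my ($stories, $keys) = parse_headlines($sample_xml);
print "$stories->{$_}\n    $_\n" for @$keys;
```

Because the loop works on single lines, it only finds a title or URL when the opening and closing tags sit on the same line; that is the trade-off against a real XML parser.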

It may also be a good idea to use XML::DOM to parse the XML downloaded from /. but that is probably too much for the Headline Grabber: the for loop is quicker.
Any comment is highly appreciated.
marcos


RE: RE: Slashdot Headline Grabber for Win32
by httptech (Chaplain) on May 04, 2000 at 16:24 UTC
    I actually started out using LWP; however, I wrote this app to be compiled by perl2exe. The LWP version was 1.2 megs after compilation, and the Socket.pm version was just under a meg. I went with Socket.pm to save the 200K and also because it was a nice learning exercise for me in Socket.pm.

    Good point about the proxy though; I hadn't thought about that.

      I have never tried perl2exe: does it work well? I also have one question: where can I find Win32::GUI? Is there a PPM for ActiveState Perl?
      TIA
      marcos
        perl2exe works pretty well. It costs a little bit of money, though. I see a lot of people talking about perlcc from ActiveState; I don't know if it's cheaper.

        You can pick up a Win32::GUI PPM from http://jenda.mccann.cz

        I also host a mailing list for Win32::GUI at http://www.httptech.com/perl-win32-gui/

RE: RE: Slashdot Headline Grabber for Win32
by GridMonk (Acolyte) on May 09, 2000 at 08:09 UTC
    Well, my first post on perlmonks.org...
    ( and I FSK it up.... sheesh! )

    I am working from behind a corporate firewall, so I tried the LWP example above with my proxy info, and am getting:

    Protocol scheme: 'http://www.slashdot.org/slashdot.xml' is not supported.
    along with what looks like a 501 response in the debugger.

    I tried this with a couple of different URLs and got the same result, so I suspect it may be the proxy setup. Any ideas?

    Just to confirm:

    My proxy settings in Netscape show

    pset.tgw.canon.co.jp/proxy.pac

    so I used:
    #set up your proxy here
    $ua->proxy('http', 'http://pset.tgw.canon.co.jp:80');

    I tried both 80 and 8080 for ports, with the same result.

    Any advice appreciated.

      I think that the problem is that your company uses a proxy configuration script.
      If you check your Netscape proxy settings you should have the 'Automatic proxy configuration' option enabled and the 'Configuration Location (URL)' set to 'pset.tgw.canon.co.jp/proxy.pac'. If so, my guess is correct: you are using an automatic proxy configuration script.
      You may download the proxy configuration script with a browser going to the URL http://pset.tgw.canon.co.jp/proxy.pac - or with a simple perl script that uses LWP and gets that URL :-).
      The script should not be too complicated to read: there should be a function that returns a proxy, more or less like this:
      return "PROXY 151.92.12.112:8080";
      the return value may be different if you are trying to get corporate intranet URLs or Internet URLs (for corporate intranet URLs you may have something like return "DIRECT").
      So all you have to do is look in proxy.pac for the IP address (or host name) and port of the real proxy your company uses to access the Internet, and then use that same proxy and port in the perl script.
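      If reading the script by eye is tedious, the PROXY entries can also be pulled out with a regex. This is only a sketch: pac_proxies is a made-up helper name, the sample proxy.pac text is hypothetical (a real file will differ), and it simply lists every PROXY entry without evaluating the script's logic, so you still have to decide which one applies to Internet URLs.

```perl
use strict;
use warnings;

# Return every "host:port" named in a PROXY directive of a PAC script,
# in order of first appearance and without duplicates.
sub pac_proxies {
    my ($pac_text) = @_;
    my (%seen, @proxies);
    while ($pac_text =~ /\bPROXY\s+([\w.-]+:\d+)/g) {
        push @proxies, $1 unless $seen{$1}++;
    }
    return @proxies;
}

# Hypothetical proxy.pac contents; a real file will differ.
my $pac = <<'PAC';
function FindProxyForURL(url, host) {
    if (isPlainHostName(host)) {
        return "DIRECT";
    }
    return "PROXY 151.92.12.112:8080";
}
PAC

for my $proxy (pac_proxies($pac)) {
    # With "http://" in front, this is the form passed to $ua->proxy('http', ...).
    print "http://$proxy\n";
}
```

      For the sample text this prints http://151.92.12.112:8080, which is the value that would go into the $ua->proxy call in place of the placeholder proxy.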
      I hope this works. If you have problems, ask me, please.
      marcos
        Thanks.

        Netscape wouldn't give me the source right from the URL, so I went with using a quick LWP script like you suggested, and nabbed the file.

        Inserted address and port, and now it works nicely.

        I don't have a proxy at home, so last night I ripped it up to grab news headlines from slashdot, cnn, antionline, japantimes.co.jp, and yahoo news and spit them all out on a webpage. Not that hard, but the first time I ever got it to work right.

        Thanks.
