WWW Keywords

by cajun (Chaplain)
on Jun 10, 2005 at 01:40 UTC ( #465382=perlquestion )
cajun has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to grab the 'keywords' from the header of a web page. From reading Perl & LWP, I thought $response->header('keywords') would grab this for me, but so far it has not worked, and I haven't been able to figure out why.

I've looked at the docs for LWP, LWP::Simple, LWP::UserAgent, lwpcook, just to name a few.

Thanks for any suggestions.

#!/usr/bin/perl -w
use strict;
use LWP;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new();
$browser->env_proxy();    # if we're behind a firewall

my $url = '';
my $response = $browser->get($url);
die "Error \"", $response->status_line(), "\" when getting $url"
    unless $response->is_success();

my $keywords = $response->header('keywords');
print $keywords;

Replies are listed 'Best First'.
Re: WWW Keywords
by atcroft (Monsignor) on Jun 10, 2005 at 02:03 UTC

    It sounds as if you are confusing two meanings of "header". You seem to mean the content between the <head> and </head> tags; in LWP, "header" means the HTTP message headers, which the browser interprets before the page content is processed. Try looking at one of the methods for parsing the HTML of the page itself, paying attention to the content within the HEAD tags.

    Hope that helps.
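
    To make the distinction concrete, here is a small offline sketch. It builds an HTTP::Response by hand, so nothing is fetched; the page body and its keyword list are made up for illustration. It only shows that header() looks at the HTTP message headers, while the <meta> tag sits in the body.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Made-up page body: the keywords live in the HTML, not in the HTTP headers.
my $html = <<'HTML';
<html>
<head>
<title>Example</title>
<meta name="keywords" content="perl, lwp, example">
</head>
<body>Hello</body>
</html>
HTML

# Build a response by hand so that no network access is needed.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html' ],
    $html,
);

# header() consults the HTTP message headers only.
my $type     = $response->header('Content-Type');
my $keywords = $response->header('keywords');

print "Content-Type header: $type\n";
print "keywords header: ",
    ( defined $keywords ? $keywords : '(not set)' ), "\n";

# The <meta> tag is part of the message body instead.
my $in_body = $response->content =~ /<meta\s+name="keywords"/i ? 'yes' : 'no';
print "meta keywords tag in body: $in_body\n";
```

    The regex at the end is only there to show where the tag lives; for real pages an HTML parser is the safer way to read it.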

Re: WWW Keywords
by kaif (Friar) on Jun 10, 2005 at 04:46 UTC

    Here's a solution using HTML::HeadParser:

    # Just to get the content
    use LWP::Simple;
    $html = get("");

    # To parse the HTML header
    use HTML::HeadParser;
    $p = HTML::HeadParser->new;
    $p->parse($html);
    $keywords = $p->header( 'X-Meta-Keywords' );
    print "$keywords\n";
    __END__
    perl, mod_perl, regular expressions, regexp, xp whoring, CGI, programming, learning, tutorials, questions, answers, examples, vroom, tim, node, experience, votes, code

    Interesting keywords there ...
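
    For anyone who wants to try the HTML::HeadParser approach without fetching a page, here is a strict-clean sketch that works on an HTML string. The helper name meta_keywords and the sample markup are assumptions for illustration, not part of the module.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Hypothetical helper: return the keywords from an HTML string as a list.
sub meta_keywords {
    my ($html) = @_;

    my $parser = HTML::HeadParser->new;
    $parser->parse($html);

    # <meta name="keywords" ...> is exposed as the X-Meta-Keywords header.
    my $raw = $parser->header('X-Meta-Keywords');
    return () unless defined $raw;

    # Split on commas and trim whitespace around each keyword.
    my @words;
    for my $word ( split /,/, $raw ) {
        $word =~ s/^\s+//;
        $word =~ s/\s+$//;
        push @words, $word if length $word;
    }
    return @words;
}

# Made-up sample markup.
my $sample = '<html><head><meta name="keywords" content="perl, lwp , example">'
           . '</head><body></body></html>';

print "$_\n" for meta_keywords($sample);
```

    As far as I recall from the LWP::UserAgent docs, its parse_head option (on by default) runs this same parser over HTML responses, so $response->header('X-Meta-Keywords') may already be populated after a normal get(). Treat that as something to verify against your installed version.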

Re: WWW Keywords
by davidrw (Prior) on Jun 10, 2005 at 02:08 UTC
    My first thought was for you to print $response->as_string to see exactly what the HTTP headers look like. But after reading the first response, you're probably after the HTML headers (meta tags, etc. inside the <head></head> tags). In that case, take a look at HTML::Parser, specifically the EXAMPLES section, where it extracts the <title> text.
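
    A rough sketch of the HTML::Parser route, adapted to the keywords case rather than copied from the module's EXAMPLES section. The sample HTML is made up; the handler simply watches start tags for a <meta> whose name attribute is "keywords".

```perl
use strict;
use warnings;
use HTML::Parser;

# Made-up sample document.
my $html = <<'HTML';
<html>
<head>
<title>Example</title>
<meta name="keywords" content="perl, lwp, example">
</head>
<body>Hello</body>
</html>
HTML

my $keywords;

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ( $tagname, $attr ) = @_;
            return unless $tagname eq 'meta';

            # Only the keywords <meta> tag is of interest here.
            my $name = $attr->{name};
            return unless defined $name && lc($name) eq 'keywords';

            $keywords = $attr->{content};
        },
        'tagname, attr',
    ],
);

$parser->parse($html);
$parser->eof;

print defined $keywords ? "$keywords\n" : "no keywords found\n";
```

    This does more work than HTML::HeadParser for the same result, but the same handler can be extended to pick up other tags along the way.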
Re: WWW Keywords
by cajun (Chaplain) on Jun 10, 2005 at 03:19 UTC
    Thanks atcroft and davidrw. You are both correct. I am looking for the information between <head> and </head>. Yes, I was confused on the 'Headers' vs 'HEAD' issue.

    I'm looking at the docs for HTML::Parser and HTTP::Headers, which seem to be related.

    Thanks for pointing me in the correct direction.
