PerlMonks  

Getting Web site header info

by Anonymous Monk
on Sep 17, 2002 at 14:40 UTC ( [id://198500]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to fetch web header info with this script, but I can only get the title. Please advise why I cannot get any of the other information. It prints "Mysite Title" but none of the other info that is on my website. Here is an example of the meta tags on my page:
<HTML>
<head>
<TITLE>Mysite Title</TITLE>
<META NAME="resource-type" CONTENT="document">
<META NAME="description" CONTENT="Homepage">
<META NAME="keywords" CONTENT="word1,word2, word3">
<META NAME="distribution" CONTENT="all">
<META NAME="poc" CONTENT="Mike Smith 111-111-1111">
<META NAME="postdate" CONTENT="20011220 ">
<META NAME="title" CONTENT="Department C ">
<META NAME="url" CONTENT="http://www.mywebsitehere.com">
</head>
Here is the script:
use LWP::Simple;
use HTTP::Headers;
use HTML::HeadParser;

$h = HTTP::Headers->new;
$p = HTML::HeadParser->new($h);
$url = 'http://www.thisismysite.com';
$content = get($url);
$p->parse($content);

print $h->header('Title')."\n";
print $h->header('Content-Base')."\n";    #This does not print
print $h->header('Last-Modified')."\n";   #This does not print
print $h->header('Content-Length')."\n";  #This does not print
print $h->header('Meta name')."\n";       #This does not print

Re: Getting Web site header info
by gav^ (Curate) on Sep 17, 2002 at 15:35 UTC
    If you use Data::Dump to dump all the headers from your example, you can see which keys to look for:
    use HTML::HeadParser;
    use HTTP::Headers;
    use Data::Dump 'dump';

    my $h = HTTP::Headers->new;
    my $p = HTML::HeadParser->new($h);
    $p->parse(<<EOT);
    # your header here!
    EOT
    print dump($p->header), "\n";

    __END__
    bless({
      title => "Mysite Title",
      "x-meta-description" => "Homepage",
      "x-meta-distribution" => "all",
      "x-meta-keywords" => "word1,word2, word3",
      "x-meta-poc" => "Mike Smith 111-111-1111",
      "x-meta-postdate" => "20011220 ",
      "x-meta-resource-type" => "document",
      "x-meta-title" => "Department C ",
      "x-meta-url" => "http://www.mywebsitehere.com",
    }, "HTTP::Headers")
    All the meta keys start with 'x-meta-', which is why header('Meta name') printed nothing for you. Checking the documentation for HTML::HeadParser, I found:
    X-Meta-Foo:
    All <meta> elements will initialize headers with the prefix ``X-Meta-'' on the name. If the <meta> element contains a http-equiv attribute, then it will be honored as the header name.
    You can read the docs by running perldoc HTML::HeadParser.

    Hope this helps...

    gav^
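
Following on from the reply above, here is a minimal sketch of reading individual meta values once you know the 'X-Meta-' prefix. It assumes the page head looks like the sample in the question; the %meta hash and the choice of fields are illustrative only and not part of the original script. The head is parsed from a local string so the example runs without a network connection.

```perl
use strict;
use warnings;
use HTML::HeadParser;

# Sample head, abbreviated from the question (assumed page content).
my $html = <<'EOT';
<HTML>
<head>
<TITLE>Mysite Title</TITLE>
<META NAME="description" CONTENT="Homepage">
<META NAME="keywords" CONTENT="word1,word2, word3">
<META NAME="poc" CONTENT="Mike Smith 111-111-1111">
</head>
EOT

# With no argument, HTML::HeadParser creates its own HTTP::Headers object.
my $p = HTML::HeadParser->new;
$p->parse($html);

# <title> is stored as 'Title'; <meta name="foo"> is stored as 'X-Meta-Foo'.
# %meta is a hypothetical helper hash used only for this illustration.
my %meta = (
    title       => scalar $p->header('Title'),
    description => scalar $p->header('X-Meta-Description'),
    keywords    => scalar $p->header('X-Meta-Keywords'),
    poc         => scalar $p->header('X-Meta-Poc'),
);

for my $name (sort keys %meta) {
    my $value = defined $meta{$name} ? $meta{$name} : '(not found)';
    print "$name: $value\n";
}
```

Header names are case-insensitive, so 'x-meta-description' works as well. Note that Last-Modified and Content-Length are HTTP response headers rather than part of the HTML head, so HTML::HeadParser will not fill them in from the page content alone; they would have to come from the HTTP response itself (for example, from an LWP::UserAgent response object).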
