PerlMonks  

Web Site Mapping Tool

by ajt (Prior)
on Oct 22, 2001 at 14:48 UTC ( [id://120494]=perlquestion )

ajt has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have to look after some web sites, many of which are in languages that I don't speak (German, French and so on). I would like to build a site map for each site, listing all the HTML files by walking the directory tree recursively.

Cut and paste is error-prone and slow, and the sites won't be in English, which will make my life a lot harder.

I did a quick search on Google and pulled up a lot of ancient Perl 4-style scripts. I tried CPAN for modules and found two vague possibilities, and I found nothing on PM.

Seeing as Perl is my language of choice, I'll write my own, unless someone happens to know of a recent one that works okay.

What I need has the following loose requirements:

  • Runs as CGI live and/or off-line storing output in a static file. I'm using wget to publish the site, so it can be slow
  • Produces modern, well-formed XHTML fragments; most of the ones I found on Google don't
  • As generic as possible, with as little HTML and site-specific detail hard-coded in it as possible
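
The first requirement (live CGI and/or an off-line static file) could be handled by one output routine. This is only a rough sketch: `running_as_cgi` and `emit_fragment` are hypothetical names, the UTF-8 output encoding is an assumption, and the only check made is for the `GATEWAY_INTERFACE` variable that CGI servers set.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the script is running under a web server.
# CGI servers set GATEWAY_INTERFACE in the environment.
sub running_as_cgi {
    return defined $ENV{GATEWAY_INTERFACE} ? 1 : 0;
}

# Hypothetical helper: send the fragment to the browser when run as CGI,
# otherwise save it to $outfile. Returns the mode that was used.
sub emit_fragment {
    my ($fragment, $outfile) = @_;

    if (running_as_cgi()) {
        binmode STDOUT, ':encoding(UTF-8)';    # assumed output encoding
        print "Content-Type: text/html; charset=utf-8\r\n\r\n", $fragment;
        return 'cgi';
    }

    open my $fh, '>:encoding(UTF-8)', $outfile
        or die "Cannot write $outfile: $!";
    print {$fh} $fragment;
    close $fh or die "Cannot close $outfile: $!";
    return 'file';
}
```

Off-line, a call such as `emit_fragment($html, 'sitemap.inc')` would write the file (the file name is made up); under a web server the same call prints the fragment with a header instead.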

Does anyone have something mostly written? If so, may I use it? I'd obviously prefer Perl for this job, but I'd accept PHP if someone happened to have a good mapper.

If not then I'll write my own. I think I'll do the following:

  • CGI for the CGI and HTML
  • HTML::TreeBuilder to get title tags and META Descriptions - other options?
  • File::Find to work over the directory tree
  • I will pass the results through the site's XSLT-like templating system, so the script will only produce the barest bones of HTML
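
A minimal sketch of how File::Find and HTML::TreeBuilder from the list above might fit together. The helper names `page_info` and `collect_pages` are hypothetical, files are matched on a `.htm`/`.html` suffix (an assumption), and no character-set decoding is attempted, which would matter for the non-English pages.

```perl
use strict;
use warnings;
use File::Find;
use HTML::TreeBuilder;

# Hypothetical helper: return the title and META description of one HTML file.
sub page_info {
    my ($path) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($path);

    my $title = $tree->look_down(_tag => 'title');
    my $meta  = $tree->look_down(_tag => 'meta', name => qr/^description$/i);

    my %info = (
        title       => $title ? $title->as_text : '',
        description => $meta  ? ($meta->attr('content') // '') : '',
    );
    $tree->delete;    # free the parse tree explicitly
    return \%info;
}

# Hypothetical helper: map every HTML file under $root to its page_info().
sub collect_pages {
    my ($root) = @_;
    my %pages;
    find(
        {
            no_chdir => 1,    # keep $_ equal to the full path
            wanted   => sub {
                return unless -f $_ && /\.html?$/i;
                $pages{$File::Find::name} = page_info($File::Find::name);
            },
        },
        $root,
    );
    return \%pages;
}
```

`collect_pages('/path/to/docroot')` would return a hash reference keyed by file path, which could then be handed to the templating system.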

I'll write it for simplicity: traverse the directories one by one, printing a simple indented HTML structure as I go.
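
As an illustration of that traversal, here is a rough sketch that returns a nested, indented `<ul>` fragment. It uses plain `opendir` recursion rather than File::Find, only because the closing tags are easier to follow that way. The names `site_map` and `escape_html` are hypothetical; dot files are skipped, file names are HTML-escaped but not URL-encoded, and symlinked directories are followed, all of which are simplifying assumptions.

```perl
use strict;
use warnings;

# Escape the characters that are special in XHTML text and attributes.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: return a nested <ul> for the HTML files under $dir.
# $url is the URL prefix for $dir; $depth controls the indentation.
sub site_map {
    my ($dir, $url, $depth) = @_;
    $depth ||= 0;
    my $pad = '  ' x $depth;

    opendir my $dh, $dir or die "Cannot read $dir: $!";
    my @names = sort grep { !/^\./ } readdir $dh;
    closedir $dh;

    my $items = '';
    for my $name (@names) {
        my $path = "$dir/$name";
        my $safe = escape_html($name);
        if (-d $path) {
            my $sub = site_map($path, "$url$name/", $depth + 2);
            next if $sub eq '';    # skip directories with no pages
            $items .= "$pad  <li>$safe/\n$sub$pad  </li>\n";
        }
        elsif ($name =~ /\.html?$/i) {
            my $href = escape_html("$url$name");
            $items .= qq{$pad  <li><a href="$href">$safe</a></li>\n};
        }
    }
    return $items eq '' ? '' : "$pad<ul>\n$items$pad</ul>\n";
}
```

Calling `print site_map('/path/to/docroot', '/')` would print the fragment. Each sub-list sits inside its parent `<li>`, so the result stays well-formed.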

I think this should be simple and easy to build: I only have a few dozen directories and a few hundred files at the moment. If someone has done this already, I would be very happy not to reinvent the wheel.

As ever thanks in advance...

Replies are listed 'Best First'.
Re: Web Site Mapping Tool
by larsen (Parson) on Oct 22, 2001 at 15:05 UTC
