PerlMonks  

Re: Searching directories for HTML title tags

by arashi (Priest)
on Jun 20, 2001 at 00:53 UTC (node #89812)


in reply to Searching directories for HTML title tags

I'd like to thank everyone who offered input on my problem; I got everything working. And no, this wasn't for a class: it was a "busy-work" assignment handed down at work, and I wanted to use Perl both to save time and to learn something new.

Here is my completed code:

use strict;
use warnings;
use diagnostics;
use File::Find;
use HTML::HeadParser;

my @data;
my $path = '/base/path';

&main;

sub main {
    find(\&html_files, $path);

    # "or die", not "|| die": || binds to the filename string, so die would never run
    open OUT, "+>filelist.html" or die "Can not write file: $!";
    print OUT '<html><head><title>File List</title></head><body><center>', "\n",
        '<table border="1" cellpadding="5" cellspacing="0">', "\n";

    foreach my $file (sort @data) {
        my $htmlPage = &fileRead("<$file");

        # A fresh parser per file: HTML::HeadParser keeps the headers it has
        # collected, so one shared parser would pile up titles across files
        # (and show an old title for a page that has none)
        my $parser = HTML::HeadParser->new;
        $parser->parse($htmlPage);
        my $pageTitle = $parser->header('Title');
        if (!defined $pageTitle or $pageTitle eq "") {
            $pageTitle = '&nbsp;';
        }
        print OUT '<tr><td>', "\L$file\E", '</td><td>', $pageTitle,
            '</td></tr>', "\n";
    }
    print OUT '</table></center></body></html>';
    close OUT;
}

sub html_files {
    # These patterns overlap: .htm and .HTM files match two of them and are pushed twice
    push @data, $File::Find::name if /\.s?html?$/;
    push @data, $File::Find::name if /\.s?HTML?$/;
    push @data, $File::Find::name if /\.s?htm?$/;
    push @data, $File::Find::name if /\.s?HTM?$/;
}

sub fileRead {
    my ($file) = @_;
    my $dataIn = '';    # start empty so the first concatenation does not warn
    open IN, $file or die "Can not open $file: $!";
    while (<IN>) {
        my $temp = $_;
        $dataIn = $dataIn.$temp;
    }
    close IN;
    return $dataIn;
}
Arashi

I'm sure Edison turned himself a lot of colors before he invented the lightbulb. - H.S.


Re: Re: Searching directories for HTML title tags
by chromatic (Archbishop) on Jun 20, 2001 at 08:06 UTC
    A couple of quick notes. You can sharpen the regex in html_files():
    sub html_files { push @data, $File::Find::name if /\.s?html?$/i; }
    I'd pass in $File::Find::name as a parameter, just to encapsulate things further. merlyn might point out that $ as an anchor also matches just before a trailing newline, so a filename ending in a newline would slip through, but that shouldn't be a problem in practice. (\z is safer.) Finally, I don't know why you have m? (it makes the m optional, so .ht files match too), but using /i makes the regex case-insensitive, so one pattern replaces all four. Saves time.
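    A minimal sketch of the suggestions above, not code from the thread: the helper name is_html_file is hypothetical. It uses one case-insensitive pattern anchored with \z and takes the file name as a parameter instead of reading $File::Find::name itself.

```perl
use strict;
use warnings;
use File::Find;

my @data;

# Hypothetical helper: the name is passed in rather than read from
# $File::Find::name inside the test.
sub is_html_file {
    my ($name) = @_;
    # /i lets one pattern cover .html, .HTML, .htm, .Htm, .shtml, ...
    # \z anchors at the very end; $ would also match before a trailing "\n"
    return $name =~ /\.s?html?\z/i;
}

# Callback for File::Find, e.g. find(\&html_files, '/base/path');
sub html_files {
    push @data, $File::Find::name if is_html_file($File::Find::name);
}

# Try the pattern on a few sample names
my @samples = ('index.html', 'OLD.HTM', 'page.shtml', 'notes.txt', "trap.html\n");
for my $name (@samples) {
    (my $shown = $name) =~ s/\n/\\n/g;    # make the newline visible
    print "$shown: ", (is_html_file($name) ? 'match' : 'no match'), "\n";
}
```

    Each .htm file is now pushed once, and the name ending in a newline is rejected because \z, unlike $, only matches at the true end of the string.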

    I'd also get rid of $temp in fileRead(). Always bugs me. :)
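    A sketch of fileRead() without $temp, assuming it is acceptable to also switch to a lexical filehandle and three-argument open; this version therefore takes a plain path (call it as fileRead($file), not fileRead("<$file")). The second routine, fileSlurp, is a hypothetical alternative that reads the whole file at once by localizing $/.

```perl
use strict;
use warnings;

# Read a whole file into a string, appending each line directly with .=
sub fileRead {
    my ($file) = @_;
    open my $in, '<', $file or die "Can not open $file: $!";
    my $dataIn = '';
    $dataIn .= $_ while <$in>;    # no $temp needed
    close $in;
    return $dataIn;
}

# Alternative: with $/ undefined, one <$in> returns the rest of the file
sub fileSlurp {
    my ($file) = @_;
    open my $in, '<', $file or die "Can not open $file: $!";
    local $/;                     # undef only inside this sub
    my $dataIn = <$in>;
    close $in;
    return defined $dataIn ? $dataIn : '';
}
```

    Both return the same string; the slurp version simply skips the line-by-line loop, and local restores $/ when the sub returns.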
