svsingh has asked for the wisdom of the Perl Monks concerning the following question:
I'm trying to pull the title and h1 out of a local HTML file. I figured this is too simple to need any of the HTML parsing modules, so I'm using a plain regex match. The HTML file is guaranteed to have only one h1.
Here's what I'd like to do ...
$/ = '</h1>';
my $chunk = <HTMFILE>;
$chunk =~ m%<title>(.+)</title>.*<h1>(.+)</h1>%i;
... which leaves $1 and $2 undefined. If I split the match into two separate matches, however, everything works out just fine. Here's what's working:
$/ = '</h1>';
my $chunk = <HTMFILE>;
$chunk =~ m%<title>(.+)</title>%i;
my $title = $1;
$chunk =~ m%<h1>(.+)</h1>%i;
my $heading = $1;
The best explanation I can think of is .* only matches up to a certain number of characters. My test file has 3750 characters between </title> and <h1>. Is that what's happening here?
Thanks for your help.
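One thing that may be worth checking before blaming a length limit: by default, . in a Perl regex does not match a newline, so .* cannot cross line boundaries unless the /s modifier is given. Here is a minimal sketch of that, assuming the title and the h1 each sit on a single line with newlines in between. The $chunk string is a made-up stand-in for the real file, and the non-greedy quantifiers are my addition, not part of the original match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for what <HTMFILE> returns with $/ = '</h1>':
# a title, several thousand characters of filler across many lines, then the h1.
my $chunk = "<html><head>\n<title>My Page</title>\n</head>\n<body>\n"
          . ("<p>filler</p>\n" x 300)
          . "<h1>My Heading</h1>";

# Without /s, '.' cannot match "\n", so '.*' cannot bridge the lines
# between </title> and <h1>; the match fails and returns an empty list.
my @no_s = $chunk =~ m%<title>(.+)</title>.*<h1>(.+)</h1>%i;

# With /s, '.' also matches newlines. The non-greedy forms (+?, *?)
# keep each capture from running past its own closing tag.
my ($title, $heading) =
    $chunk =~ m%<title>(.+?)</title>.*?<h1>(.+?)</h1>%is;

print "without /s: ", scalar(@no_s), " captures\n";
print "with /s: title='$title', heading='$heading'\n";
```

If the real file behaves like this sample, that would also explain why the two separate matches succeed: neither of them needs . to cross a newline.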