
I am actually a bit shocked that no one mentioned using a negated character class to grab what you need. The idea is to capture everything up to, but not including, the next '<' character:

my ($title) = $chunk =~ /<title>([^<]+)/;
my @h1      = $chunk =~ /<h1>([^<]+)/g;
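To see why the negated class helps, here is a small sketch that runs both styles of pattern against the same input. The sample string is made up for illustration (everything sits on one line so the difference is visible), and the greedy pattern is only my guess at the kind of .* match the thread is about:

```perl
use strict;
use warnings;

# Hypothetical sample input, all on one line.
my $chunk = '<title>foo</title><h1>one</h1><h1>two</h1>';

# Greedy .* keeps going and only backs off to the LAST </h1> on the line.
my @greedy  = $chunk =~ m{<h1>(.*)</h1>}g;

# [^<]+ cannot cross a '<', so each capture ends at its own closing tag.
my @negated = $chunk =~ m{<h1>([^<]+)}g;

print "greedy:  $_\n" for @greedy;
print "negated: $_\n" for @negated;
```

The greedy version returns a single capture that swallows both headings, while the negated class returns them one at a time.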

However, this is still not perfect. I personally think that nothing is too simple for a parser module, especially if that parser module is HTML::TokeParser::Simple:

use strict;
use warnings;

use Data::Dumper;
use HTML::TokeParser::Simple;

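# Slurp the whole DATA section into one string.
my $d = do {local $/;<DATA>};
my $p = HTML::TokeParser::Simple->new(\$d);
my %hash;

# Walk the token stream. The token right after a start tag is taken to be
# that element's text, which holds for simple markup like the sample below.
while ( my $token = $p->get_token ) {
   $hash{title} = $p->get_token->return_text
      if $token->is_start_tag('title');
   push @{$hash{h1}}, $p->get_token->return_text
      if $token->is_start_tag('h1');
}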

print Dumper \%hash;

__DATA__
<html>
<head>
<title>foo</title>
</head>
<body>
<h1>one</h1>
<h1>two</h1>
<h1>three</h1>
</body>
</html>
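As for where the regex is "still not perfect": it misses an <h1> that carries attributes, and it stops short when the heading contains nested markup. A rough sketch of how the parser copes with both, assuming a version of HTML::TokeParser::Simple that provides is_end_tag, is_text and as_is (the input string is invented for the example):

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical input: one heading with an attribute and nested markup.
my $html = '<h1 class="big">one <b>and</b> a half</h1><h1>two</h1>';
my $p    = HTML::TokeParser::Simple->new(\$html);

my @h1;
my $in_h1 = 0;
while ( my $token = $p->get_token ) {
    if ( $token->is_start_tag('h1') ) {
        $in_h1 = 1;
        push @h1, '';                 # start collecting a new heading
    }
    elsif ( $token->is_end_tag('h1') ) {
        $in_h1 = 0;
    }
    elsif ( $in_h1 and $token->is_text ) {
        $h1[-1] .= $token->as_is;     # keep text, skip the nested tags
    }
}

print "$_\n" for @h1;
```

This collects every text token between the opening and closing h1 rather than only the first one, so the <b> tags drop out and the attribute on the start tag does not matter.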

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

In reply to (jeffa) Re: Is there a Limit on Matching .* by jeffa
in thread Is there a Limit on Matching .* by svsingh
