http://www.perlmonks.org?node_id=947109


in reply to Re^3: Any spider framework?
in thread Any spider framework?

In the case of <a name="foo"> it simply won't match, as the regexp includes href.
And what makes you think the regex would limit itself to a single tag? In your example, the "<a" could match here while the "href=" matches much further down in the document. In fact, there is no guarantee that this string is a tag attribute at all: it could just as well be in plain HTML text ("PCDATA"), in JavaScript code, or even in an HTML comment.

To be reliable, a parser (actually just a lexer, which could be regex-based) should first extract whole tags, and you should then test each tag on its own.
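
Here is a rough sketch of what I mean, using only core Perl. The sub name extract_links and the sample document are made up for illustration, and this is not a full HTML tokenizer: it assumes no ">" inside attribute values, it does not handle CDATA or <style> blocks, and the href test could still be fooled by an attribute value that itself contains " href=". The point is only the two-step structure: first cut the input into whole tokens (skipping comments and script bodies), then inspect each tag in isolation.

```perl
use strict;
use warnings;

# Hypothetical helper: return the href values of all <a> tags in $html.
sub extract_links {
    my ($html) = @_;
    my @links;

    # Step 1 (lexer): walk the document token by token, anchored with \G.
    # Comments and script elements are consumed whole, so nothing inside
    # them is ever treated as markup. Only real tags are captured in $1.
    while ($html =~ m{
        \G (?:
            <!--.*?-->                          # HTML comment
          | <script\b[^>]*>.*?</script\s*>      # script element with body
          | (<[^>]*>)                           # any other whole tag
          | [^<]+                               # plain text
          | <                                   # stray less-than sign
        )
    }xsgci) {
        my $tag = $1;
        next unless defined $tag;               # text, comment or script

        # Step 2: test this single tag on its own.
        next unless $tag =~ /\A<a\s/i;
        if ($tag =~ /\shref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i) {
            push @links, defined $1 ? $1 : defined $2 ? $2 : $3;
        }
    }
    return @links;
}

my $html = <<'HTML';
<a name="foo">anchor</a>
<p>Plain text mentioning href="not-a-link"</p>
<!-- <a href="commented-out.html"> -->
<script>var s = '<a href="in-script.html">';</script>
<a class="x" href="real.html">real</a>
HTML

print "$_\n" for extract_links($html);    # prints: real.html
```

A single regex along the lines of /<a.*?href="([^"]*)"/s run over the same input would start at the <a name="foo"> tag and pick up "not-a-link" from the plain text, which is exactly the failure described above.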