PerlMonks
Re^3: Any spider framework? by tobyink (Prior)
on Jan 06, 2012 at 12:51 UTC (#946593)
In the case of <a name="foo">, it simply won't match, as the regexp includes href. And you wouldn't want it to match, as it's not a link. Whitespace around the equals sign (which is rare, but valid) is more problematic. There are other edge cases that behave differently from how you might want as well: note that the first subcapture allows ">" to occur within it. But in practice, it's probably good enough for the majority of people. The author may well accept a patch to parse the page properly using HTML::Parser, given that the module already depends on it (indirectly, via LWP::UserAgent). Or, if you can't wait for a fixed version to be released, just subclass it; it's only really that one method that's in major need of fixing.
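For what it's worth, here is a minimal sketch of what link extraction with HTML::Parser could look like. It is not a patch against the module: the function name extract_links and the sample markup are made up for illustration, and I'm assuming you only care about href attributes on <a> elements.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not part of any module discussed here):
# returns the href value of every <a> start tag found in $html.
sub extract_links {
    my ($html) = @_;
    my @links;

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Called for each start tag with the lowercased tag name
        # and a hashref of its attributes.
        start_h => [
            sub {
                my ($tagname, $attr) = @_;
                push @links, $attr->{href}
                    if $tagname eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );

    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Sample input covering the edge cases above: an anchor without href,
# whitespace around the equals sign, and ">" inside an attribute value.
my $html = join ' ',
    '<a name="foo">anchor</a>',
    '<a href = "/x">spaced</a>',
    '<a title="a > b" href="/y">tricky</a>';

print "$_\n" for extract_links($html);
```

The named anchor is skipped because it has no href, and the parser copes with the spaced equals sign and the quoted ">" without special handling. In a subclass, you would presumably put something like this inside whichever method currently runs the regexp.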