In the case of <a name="foo"> it simply won't match, as the regexp includes href. And you wouldn't want it to match, as it's not a link. Whitespace around the equals sign (which is rare, but valid) is more problematic, because a genuine link then gets skipped. There are other edge cases which don't behave the way you might want either: note that the first subcapture allows ">" to occur within it, so an unquoted attribute value can swallow text past the end of the tag.
But in practice, it's probably good enough for most people.
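To make those edge cases concrete, here is a minimal sketch. The regexp below is a hypothetical stand-in written for illustration, not the module's actual pattern, and first_href is just a helper name invented for this example. It has the same three properties: it requires href, it allows no whitespace around "=", and its first capture can contain ">".

```perl
use strict;
use warnings;

# HYPOTHETICAL pattern for illustration only; not the module's real regexp.
my $link_re = qr/<a\s+href=["']?([^"'\s]+)["']?[^>]*>/i;

# Return the first captured href, or undef if the pattern does not match.
sub first_href {
    my ($html) = @_;
    return $html =~ $link_re ? $1 : undef;
}

my @cases = (
    '<a name="foo">',             # anchor, not a link: correctly skipped
    '<a href="page.html">',       # ordinary quoted link: captured cleanly
    '<a href = "page.html">',     # valid HTML, but the spaces defeat the match
    '<a href=page.html>text</a>', # unquoted value: capture runs past the ">"
);

for my $html (@cases) {
    my $href = first_href($html);
    printf "%-30s => %s\n", $html, defined $href ? $href : '(no match)';
}
```

The last case is the one to watch: the capture backtracks only as far as the final ">", so it comes back as "page.html>text</a" rather than "page.html".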
The author may well accept a patch to parse the page properly using HTML::Parser, given that the module already depends on it (indirectly, via LWP::UserAgent).
Or if you can't wait for a fixed version to be released, just subclass it. It's only really that one method that's in major need of fixing.
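Both suggestions can be sketched together, roughly like this. My::Spider and its extract_links method are hypothetical stand-ins (the real module's class and method names will differ), and the base implementation is only a placeholder regexp. The subclass overrides that one method with HTML::Parser's version 3 API, collecting href from every start tag named "a".

```perl
use strict;
use warnings;

# HYPOTHETICAL stand-in for the spider module; real names will differ.
package My::Spider;

sub new { my ($class) = @_; return bless {}, $class }

# Placeholder regexp-based extraction, standing in for the method to replace.
sub extract_links {
    my ($self, $html) = @_;
    return $html =~ /<a\s+href=["']?([^"'\s]+)/gi;
}

# Subclass that overrides only the link extraction.
package My::Spider::Parsed;
use parent -norequire, 'My::Spider';
use HTML::Parser ();

sub extract_links {
    my ($self, $html) = @_;
    my @links;
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Called for each start tag with the tag name and an attribute hashref.
        start_h => [
            sub {
                my ($tag, $attr) = @_;
                push @links, $attr->{href}
                    if $tag eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );
    $parser->parse($html);
    $parser->eof;
    return @links;
}

package main;

my $html = q{<a name="top"></a> <a href = "one.html">1</a> <a href=two.html>2</a>};
print "$_\n" for My::Spider::Parsed->new->extract_links($html);
```

Because the parser tokenises the tag itself, whitespace around "=" and unquoted values stop being special cases, and <a name="..."> anchors are skipped simply because they have no href.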