I'm certainly not claiming that this is the best way to do it (and I'm not really claiming anything at all...even I don't use this anymore)...but I once put in place an example of how to deal with either XML or HTML (via HTML::TableExtract) that worked pretty well, for the task and the time: PerlMonks::StatsWhore.
As I said, that whole effort has fallen by the wayside. I'm curious to see what you come up with for layering a common interface over the different methods of retrieval/parsing on the back end.
Cheers,
Matt
My approach would be, and has been, to eliminate the HTML from the equation. If people need or want to write client code and we don't provide a sensible way for them to get the data via XML, I would rather they ask for a proper ticker or feed than scrape the web pages.
XML feeds put less load on the site, are easier for people to use, and are easier for us pmdevils to maintain. I am not even going to consider the possibility that something I do on site will break something that is parsing HTML (except for the CSS support, I guess), but I will bend over backwards (and do backflips) to maintain backward compatibility for the XML tickers.
Anyway, one thing I regret is that the PM XML tickers aren't easier to work with as a collection. Each one alone is useful, but together they are pretty awkward. Thus an ongoing project/objective of mine on site has been to try to rationalize the tickers in the hope that writing client code for them becomes easier. My fear of breaking clients has led me to be cautious, however, and in the end I decided to create a new (currently) secret ticker in an attempt to resolve a lot of this in a single node. Maybe it's time to publicize it...
---
$world=~s/war/peace/g