http://www.perlmonks.org?node_id=418300


in reply to Re^2: PerlMonks::Mechanized (beta)
in thread PerlMonks::Mechanized (beta)

Some quick thoughts:

This is stream of consciousness stuff here, so take it all with a grain of salt. Also I may update this as more stuff occurs to me.

---
demerphq

Re^4: PerlMonks::Mechanized (beta)
by davido (Cardinal) on Dec 31, 2004 at 04:33 UTC

    I'll try to address a few of your thoughts here.

    • The XML Node thread ticker should provide an option to provide a flat list directly.
      That sounds reasonable. I wrote my own routine to flatten the threaded version, and flattening is cheap compared with a server hit, so to me it's not a huge issue. But in the name of completeness, yes, a "thread=threaded" or "thread=flat" option might be helpful. If such an option existed, I would have used it.
    • I'll look into making a RAT ticker.
      Thanks. Learning your RAT code well enough, plus learning how to create an XML ticker, would take me away from working on this module.
    • Most of my efforts... search internal code... synchronizing the pmdev server... correct use of private message ticker...
      If there's any feature my module could add that would help you with these projects, let me know.
    • I'm not sure if webscraping and XML parsing code should live in the same place...why...?
      Grabbing the site's XML tickers is trivial, and my module isn't really needed for that. My goal has been to take the site's XML tickers and turn them into useful data structures for the module's users. I wanted to give users a solution that isolates them as much as possible from the details of the PM site implementation, and returning a data structure instead of raw XML seemed like a good choice in that respect. But I'll listen to arguments to the contrary, and may add options to existing functions that force a raw XML dump instead of the data structure.
    • XP and level info can be obtained from the User XP XML ticker...
      I think I already used the right settings to grab that info. If not, you can always pass additional parameters into the method that grabs the XML. I'm working on passing such parameters through to the URL for situations like this.
    • If you have issues that you need to workaround with a given ticker...
      I'll keep that in mind. For one thing, I would love it if the XML Node thread ticker could be made to return titles too, perhaps with a "titles=1" option. That would make the Janitors Thread Retitler faster, since I wouldn't need a second hit to grab titles.
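    The client-side flattening mentioned in the first point can be sketched roughly as follows. This is a minimal Python illustration, not the module's actual (Perl) routine, and it assumes a hypothetical XML shape in which each reply is a `node` element with an `id` attribute nested inside its parent; the real XML Node thread ticker's element and attribute names may differ. A depth-first walk yields nodes in reading order and records each one's depth, so the thread structure can still be recovered from the flat list.

```python
import xml.etree.ElementTree as ET

# Hypothetical threaded XML: each <node> may contain nested <node> replies.
# This is NOT the actual PerlMonks ticker schema, just an illustration.
SAMPLE = """
<thread>
  <node id="1">
    <node id="2">
      <node id="4"/>
    </node>
    <node id="3"/>
  </node>
</thread>
"""


def flatten_thread(xml_text):
    """Return (node_id, depth) pairs in depth-first (reading) order."""
    root = ET.fromstring(xml_text)
    flat = []

    def walk(elem, depth):
        # Visit direct <node> children in document order, recursing into replies.
        for child in elem.findall("node"):
            flat.append((child.get("id"), depth))
            walk(child, depth + 1)

    walk(root, 0)
    return flat


if __name__ == "__main__":
    # Print an indented outline of the thread from the flat list.
    for node_id, depth in flatten_thread(SAMPLE):
        print("  " * depth + node_id)
```

    With the sample input, `flatten_thread(SAMPLE)` returns `[("1", 0), ("2", 1), ("4", 2), ("3", 1)]`. Keeping the depth alongside each id is what lets a flat list stand in for the threaded one.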

    Dave

Re^4: PerlMonks::Mechanized (beta)
by mojotoad (Monsignor) on Jan 10, 2006 at 08:04 UTC
    I'm certainly not claiming this is the best way to do it (I'm not really claiming anything at all; even I don't use it anymore), but I once put together an example of handling either XML or HTML (via HTML::TableExtract) that worked pretty well for its task at the time: PerlMonks::StatsWhore.

    As I said, that whole effort has fallen by the wayside. I'm curious to see what you come up with in terms of layering a common interface over the different retrieval and parsing methods on the back end.
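    One way to layer a common interface over different back ends is sketched below in Python, as an illustration only: it is not how PerlMonks::StatsWhore is written, and Python's standard `html.parser` stands in for HTML::TableExtract. The class names and both input formats (a `user` element with `name`, `xp`, and `level` attributes; a one-row table with the same three cells) are hypothetical. Each back end hides its parsing details and returns the same plain dictionary, so calling code doesn't care where the data came from.

```python
from abc import ABC, abstractmethod
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StatsSource(ABC):
    """Common interface (hypothetical): every back end returns a plain dict."""

    @abstractmethod
    def parse(self, text: str) -> dict:
        ...


class XmlStatsSource(StatsSource):
    # Assumes a hypothetical <user name="..." xp="..." level="..."/> child element.
    def parse(self, text: str) -> dict:
        user = ET.fromstring(text).find("user")
        return {"name": user.get("name"),
                "xp": int(user.get("xp")),
                "level": int(user.get("level"))}


class _CellCollector(HTMLParser):
    """Collects the text content of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append rather than assign.
        if self._in_cell:
            self.cells[-1] += data.strip()


class HtmlStatsSource(StatsSource):
    # Assumes a hypothetical one-row table: name | xp | level.
    def parse(self, text: str) -> dict:
        collector = _CellCollector()
        collector.feed(text)
        collector.close()
        name, xp, level = collector.cells[:3]
        return {"name": name, "xp": int(xp), "level": int(level)}
```

    Adding another retrieval method then means writing one more `StatsSource` subclass; nothing that consumes the dictionary has to change.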

    Cheers,
    Matt

      My approach would be, and has been, to eliminate HTML from the equation. If people need or want to write client code and we don't provide a sensible way for them to get the data via XML, I would rather they ask for a proper ticker or feed than scrape the web pages.

      XML feeds put less load on the site, are easier for people to use, and are easier for us pmdevils to maintain. I am not even going to consider the possibility that something I change on site will break a client that parses HTML (except for the CSS support, I guess), but I will bend over backwards (and do backflips) to maintain backward compatibility for the XML tickers.
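      Part of why additive changes to an XML ticker need not break clients: a client that looks up elements and attributes by name simply ignores ones it doesn't know about, whereas a scraper tied to HTML layout tends to break when the page changes. A minimal Python sketch, assuming a made-up ticker format (the `msg`, `id`, `author`, `time`, and `info` names are illustrative, not a real PerlMonks ticker):

```python
import xml.etree.ElementTree as ET

# Hypothetical ticker output; element and attribute names are illustrative only.
V1 = '<ticker><msg id="10" author="alice">hi</msg></ticker>'

# A later (hypothetical) version adds a new attribute and a new element type.
V2 = ('<ticker version="2">'
      '<msg id="10" author="alice" time="2006-01-10">hi</msg>'
      '<info>new stuff</info>'
      '</ticker>')


def read_messages(xml_text):
    """Extract only the fields this client knows about; ignore everything else."""
    root = ET.fromstring(xml_text)
    return [
        {"id": int(m.get("id")), "author": m.get("author"), "text": m.text or ""}
        for m in root.iter("msg")
    ]
```

      Both versions produce the same result for this client, because the new `time` attribute and `info` element are never looked up.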

      Anyway, one thing I regret is that the PM XML tickers aren't easier to work with as a collection. Each one alone is useful, but together they are pretty awkward. So an ongoing project of mine on site has been to rationalize the tickers in the hope of making client code for them easier to write. My fear of breaking clients has led me to be cautious, however, and in the end I decided to create a new (currently) secret ticker in an attempt to resolve much of this in a single node. Maybe it's time to publicize it...

      ---
      $world=~s/war/peace/g