PerlMonks  

Re^3: PerlMonks::Mechanized (beta)

by demerphq (Chancellor)
on Dec 30, 2004 at 16:34 UTC (#418300=note)


in reply to Re^2: PerlMonks::Mechanized (beta)
in thread PerlMonks::Mechanized (beta)

Some quick thoughts:

  • The XML Node thread ticker should provide an option to provide a flat list directly.
  • I'll look into making a RAT ticker. I have a feeling that expecting anyone else to do it is kinda like asking them to pull out their own toenails. :-)
  • Most of my efforts have gone into things like the internal search code and facilitating tasks such as synchronizing the pmdev server with the master. Likewise with correctly using the private message ticker, et al.
  • I'm not sure whether webscraping and XML parsing code should live in the same place. I'm not saying it shouldn't, I'm just wondering why it should. :-)
  • Support for multiple users?
  • XP and level info can be obtained from the User XP XML ticker with the correct option specified.
  • Generally speaking, if you have issues with a given ticker that you need to work around, I would prefer that you let me know so we can change the ticker.

This is stream of consciousness stuff here, so take it all with a grain of salt. Also I may update this as more stuff occurs to me.

---
demerphq


Re^4: PerlMonks::Mechanized (beta)
by davido (Archbishop) on Dec 31, 2004 at 04:33 UTC

    I'll try to address a few of your thoughts here.

    • The XML Node thread ticker should provide an option to provide a flat list directly.
      That sounds reasonable. I wrote my own routine to do it based on the threaded version, and this flattening is pretty inexpensive as compared to server hits, so to me it's not a huge issue, but in the name of completeness, yes, a "thread=threaded" or "thread=flat" option might be helpful. If such an option were there, I would have used it.
    • I'll look into making a RAT ticker.
      Thanks. Between understanding your RAT code and learning how to create an XML ticker, the learning curve would take me away from working on this module.
    • Most of my efforts... search internal code... synchronizing the pmdev server... correct use of private message ticker...
      If there's any feature my module should have that would help you progress in these projects, let me know.
    • I'm not sure if webscraping and XML parsing code should live in the same place...why...?
      Grabbing the site's XML tickers is trivial, and my module isn't really needed for that. My goal has been to take the site's XML tickers and use them to give the module's user useful datastructures. I guess I just wanted to give the module's users a solution that isolated them as much as possible from the details of the PM site implementation. Returning a datastructure instead of raw XML seemed like a good choice in that respect. But I'll listen to arguments to the contrary, and maybe add options to existing functions that would force a raw XML dump instead of the datastructure.
    • XP and level info can be obtained from the User XP XML ticker...
      I think I already used the right settings to grab that info. If not, you can always pass additional parameters into the method that grabs the XML. I'm working on enabling pass-through to the URL for such situations.
    • If you have issues that you need to workaround with a given ticker...
      I'll keep that in mind. For one thing, I would love it if the xml node thread ticker could be made to return titles too, perhaps with a "titles=1" option. That would make the Janitors Thread Retitler faster, since I wouldn't have to do a second hit to grab titles.
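To make the flattening step concrete, here is a minimal sketch of walking a threaded reply tree and producing a flat list. It is written in Python purely for illustration and is not the module's actual routine. The XML layout (nested `node` elements with an `id` attribute), the sample IDs, and the name `flatten_thread` are all assumptions; the real ticker's output may be shaped differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical threaded output. The element name "node", the "id"
# attribute, and the IDs themselves are made up for this sketch.
SAMPLE = """
<thread>
  <node id="1">
    <node id="2"/>
    <node id="3">
      <node id="4"/>
    </node>
  </node>
</thread>
"""


def flatten_thread(xml_text):
    """Return (node_id, depth) pairs in depth-first document order."""
    root = ET.fromstring(xml_text)
    flat = []

    def walk(elem, depth):
        # findall("node") matches direct children only, so the
        # recursion mirrors the reply nesting one level at a time.
        for child in elem.findall("node"):
            flat.append((child.get("id"), depth))
            walk(child, depth + 1)

    walk(root, 0)
    return flat


if __name__ == "__main__":
    for node_id, depth in flatten_thread(SAMPLE):
        print("  " * depth + node_id)
```

The walk touches each node once and needs no extra server hits, which is why doing it client-side is cheap compared with fetching the data.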

    Dave

Re^4: PerlMonks::Mechanized (beta)
by mojotoad (Monsignor) on Jan 10, 2006 at 08:04 UTC
    I'm certainly not claiming that this is the best way to do it (and I'm not really claiming anything at all...even I don't use this anymore)...but I once put in place an example of how to deal with either XML or HTML (via HTML::TableExtract) that worked pretty well, for the task and the time: PerlMonks::StatsWhore.

    As I said, that whole effort has fallen by the wayside. I'm curious to see what you come up with in the sense of layering a common interface over the different methods of retrieval/parsing on the back end.
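One possible shape for such a layer, as a rough sketch only: each back end exposes the same `parse` method and returns the same data structure, so the caller never sees whether the data came from XML or from scraped HTML. This is Python for illustration, not code from PerlMonks::StatsWhore or PerlMonks::Mechanized. The class names, the `get_stats` helper, and both sample payloads are hypothetical and do not reflect the site's real output.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical payloads carrying the same two fields in both formats.
XML_SAMPLE = "<user><xp>100</xp><level>5</level></user>"
HTML_SAMPLE = (
    "<table><tr><td>xp</td><td>100</td></tr>"
    "<tr><td>level</td><td>5</td></tr></table>"
)


class XMLBackend:
    """Reads the fields from an XML document."""

    def parse(self, text):
        root = ET.fromstring(text)
        return {"xp": int(root.findtext("xp")),
                "level": int(root.findtext("level"))}


class _CellCollector(HTMLParser):
    """Collects the text content of every <td> cell, in order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


class HTMLBackend:
    """Reads the same fields from a two-column HTML table."""

    def parse(self, text):
        collector = _CellCollector()
        collector.feed(text)
        collector.close()
        # Cells alternate label, value, label, value, ...
        pairs = dict(zip(collector.cells[0::2], collector.cells[1::2]))
        return {"xp": int(pairs["xp"]), "level": int(pairs["level"])}


def get_stats(backend, text):
    """Common entry point: any object with a parse() method will do."""
    return backend.parse(text)
```

With this arrangement, `get_stats(XMLBackend(), XML_SAMPLE)` and `get_stats(HTMLBackend(), HTML_SAMPLE)` hand back the same dictionary, so swapping the retrieval method does not ripple into client code.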

    Cheers,
    Matt

      My approach would be, and has been, to eliminate the HTML from the equation. If people need or want to write client code and we don't provide a sensible way for them to get the data via XML, I would rather they ask for a proper ticker or feed than scrape the web pages.

      XML feeds are lower load for the site, easier for people to utilize, and easier for us pmdevils to maintain. I am not even going to consider the possibility that something I do on site will break something that is parsing HTML (except for the CSS support I guess), but I will bend over backwards (and do backflips) to maintain backward compatibility for the XML tickers.

      Anyway, one thing I regret is that the PM XML tickers aren't easier to work with as a collection. Each one alone is useful, but together they are pretty awkward. Thus an ongoing project/objective of mine on site has been to try to rationalize the tickers in the hope that writing client code for them becomes easier. My fear of breaking clients has led me to be cautious, however, and in the end I decided to create a new (currently) secret ticker in an attempt to resolve a lot of this in a single node. Maybe it's time to publicize it...

      ---
      $world=~s/war/peace/g
