PerlMonks  

Re^2: PerlMonks::Mechanized (beta)

by davido (Archbishop)
on Dec 30, 2004 at 16:25 UTC (#418295)


in reply to Re: PerlMonks::Mechanized (beta)
in thread PerlMonks::Mechanized (beta)

Putting it on Sourceforge might not be a bad idea. I'll look into that in the next day or two.

A few thoughts for things I plan to add or change in this module:

  • Add a Recently Active Threads parser. This will be easier once a RAT XML ticker is available. Unless someone else gets to that first, I'll attempt it eventually.
  • Add a normal displaytype (non-XML) node grabber, possibly combined with an HTML token parser.
  • Add interoperability with some of the Monastery's nodelets.
  • Fix the Janitor class so that it can distinguish between node types. For example, if you currently edit the doctext field of a Code Catacombs type node, you're actually editing the code display segment, not the description segment.
  • This would be a pretty major change, but I may make PerlMonks::Mechanized a proper subclass of WWW::Mechanize, instead of its current 'uses' relationship.

Dave


Re^3: PerlMonks::Mechanized (beta)
by demerphq (Chancellor) on Dec 30, 2004 at 16:34 UTC

    Some quick thoughts:

    • The XML Node thread ticker should provide an option to provide a flat list directly.
    • I'll look into making a RAT ticker. I have a feeling that expecting anyone else to do it is kinda like asking them to pull out their own toenails. :-)
    • Most of my efforts have been regarding things like search internal code, and facilitating such tasks as synchronizing the pmdev server with the master. Likewise with correctly using the private message ticker, et al.
    • I'm not sure if webscraping and XML parsing code should live in the same place. I'm not saying it shouldn't, I'm just wondering why it should. :-)
    • Support for multiple users?
    • XP and level info can be obtained from the User XP XML ticker with the correct option specified.
    • Generally speaking, if you have issues that you need to work around with a given ticker, I would prefer that you let me know so we can change the ticker.

    This is stream of consciousness stuff here, so take it all with a grain of salt. Also I may update this as more stuff occurs to me.

    ---
    demerphq

      I'll try to address a few of your thoughts here.

      • The XML Node thread ticker should provide an option to provide a flat list directly.
        That sounds reasonable. I wrote my own routine to do it based on the threaded version, and this flattening is pretty inexpensive as compared to server hits, so to me it's not a huge issue, but in the name of completeness, yes, a "thread=threaded" or "thread=flat" option might be helpful. If such an option were there, I would have used it.
      • I'll look into making a RAT ticker.
        Thanks. Understanding your RAT code and learning how to create an XML ticker would both involve a learning curve that would take me away from working on this module.
      • Most of my efforts... search internal code... synchronizing the pmdev server... correct use of private message ticker...
        If there's any feature my module could have that would help you progress on these projects, let me know.
      • I'm not sure if webscraping and XML parsing code should live in the same place...why...?
        Grabbing the site's XML tickers is trivial, and my module isn't really needed for that. My goal has been to take the site's XML tickers and use them to give the module's user useful datastructures. I guess I just wanted to give the module's users a solution that isolated them as much as possible from the details of the PM site implementation. Returning a datastructure instead of raw XML seemed like a good choice in that respect. But I'll listen to arguments to the contrary, and maybe add options to existing functions that would force a raw XML dump instead of the datastructure.
      • XP and level info can be obtained from the User XP XML ticker...
        I think I already used the right settings to grab that info. If not, you can always pass additional parameters into the method that grabs the XML. I'm working on enabling pass-through to the URL for such situations.
      • If you have issues that you need to work around with a given ticker...
        I'll keep that in mind. For one thing, I would love it if the xml node thread ticker could be made to return titles too, perhaps with a "titles=1" option. That would make the Janitors Thread Retitler faster, since I wouldn't have to do a second hit to grab titles.
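
      As a rough illustration of the flattening step and the "raw XML versus datastructure" choice discussed above, here is a minimal sketch. It is written in Python purely for brevity (the module itself is Perl), and everything in it is an assumption: the nested <node id="..."> layout, the function name thread_nodes, and the raw flag are hypothetical stand-ins, not the real ticker format or the module's actual interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical threaded reply tree. The real XML Node thread ticker may use
# different element and attribute names; this shape is assumed for the sketch.
SAMPLE_XML = """
<thread>
  <node id="1">
    <node id="2">
      <node id="4"/>
    </node>
    <node id="3"/>
  </node>
</thread>
"""


def thread_nodes(xml_text, raw=False):
    """Return the thread as a flat list of dicts, or the raw XML if raw=True.

    The flat list is in depth-first (document) order, and each entry keeps
    its nesting depth so the threaded view can still be reconstructed.
    """
    if raw:
        # Caller asked for the untouched XML instead of a datastructure.
        return xml_text

    root = ET.fromstring(xml_text.strip())
    flat = []

    def walk(element, depth):
        # Visit only direct <node> children, then recurse into each one.
        for child in element.findall("node"):
            flat.append({"id": child.get("id"), "depth": depth})
            walk(child, depth + 1)

    walk(root, 0)
    return flat


flat_nodes = thread_nodes(SAMPLE_XML)
for node in flat_nodes:
    print("  " * node["depth"] + node["id"])
```

      The walk happens entirely on the client after a single fetch, which is why flattening is cheap compared to another server hit. A "thread=flat" option on the ticker would simply make this step unnecessary.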

      Dave

      I'm certainly not claiming that this is the best way to do it (and I'm not really claiming anything at all...even I don't use this anymore)...but I once put in place an example of how to deal with either XML or HTML (via HTML::TableExtract) that worked pretty well, for the task and the time: PerlMonks::StatsWhore.

      As I said, that whole effort has fallen by the wayside. I'm curious to see what you come up with in the sense of layering a common interface over the different methods of retrieval/parsing on the back end.

      Cheers,
      Matt

        My approach would be, and has been, to eliminate the HTML from the equation. If people need or want to write client code and we don't provide a sensible way for them to get the data via XML, I would rather they ask for a proper ticker or feed than scrape the web pages.

        XML feeds are both lower load for the site and easier for people to utilize, and easier for us pmdevils to maintain. I am not even going to consider the possibility that something I do on site will break something that is parsing HTML (except for the CSS support I guess), but I will bend over backwards (and do backflips) to maintain backward compatibility for the XML tickers.

        Anyway, one thing I regret is that the PM XML tickers aren't easier to work with as a collection. Each one alone is useful, but together they are pretty awkward. Thus an ongoing project of mine on site has been to try to rationalize the tickers in the hope of making client code for them easier to write. My fear of breaking clients has led me to be cautious, however, and in the end I decided to create a new (currently) secret ticker in an attempt to resolve a lot of this in a single node. Maybe it's time to publicize it...

        ---
        $world=~s/war/peace/g
