in reply to PerlMonks::Mechanized (beta)

Heh, well, this has been a back burner project for me for quite some time. I've focused more on the tickers and the various XML feeds, and nothing on the HTML side. But it looks like you've gotten there first and have a nicer package than I had. Instead of pursuing my own efforts I'll start looking at patching this. I actually wonder if there is any point in putting this onto Sourceforge or something so that it can be worked on collaboratively more easily.

Anyway, nice work, I look forward to playing with it later on. :-)


Re^2: PerlMonks::Mechanized (beta)
by davido (Cardinal) on Dec 30, 2004 at 16:25 UTC

    Putting it on Sourceforge might not be a bad idea. I'll look into that in the next day or two.

    A few thoughts for things I plan to add or change in this module:

    • Add a Recently Active Threads parser. This will be easier once a RAT XML ticker is available. Unless someone else gets to that first, I'll attempt it eventually.
    • Add a normal displaytype (non-XML) node grabber, possibly combined with an HTML token parser.
    • Add interoperability with some of the Monastery's nodelets.
    • Fix the Janitor class so that it is smart enough to know the difference between different types of nodes. For example, currently if you edit the doctext field of a Code Catacombs type node, you're actually editing the code display segment, not the description segment.
    • This would be a pretty major change, but I may make PerlMonks::Mechanized a proper subclass of WWW::Mechanize, instead of its current 'uses' relationship.
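To make the 'uses' versus subclass distinction in the last item concrete, here is a minimal sketch. Everything in it is hypothetical: Sketch::Agent is a stand-in for WWW::Mechanize so the example runs with core Perl only, and Sketch::Uses, Sketch::IsA and get_node are illustrative names, not the module's actual classes or methods.

```perl
use strict;
use warnings;

# Hypothetical stand-in for WWW::Mechanize, so the sketch needs no CPAN modules.
package Sketch::Agent;
sub new { my ($class, %args) = @_; return bless { %args, history => [] }, $class }
sub get {
    my ($self, $url) = @_;
    push @{ $self->{history} }, $url;    # remember what was requested
    return "fetched $url";
}

# 'Uses' relationship: the object holds an agent and forwards calls to it.
package Sketch::Uses;
sub new {
    my ($class, %args) = @_;
    return bless { agent => Sketch::Agent->new(%args) }, $class;
}
sub get { my ($self, @args) = @_; return $self->{agent}->get(@args) }

# Subclass relationship: every agent method is inherited automatically.
package Sketch::IsA;
use parent -norequire, 'Sketch::Agent';
sub get_node {
    my ($self, $node_id) = @_;
    return $self->get("?node_id=$node_id");    # inherited get()
}

package main;
my $uses = Sketch::Uses->new;
my $isa  = Sketch::IsA->new;
print $uses->get('?node_id=1'), "\n";
print $isa->get_node(2), "\n";
print $isa->isa('Sketch::Agent') ? "subclass\n" : "wrapper\n";
```

With the wrapper, each agent method that should be exposed needs its own forwarding sub. With the subclass, callers get the whole agent interface for free, which is also why it would be the bigger change.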


      Some quick thoughts:

      • The XML Node thread ticker should provide an option to provide a flat list directly.
      • I'll look into making a RAT ticker. I have a feeling expecting anyone else to do it is kinda like asking them to pull out their own toenails. :-)
      • Most of my efforts have gone into things like search internal code, and into facilitating tasks such as synchronizing the pmdev server with the master. Likewise with correctly using the private message ticker, et al.
      • I'm not sure if webscraping and XML parsing code should live in the same place. I'm not saying it shouldn't, I'm just wondering why it should. :-)
      • Support for multiple users?
      • XP and level info can be obtained from the User XP XML ticker with the correct option specified.
      • Generally speaking, if you have issues that you need to work around with a given ticker, I would prefer that you let me know so we can change the ticker.

      This is stream of consciousness stuff here, so take it all with a grain of salt. Also I may update this as more stuff occurs to me.


        I'll try to address a few of your thoughts here.

        • The XML Node thread ticker should provide an option to provide a flat list directly.
          That sounds reasonable. I wrote my own routine to do it based on the threaded version, and this flattening is pretty inexpensive as compared to server hits, so to me it's not a huge issue, but in the name of completeness, yes, a "thread=threaded" or "thread=flat" option might be helpful. If such an option were there, I would have used it.
        • I'll look into making a RAT ticker.
          Thanks. Between understanding your RAT code and learning how to create an XML ticker, the learning curve would take me away from working on this module.
        • Most of my efforts... search internal code... synchronizing the pmdev server... correct use of private message ticker...
          If there's any feature my module should have that would help you progress in these projects, let me know.
        • I'm not sure if webscraping and XML parsing code should live in the same place...why...?
          Grabbing the site's XML tickers is trivial, and my module isn't really needed for that. My goal has been to take the site's XML tickers and use them to give the module's user useful datastructures. I guess I just wanted to give the module's users a solution that isolated them as much as possible from the details of the PM site implementation. Returning a datastructure instead of raw XML seemed like a good choice in that respect. But I'll listen to arguments to the contrary, and maybe add options to existing functions that would force a raw XML dump instead of the datastructure.
        • XP and level info can be obtained from the User XP XML ticker...
          I think I already used the right settings to grab that info. If not, you can always pass additional parameters into the method that grabs the XML. I'm working on enabling pass-through to the URL for such situations.
        • If you have issues that you need to work around with a given ticker...
          I'll keep that in mind. For one thing, I would love it if the XML Node thread ticker could be made to return titles too, perhaps with a "titles=1" option. That would make the Janitors Thread Retitler faster, since I wouldn't have to do a second hit to grab titles.
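The client-side flattening mentioned in the first item can be sketched roughly as follows. This is not the module's actual routine: the node shape (a hash with id and replies keys) and the node ids are made up for illustration, and the real ticker output may be structured differently.

```perl
use strict;
use warnings;

# Assumed shape: each node is { id => ..., replies => [ child nodes ] }.
# Returns a depth-first, pre-order list of { id, depth } records.
sub flatten_thread {
    my ($node, $depth) = @_;
    $depth //= 0;
    my @flat = ({ id => $node->{id}, depth => $depth });
    for my $reply (@{ $node->{replies} || [] }) {
        push @flat, flatten_thread($reply, $depth + 1);
    }
    return @flat;
}

# Made-up thread: node 1 has replies 2 and 4, and node 2 has reply 3.
my $thread = {
    id      => 1,
    replies => [
        { id => 2, replies => [ { id => 3, replies => [] } ] },
        { id => 4 },
    ],
};

my @flat = flatten_thread($thread);
printf "%s%d\n", '  ' x $_->{depth}, $_->{id} for @flat;
```

The walk touches each node once and needs no extra requests, which fits the point that flattening is cheap next to a server hit.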


        I'm certainly not claiming that this is the best way to do it (and I'm not really claiming anything at all...even I don't use this anymore)...but I once put in place an example of how to deal with either XML or HTML (via HTML::TableExtract) that worked pretty well, for the task and the time: PerlMonks::StatsWhore.

        As I said, that whole effort has fallen by the wayside. I'm curious to see what you come up with for layering a common interface over the different retrieval and parsing methods on the back end.
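One way such a common interface might look is a dispatch table of back ends that all return the same structure. This is only a sketch under loud assumptions: the sample markup is invented, user_info and %backend are hypothetical names, and the regexes are toy stand-ins for a real XML parser or HTML::TableExtract. It is not how PerlMonks::StatsWhore or PerlMonks::Mechanized actually work.

```perl
use strict;
use warnings;

# Each back end turns its own (invented) markup into the same structure:
# a hash reference with 'xp' and 'level' keys.
my %backend = (
    xml => sub {
        my ($text) = @_;
        my %info;
        $info{$1} = $2 while $text =~ /<(xp|level)>([^<]*)<\/\1>/g;
        return \%info;
    },
    html => sub {
        my ($text) = @_;
        my %info;
        $info{lc $1} = $2
            while $text =~ /<tr><td>(XP|Level)<\/td><td>([^<]*)<\/td><\/tr>/g;
        return \%info;
    },
);

# Callers name a format and never see which parser did the work.
sub user_info {
    my ($format, $text) = @_;
    my $parser = $backend{$format} or die "no back end for '$format'\n";
    return $parser->($text);
}

my $from_xml  = user_info(xml  => '<user><xp>1234</xp><level>9</level></user>');
my $from_html = user_info(html =>
    '<table><tr><td>XP</td><td>1234</td></tr><tr><td>Level</td><td>9</td></tr></table>');
print "$_: xml=$from_xml->{$_} html=$from_html->{$_}\n" for qw(xp level);
```

Adding another retrieval method then means adding one more entry to the table, while code that consumes the returned hash stays unchanged.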