Re^6: New Feature For Simple Search

by jdporter (Paladin)
on Jan 05, 2009 at 21:55 UTC ( [id://734291]=note )


in reply to Re^5: New Feature For Simple Search
in thread New Feature For Simple Search

Thanks for the update. I agree that it is discouraging to encounter "institutional resistance" when you're trying to do something good for the community. I just had a chat with one of the gods (Petruchio) and we came up with an approach which may be sufficient for you (or whomever) and acceptable to the powers, wrt privacy and load. Namely:

Periodically (perhaps once per day) a database dump will be cut which includes only publicly visible data, in SQL. This file will be available at a static address, and projects such as yours can download it and load it into their databases. This will not only be much easier than trying to spider the database through the XML-based 'API's, but also put much less strain on our server resources.

I for one have wanted such a thing for a long time, myself. Petruchio suggests that the demand for it precede the supply: "Get a project going, and people [meaning gods] will do what's needed to make it happen." (quoted without permission ;-)

The PerlMonks powers-that-be wouldn't grant access to node rep, which would have been highly useful in improving relevancy and user satisfaction

Yeah, that just isn't going to happen, ever. But it really should not be considered a project killer.

Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.

Re^7: New Feature For Simple Search (Kino)
by tye (Sage) on Jan 06, 2009 at 05:33 UTC
    [jdporter]: Periodically (perhaps once per day) a database dump will be cut which includes only publicly visible data, in SQL.

    Oh, please, no. I've disabled most jobs of that style (written by/for jcwren) because they were causing lots of problems (and I haven't heard any complaints, but I expect the recurring and/or continuing outage(s) at perlmonk.org may be playing a role in that). I have it on my to-do list to fix the node cache so that such things don't need to be memory pigs (because we at least still want one for vote allocation). But even after that gets fixed, this makes for a rather poor experience in so many ways.

    After I read remarks like

    [creamygoodness]: so I have less fungible time in general,

    I got the impression that the whole project was scrapped. But if there is going to be an attempt to rekindle it, then I'll make the effort to correct several misrepresentations.

    one of the gods chatterboxed me a while back that he didn't want the maintenance burden and preferred Google

    It seems likely that I'm the one of the gods, though I can't claim that for certain. I do recall CB conversation(s) with creamygoodness about this project. Most of his representations that appear to have come from such certainly seem rather far from the mark from my perspective. But I'm not all that surprised based on my recollection of the conversation(s).

    I'm disinclined to spend it fighting institutional resistance at PerlMonks

    I like this choice of words, because it segues nicely into my main impression of the conversation(s). I recall quickly getting the impression that creamygoodness was inclined to "pick a fight" when not given the exact desired answer. That got rather tiresome so I wasn't inclined to spend the time, effort, patience to try to counter things and try to ensure effective communication (when wrong conclusions were jumped to and then groused about more than once, for example). So this is why I'm not surprised to see creamygoodness misrepresent (unintentionally, surely) so much of what I think about such a project, some of which I recall at least hinting at in the CB.

    preferred Google

    Well, perhaps in some ways. I recall more clearly saying that I wanted to keep Super Search around because I value some of the things it does (that neither Google nor even a custom KinoSearch will do). I have repeatedly acknowledged the not-subtle shortcomings of Super Search and would be happy to make it not the default solution presented to newcomers (though we have just done that, which I find a mixed blessing that I hope we will quickly improve upon, but that isn't what this node is about). I've also made several changes to make Google work much better with PerlMonks and I had and have plans for even more improvements. And I see no reason to not continue to make Google (and probably other search engines) work well. But I certainly believe that KinoSearch can't replace Google, if for no other reason than many searches that find PerlMonks don't start at PerlMonks.

    But I would definitely like something that behaved more like Google (very fast, word-based, full-text) that also had some features of Super Search (allowed filtering based on author, section, date, etc.) and that PerlMonks had more control over (to ensure the index was 100% complete and up-to-date and used PerlMonks-friendly token rules, etc.). And KinoSearch seems the perfect tool for making such (especially since PerlMonks is a "Perl site" in several senses).

    he didn't want the maintenance burden

    Actually, my position is nearly the opposite. But I'll get to that. I certainly refused to spend a lot of time building a sanitized DB dump (I've seen several of those built over the years for different projects and most instances ended up with mistakes made, sometimes caused problems, and I think most of the projects never materialized and those that did have mostly already disappeared). So I pointed out that the project can get the ball rolling with data spidered from the site. If the project actually gets off the ground and produces something useful, then it becomes much easier to find people motivated to contribute PerlMonks resources.

    I don't think I got past that to attempt to convey that producing a proof of concept and working out a lot of the details and demonstrating momentum could then be followed by switching to hosting the project at PerlMonks. That is where I would really prefer things to end up, for a lot of reasons.

    wouldn't grant access to node rep, which would have been highly useful in improving relevancy and user satisfaction

    Well, I think "highly useful" greatly overstates things. But the right way to get access to node rep is to host the index at PerlMonks. Then the index can have access to node rep without having to expose node rep externally (which I'm not convinced would never happen, but I don't see a need for it when the better solution is better in so many ways).

    Perhaps many of y'all already don't remember "thepen", a Google-friendly mirror that blakem set up and ran for a while. And perhaps many of y'all have forgotten many, many "CB log" sites that have come and gone over the years. So I really don't want to see a full-blown KinoSearch mirror of PerlMonks hang around for a few years before it just disappears like so many similar projects. And for as much flak as PerlMonks deservedly gets for the issues it has, it is still "available" more of the time than many of the various types of mirrors that haven't disappeared yet.

    So, yes, build a separate KinoSearch that works using its own database and web pages. But plan for that to get installed onto the PerlMonks servers so that new nodes and node updates can be efficiently and quickly funneled off to it to be re-indexed. Spider some nodes to have the data for figuring out the devils that are in the details about how such a thing should work. All of the public data is already exposed in XML tickers so it just makes sense to use those as a starting point where you don't have to wait for some other data delivery to be arranged. But plan for hooks that can be replaced so that PerlMonks can be the one to trigger updates to the index.
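    The "hooks that can be replaced" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: it is in Python rather than Perl, it uses a toy inverted index instead of KinoSearch, and every name in it (TinyIndex, refresh, spidered_nodes, pushed_nodes) is hypothetical rather than part of any PerlMonks or KinoSearch API. The point is that the indexer only ever sees a callable yielding (node_id, text) pairs, so a spider-driven source can later be swapped for a site-driven one without touching the index code.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# A "source" is any callable that yields (node_id, text) pairs.
NodeSource = Callable[[], Iterable[Tuple[int, str]]]


class TinyIndex:
    """Toy inverted index standing in for a real search library."""

    def __init__(self) -> None:
        self.docs: Dict[int, str] = {}
        self.postings: Dict[str, Set[int]] = {}

    def reindex(self, node_id: int, text: str) -> None:
        # Drop stale postings first so an updated node leaves no old words behind.
        old = self.docs.get(node_id)
        if old is not None:
            for word in set(old.lower().split()):
                self.postings[word].discard(node_id)
        self.docs[node_id] = text
        for word in set(text.lower().split()):
            self.postings.setdefault(word, set()).add(node_id)

    def search(self, word: str) -> List[int]:
        return sorted(self.postings.get(word.lower(), set()))


def refresh(index: TinyIndex, source: NodeSource) -> int:
    """Pull whatever the current hook provides and re-index it."""
    count = 0
    for node_id, text in source():
        index.reindex(node_id, text)
        count += 1
    return count


# Hook 1 (hypothetical): nodes gathered by spidering the public XML tickers.
def spidered_nodes() -> Iterable[Tuple[int, str]]:
    return [(1, "New feature for simple search"),
            (2, "KinoSearch proof of concept")]


# Hook 2 (hypothetical): new and updated nodes pushed by the site itself.
def pushed_nodes() -> Iterable[Tuple[int, str]]:
    return [(2, "KinoSearch hosted index"),
            (3, "Streaming chatterbox")]


index = TinyIndex()
refresh(index, spidered_nodes)  # start from the outside
refresh(index, pushed_nodes)    # later, swap in the site-driven hook
```

    Because refresh() takes the source as an argument, moving from spidering to site-triggered updates is a one-line change at the call site.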

    Integrating tightly with PerlMonks sucks. We have dozens of members of pmdev but I think we have 4 people who I would consider even marginally active (jdporter and a few gods), and I understand lots of reasons why things tend that way. So one should certainly avoid tightly integrating with PerlMonks when practical. And most of the time that leads me to suggest that projects be implemented outside of PerlMonks (and that this project is best to start from the outside). But I would really like a streaming chatterbox server and a decent index to be hosted on PerlMonks hardware (donated generously by pair.com, let us not forget), though still only loosely integrated with the site itself (and a new CB 'log' might just get built inside PerlMonks, one day).

    Finally,

    We present the noobs with a prominent but sucky site search and then berate them for their ignorance when they ask a question that's already been answered. Why? To make ourselves feel smarter?

    To paraphrase Lt Col Kilgore: Newbies don't search. Yes, "to make ourselves feel smarter" is exactly the reason, of course, how did you know? Newbies armed with Google most often don't find their answers either. KinoSearch will not lead to the vast majority of newbies searching before posting, much less finding the answer and not re-posting. But I mainly quoted this paragraph because it reminded me of the attitude that turned me off from continuing to contribute suggestions to the project in the first place.

    - tye        

      tye,

      Yes, you were the god I was referring to.

      I recall quickly getting the impression that creamygoodness was inclined to "pick a fight" when not given the exact desired answer.

      Naturally, I think this is an unfair characterization and I dispute it. I don't think the cumulative record of my posts at PerlMonks bears it out, and I don't think it accurately portrays my chatterbox history, either.

      Nevertheless, I'm dismayed that you took away that impression from our many conversations, because you've been one of the most helpful chatterbox denizens, particularly on XS questions -- even when we've disagreed. I recall that exchanges have often flowed like this:

      creamygoodness: How do I do X?
      tye: Like this...
      tye: But don't do X, do Y instead.
      creamygoodness: Thanks! (:
      creamygoodness: FYI, I'm still going to do X, and here's why...

      It's true that I haven't always applied the exact solutions that you've advocated, and that I have often bailed out of the chatterbox before successfully persuading you that X was superior to Y for the application at hand. (Past discussions of Module::Build vs. MakeMaker and handling of labeled args via XS spring to mind.) But you've often provided me with the necessary background to make an informed decision, and I'm grateful for that.

      I also recall a chatterbox exchange (but can't provide a transcript, natch) where you advanced the arguments in favor of Google site search over a custom coded search. I've had that conversation many times with many people, because deploying Google site search is a popular way to provide a low-maintenance search option. You made the case well, and I took away the impression that you were committed to that approach.

      That got rather tiresome so I wasn't inclined to spend the time, effort, patience to try to counter things and try to ensure effective communication

      Well, I suppose that explains how some miscommunications and mistaken impressions snowballed.

      I just thought you were a lovable, laconic curmudgeon, dispensing useful advice to the young'ns then alternately grumbling or stewing when they didn't follow it to the letter... Hmm, I still think that. ;)

      I don't think I got past that to attempt to convey that producing a proof of concept and working out a lot of the details and demonstrating momentum could then be followed by switching to hosting the project at PerlMonks. That is where I would really prefer things to end up, for a lot of reasons.

      I appreciate the clarification. Indeed, this is exactly the opposite of what I thought.

      The spidering interface seemed such a clumsy way to get a one-time database dump, it inspired all manner of idle speculation. For instance: Was the back end of PerlMonks so deeply screwed up that supporting custom-coded search was really going to be a nightmare and thus clearly justifying your [apparent] preference for Google site search?

      It looked to me like you had made a perfectly reasonable engineering decision to outsource search to Google, and that it was therefore going to be difficult to get a custom coded search accepted and hosted at PerlMonks.

      the right way to get access to node rep is to host the index at PerlMonks.

      Agreed.

      So, yes, build a separate KinoSearch that works using its own database and web pages. But plan for that to get installed onto the PerlMonks servers so that new nodes and node updates can be efficiently and quickly funneled off to it to be re-indexed.

      Good plan. As mentioned previously, I don't have a lot of time right now to work on anything other than the KS core. Like the gods, I have pending patches to evaluate and apply... But it's good to know that if the project rekindles and demonstrates its usefulness, it has a decent shot at integration.

      --
      Marvin Humphrey

        Thanks for the reply. I'm glad things finally got more straightened out. Sorry, I don't remember why I preferred Google at that point; surely some changed assumptions on my part are at least partially to blame. :)

        Here's to wishing the project well. The improvement would be widely felt.

        - tye        

      ...the project can get the ball rolling with data spidered from the site. If the project actually gets off the ground and produces something useful, then it becomes much easier to find people motivated to contribute PerlMonks resources.

      Thank you. That is essentially identical to what Petruchio told me yesterday.

      Now I'm wondering if anyone out there has already got a node cache going, off of which other projects could leech...

      Between the mind which plans and the hands which build, there must be a mediator... and this mediator must be the heart.
