PerlMonks  

Feed / Tag Cloud of Recent Searches from Super Search

by jrtayloriv (Pilgrim)
on Feb 15, 2008 at 00:07 UTC ( [id://668067]=monkdiscuss )

I'm sorry if this idea has been brought up in the past, but I searched around and didn't find anything similar.

In short, I'd like to have access to a feed of the most recent queries made by other people in Super Search. Additionally, I think it would be nice to have a link to a set of tag clouds that contain the most popular search terms for the day, week, month, and so on. The tag clouds, however, could easily be implemented externally by people who wanted them, from the data provided by the feed, so they are of secondary importance.

Why would I want this? Well, I am a novice programmer who would fit into the self-taught hack category. And because I have nobody telling me what to learn -- i.e. I don't have a planned progression of study topics as I would in an institutional learning environment -- I often reach points where I don't know which topics to explore next. Very often, I find my next "subject of the month" when I'm browsing through a discussion taking place in someone else's posts, and happen upon a term that I've never heard before. I go to look it up, and a whole new world is opened up to me. I'd like to be able to gain inspiration for new subjects to research from other people's search queries as well.

For instance, here are a few queries that I've made recently:
  • "programming paradigms"
  • "logic programming"
  • "dispatch tables"
  • "model view controller"
  • "design patterns"
Right about now, you wizards are probably all thinking "Yeah, so what? I already know all about that stuff!" But imagine if you had never heard of "programming paradigms" -- i.e. you had no idea that your programs all fit into the imperative programming paradigm, and that there were other ways of solving problems. Would you have ever thought to search for something like this? How would you have known which terms to use in your query? Or imagine that you'd never heard of "model view controller" or "design patterns". Think of all the interesting concepts that would come into your field of vision if you ran a search for these topics, when you became curious after seeing that someone else was searching for them. Many of these topics won't get brought up in new posts often, since there is already enough information contained in the archives that most questions don't need to be asked anymore. For most of us, our quest for knowledge ends with a search query, not a post. It would be nice to get a glimpse of the things that are of interest to people here, which are not being discussed in the public forum (because they already have been before).

Plus it's always nice to know what's going on in the hive mind -- i.e. what's hot & what's not. And it will provide endless hours of entertainment for me when I'm bored, and the chatterbox is silent. :)

As far as implementation, I've only been able to come up with a few ideas, but the site maintainers are going to be in a much better position to figure these sorts of things out than I am.

First off, it is obvious that information about who actually ran each query should not be available at all. This would be a violation of privacy, and would not serve a valuable purpose anyhow.

Second, I think that the tag clouds and such should go on their own separate page, so that they don't add any unnecessary noise to the Super Search page. At most, perhaps a link at the very bottom of Super Search.

As far as how each query would be represented in the feed, it seems pretty straightforward -- just have the value submitted for each field of the Super Search form contained in a separate tag, along with a timestamp for the entire query.
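A minimal sketch of that representation, assuming Python's standard `xml.etree.ElementTree`. The field names and the `query_to_feed_entry` helper are hypothetical, and the real Super Search form fields may differ. Only the submitted form values and a timestamp are written out, with nothing that identifies who ran the search.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def query_to_feed_entry(fields, when):
    """Build one <query> element from submitted form fields.

    Hypothetical helper: records the form values and a timestamp only,
    and nothing about the person who ran the search.
    """
    entry = ET.Element("query")
    ET.SubElement(entry, "timestamp").text = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    for name, value in fields.items():
        # One child tag per form field, as described above.
        ET.SubElement(entry, name).text = value
    return entry


# Made-up form values, for illustration only.
fields = {
    "match_text_containing": "dispatch tables",
    "sort": "new_first",
}
when = datetime(2008, 2, 14, 8, 30, tzinfo=timezone.utc)
entry_xml = ET.tostring(query_to_feed_entry(fields, when), encoding="unicode")
print(entry_xml)
```

Using an XML library rather than string concatenation means characters such as `<` or `&` in a search term are escaped automatically.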

Well, that's the idea I was tossing around today. Is there any reason that this is not convenient/desirable/possible to implement? If it is something that others would like to see, how would you make it better?

---jrtayloriv

Replies are listed 'Best First'.
Re: Feed / Tag Cloud of Recent Searches from Super Search
by jrtayloriv (Pilgrim) on Feb 15, 2008 at 06:58 UTC
    I've been giving this tag cloud thing more thought, and I'm realizing that it's not only quite difficult to implement, but it's not really necessary anyway. It wasn't a very well thought out idea on my part. I should have thought it out more beforehand -- it's not very realistic.

    The only reason I found it desirable in the first place was to show "the most popular search terms for the day, week, month". Why do we need "tag clouds" to do this? I'm pretty sure now that we don't. Why not just have a sorted list?

    But even then you've got problems. How do you determine which ones are the most popular? "RDBMS", "databases", "mysql", "postgresql" are all the same topic, and it would make sense to group them together when determining that databases were a popular search term today. But I don't have any idea how to teach the newfangled "popularity machine" how to understand this. As you said, this is starting to cross over into the land of NLP, and I can't come up with a solution myself -- I just don't have the experience or know-how to do such things.
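One crude way around the grouping problem, sketched here in Python, is a hand-maintained alias table that maps specific terms onto a broader topic before counting. This sidesteps the NLP question rather than answering it: the `TOPIC_ALIASES` table, the `popular_topics` helper, and the sample queries are all made up for illustration, and someone would have to curate the table by hand.

```python
from collections import Counter

# Hand-maintained alias table (an assumption, not an existing feature):
# maps specific search terms onto a broader topic.
TOPIC_ALIASES = {
    "rdbms": "databases",
    "mysql": "databases",
    "postgresql": "databases",
}


def popular_topics(queries, top_n=3):
    """Return the top_n (topic, count) pairs, most searched first."""
    counts = Counter()
    for query in queries:
        term = query.strip().lower()
        # Terms without an alias are counted under their own name.
        counts[TOPIC_ALIASES.get(term, term)] += 1
    return counts.most_common(top_n)


# Made-up sample of recent searches.
sample = [
    "RDBMS", "mysql", "design patterns", "postgresql",
    "databases", "design patterns", "dispatch tables",
]
ranking = popular_topics(sample)
print(ranking)
```

The result is exactly the sorted list mentioned above. Anything not in the alias table simply stays ungrouped, so the list degrades gracefully when the table is incomplete.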

    But I still think that the search history feed would be a useful feature, and it is much simpler to design and implement, and won't require a full time staff of linguists and AI experts to maintain ... and I really would enjoy watching the recent searches in a nodelet.

    --jrtayloriv

    UPDATE: I'm not saying the "tag cloud"/"popular searches" feature is not possible, by the way, just that I don't have any idea how to accomplish it, and that it would probably be better to get the search history feed up and running first, and then worry about more complex functionality built on top of it. If someone else sees a good way to do it, please share it.
      I'd say the fact that you have no idea how to accomplish it ( neither do I ) would mean that you should give it a shot. I think though that instead of a tag cloud, you might want to check out Google Zeitgeist which might be a viable option. I think that with Perl in particular, seeing what people are looking for would help since with a freer form language like Perl, one needs to look at the average choices as well as the total number of choices...
        I might give it a shot, but first I'd need the data to make the list with. It would be better to start out simple -- i.e. get the search feed implemented, and get the kinks worked out -- and then start trying to build fancy things on top of it. It's going to be hard to take a shot at creating a list of popular searches if I can't see what people are searching for....

        What do you mean by your last statement? I didn't understand what you meant by average choices vs. total number of choices -- could you clarify that?
Re: Feed / Tag Cloud of Recent Searches from Super Search
by ww (Archbishop) on Feb 15, 2008 at 00:57 UTC
    ++ for an interesting idea.

    However, your penultimate question, "Is there any reason that this is not convenient/desirable/possible to implement?" goes to the nub regarding implementation of this (and many other) good ideas.

    ...convenient?
    For whom, and for what values of "convenient"? I suspect implementing this within PerlMonks' existing (and complex) framework would be arduous (as one possible antonym for "convenient"). Moreover, developing a schema for tagging queries, without requiring the user to provide the tags, comes close to requiring natural language parsing, an endeavor which has already proven more than merely arduous. And if the requirement is that the users of SS provide the tags, you have

    1. a chicken-and-egg problem (Many who SuperSearch may be unable to accurately tag with words or phrases of which they have no knowledge) and...
    2. a massive re-education/culture-change task ahead, as tagging is not a part of this culture.

    ...possible?
    The answer(s) will have to come from the wisest of the Monastery's wise.

    That said, this rambling response is in no sense meant to cast ice water on your idea. My breadth of ignorance (and utter lack of any CS background) make it appear to me to have merit. Thanks for an intriguing proposal.

      Forgive me if I'm misunderstanding you -- but I don't see what you mean by saying that developing a schema for tagging queries would be difficult. Would something like this not work:
      <query>
        <date>
          <month>02</month>
          <day>14</day>
          <year>2008</year>
        </date>
        <time>0830</time>
        <match_text_containing>the search query</match_text_containing>
        <match_text_separator></match_text_separator>
        <match_titles_containing></match_titles_containing>
        <match_title_separator></match_title_separator>
        <match_exclude_authors>exclude</match_exclude_authors>
        <authors>
          <author>Anonymous Monk</author>
          <author>jrtayloriv</author>
        </authors>
        <sort>new_first</sort>
        <result_date_start>
          <month></month>
          <day></day>
          <year></year>
        </result_date_start>
        <search_sections>all_but_selected</search_sections>
        <sections_selected>
          <section>sopw</section>
          <section>meditations</section>
        </sections_selected>
        <skip_text_containing>stuff they didn't want</skip_text_containing>
        <skip_text_separator></skip_text_separator>
        <skip_titles_containing></skip_titles_containing>
        <skip_title_separator></skip_title_separator>
        <include_root_nodes>yes</include_root_nodes>
        <include_replies>selected</include_replies>
        <match_exclude_parent_author>exclude</match_exclude_parent_author>
        <root_node_authors>
          <author>Anonymous Monk</author>
          <author>jrtayloriv</author>
        </root_node_authors>
      </query>


      I've been overly verbose for illustration, and there is probably a better way to set this up -- but something as simple as this would seem to work. I don't see why anything would have to be done by the user -- i.e. I don't see why they would be required to tag anything themselves. This could happen transparently whenever a search request is answered by the web server, using the values the user input in the search form, and seems to be trivial to implement. It would seem that the resources expended to save the submitted queries in this format would be trivial compared to the server resources used to perform the search itself.
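From the consumer's side, pulling the searched-for text back out of entries shaped like this could be as small as the following Python sketch. The `<queries>` wrapper element, the `search_texts` helper, and the two sample entries are assumptions for illustration; no such feed exists yet.

```python
import xml.etree.ElementTree as ET

# A trimmed-down feed with two entries in the shape proposed above.
# The <queries> wrapper element is an assumption.
FEED = """
<queries>
  <query>
    <time>0830</time>
    <match_text_containing>logic programming</match_text_containing>
    <sort>new_first</sort>
  </query>
  <query>
    <time>0912</time>
    <match_text_containing>model view controller</match_text_containing>
    <sort>new_first</sort>
  </query>
</queries>
"""


def search_texts(feed_xml):
    """Extract the searched-for text from each <query> entry."""
    root = ET.fromstring(feed_xml.strip())
    return [
        query.findtext("match_text_containing", default="")
        for query in root.iter("query")
    ]


texts = search_texts(FEED)
print(texts)
```

A list like `texts` is all an external "popular searches" page would need as input, which is why the feed can come first and the fancier features later.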

      Am I missing something? Sorry if I'm being stupid -- I'm not very knowledgeable about this sort of thing, and it's likely that there is something simple I didn't understand...

      --jrtayloriv

      UPDATE: Elaborated a bit on how the search queries might be formatted in the feed ...

        Ummmm....errrrrr.....
        No I definitely do NOT think you're being stupid. My selection of "schema" was poor. Maybe "scheme" -- since that has fewer specialized overtones -- would have been better...and there's probably something better yet.

        And your comments above must have unlocked a synapse or two. But if you are suggesting that the "tag cloud" be generated from the content of the nodes found by SuperSearch, the issue I raised seems to me to be unresolved: What tools would we use to determine which parts of the returned_content should be applied as tags?

        <Grin>... and if we continue this discussion, why -- you may have written the necessary code by the time I attain even minimal understanding of how "convenient" such development would be. In that (happy) event, I expect your elevation to Priest or Abbot or even Cardinal to occur precipitously... concurrent with an invitation to become a Dev(il).

Re: Feed / Tag Cloud of Recent Searches from Super Search
by Withigo (Friar) on Feb 20, 2008 at 06:35 UTC
    I think a tag cloud would be a great idea!
    I can't count the number of times I've sifted through all the pages returned by super search, looking for very specific topics. I add my favorite threads to my scratch pad, and then try to organize them later. I know many monks do the same, so there's already a personal effort at tagging going on.
    The store of knowledge here on PM is vast, and it should be a goal of this site to make it easy for people new to Perl to discover topics which they didn't even think of searching for.

Node Type: monkdiscuss [id://668067]
Approved by ww
Front-paged by jdporter