Re: Newest Super Search
by vladb (Vicar) on Aug 03, 2002 at 00:36 UTC
For the uninitiated, could you tell me which Perl modules you are using to do the search and indexing of data? Whenever I had to tackle similar search issues, I simply resorted to the DBIx::FullTextSearch module. It's a pretty versatile module, and my thinking is it could even be used to implement PM search. This is only a suggestion, however. And I'm also aware that for my 2 cents to be worth a dime, I'll have to show some sample code of how this module could be 'incorporated' into the PM engine. ;-)
_____________________
# Under Construction
The only modules are DBI, CGI, and Everything. Most of the actual code
is included below.
This is a pretty specialized situation. For example, we can't allow any
single query to run very long since MySQL was designed assuming some
aspects of the threading model (light-weight processes) that aren't
present on FreeBSD. So a single long-running query can nearly lock up
access to the database for everyone.
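The post doesn't show how Super Search keeps each query short, but the "starting at node ID" field and the "Next" button that come up later in this thread suggest scanning a bounded window of node IDs per request. Here is a minimal Python sketch of that idea, assuming a hypothetical `fetch_range(lo, hi)` callable that runs one short query over node IDs in `[lo, hi)`. The names and the batch size are illustrative, not the real PerlMonks code (which is Perl).

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: fetch_range(lo, hi) runs one short query and
# returns the matching node IDs with lo <= id < hi.
FetchRange = Callable[[int, int], List[int]]


def search_batch(
    fetch_range: FetchRange, start_id: int, max_id: int, batch_size: int
) -> Tuple[List[int], Optional[int]]:
    """Search one bounded window of node IDs, oldest first.

    Returns the hits in this window and the node ID where the next
    window should start, or None once max_id has been covered.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    hi = min(start_id + batch_size, max_id + 1)  # half-open [start_id, hi)
    hits = fetch_range(start_id, hi)
    next_start = hi if hi <= max_id else None
    return hits, next_start


if __name__ == "__main__":
    matching = {3, 17, 42, 99}  # pretend these nodes contain the search text

    def fake_fetch(lo: int, hi: int) -> List[int]:
        return [n for n in sorted(matching) if lo <= n < hi]

    start: Optional[int] = 1
    while start is not None:  # each pass corresponds to one "Next" click
        hits, start = search_batch(fake_fetch, start, 100, 25)
        print(hits)
```

Because every window is a separate short query, no single request can hold the database for long; the trade-off is that a window with no matches simply comes back empty.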
Re: Newest Super Search
by hossman (Prior) on Aug 03, 2002 at 01:56 UTC
Super Search seems almost too super now ... I'm definitely overwhelmed by all the options.
The newest/oldest thing is definitely a little weird,
especially since it seems like it always wants to start
with node 1 ... even if you are starting with the newest
(and even if you pick a high number, it still seems
to count up).
As for Discussions appearing at the top now, I've been
wondering why people can't customize which order the
sections appear in for themselves? (similar to customizing
what order Nodelets appear in)
Well, I tried to design it so that much of the time you can just fill in what you are searching for in that first box and then hit the button. But there needs to be some major work on how the results are presented after that.
As I mentioned in the announcement, you can't search "newest first" yet. Those radio buttons are supposed to be "disabled" (they show greyed-out in my browser). I'll probably just remove them until (if) I get
that part working. (And the "starting at node ID" field will probably disappear as well.)
Yes, customizing order on Newest Nodes has been discussed and will probably happen at some point. (:
- tye (but my friends call me "Tye")
tye: Any idea if/when the 'Newest First' can/will be implemented? I find that I almost always want newest first, but I can't have it! Will it put too much load on the DB to stick the ORDER BY ??? DESC in there? If it will never be implemented, then at least take away the tantalizing 'greyed-out' Radio Buttons, so I don't have to constantly be reminded that my searches always come back in the wrong order...
Thanks!
--
3dan
Re: Newest Super Search
by dimmesdale (Friar) on Aug 03, 2002 at 19:57 UTC
Very good.
I have a suggestion, though. Would it be useful to have some special tag (say <keywords> or <index> or some such) for defining a set of keywords to be used in indexing the node? Or maybe you could restrict a search to just those keywords. I'm thinking of something like (La)TeX, where you can mark words/references for the index and it auto-creates one for you.
What I'm getting at is something like the meta keywords/description allowed on web pages that search engines such as google might use.
Or maybe there could be a set of predefined keywords (e.g., a node might fall under Web -> CGI -> cookies), and then a separate page with links corresponding to those keywords (like a directory). And maybe, if some people are interested, editors (or some other group) could go around indexing nodes where the author didn't (past nodes, for example) so it's easier to search. (This would be something like the Google/Yahoo Directory pages.) The keywords would be predefined, but might be open to additions?
Well, these are all just ideas I'm throwing out... maybe none of them are good? It just seems that if someone specifies what a node is about in a concrete way, it will return better results than just hoping a certain word appears in it.
Re: Newest Super Search
by danichka (Hermit) on Aug 03, 2002 at 06:29 UTC
How often are the User nodes cached? Just wondering because I search them when I get bored (which is pretty often).
I am glad to see a checkbox for PMD too. I don't remember there being one on the old Super Search.
use Your::Head;
|
| [reply] |
|
Ok, I figured out what my problem was. Super Search will return results for text that is inside HTML comments. Since I couldn't see that text on the page, I thought it was searching a cached version of the User pages. If I had actually taken the time to view the source, I would have realized this a while ago.
use Your::Head;
Re: Newest Super Search
by tjh (Curate) on Aug 03, 2002 at 16:32 UTC
|
This is excellent. Thank you! As an aside, I just searched for "IDE" (without quotes), " IDE " (sans quotes), and " IDE " (with quotes). Note the attempts to include leading/trailing spaces. In all cases the results included many (most) pages with no standalone IDE in them, only IDE as part of other words. I don't know anything about Everything :) so don't know what it looks like in the query; plus, as usual, I can get the data other ways (Google, different terms, etc.), but thought I'd point it out. Thanks again! It is very flexible and fast, much appreciated.
To quote from Super Search:
Match text containing [______________________________]
(separate strings with [__] -- default is spaces)
and then, quoting from Super Search results after each of your above search attempts:
where any text contains all of "IDE"
where any text contains all of "IDE"
where any text contains all of """, "IDE", """
So separating search terms on spaces (the default) gives us these search terms for your three attempts: ('IDE'), ('', 'IDE', ''), and ('"', 'IDE', '"'). But, of course, we ignore empty search terms. That explains the results above. The code that does this is even posted in public (see note 1 below). (:
So if you want to search on spaces, you have to make it so spaces don't separate search terms. In the second field just enter any character (or string) that doesn't appear in your single search string.
There are no "special characters" when searching on strings. Backslashes, spaces, quotes, stars, parens, brackets, ampersands, etc. all just match what you type (case is ignored). You pick the separator you want. The default is space as people are used to listing several space-separated words. Searching for punctuation is important on a Perl site so the search doesn't tie up any characters as being "special" other than the delimiter string you choose.
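To make the splitting rule concrete, here is a small Python sketch (not the actual Super Search code, which is Perl) that splits on a literal separator, drops empty terms, and requires every remaining term to appear as a case-insensitive substring. The function names are hypothetical.

```python
from typing import List


def split_terms(query: str, separator: str = " ") -> List[str]:
    """Split on the literal separator string and drop empty terms.

    Leading, trailing, or doubled separators produce empty pieces,
    which are ignored rather than treated as "match anything".
    """
    if separator == "":
        raise ValueError("separator must not be empty")  # str.split rejects ""
    return [term for term in query.split(separator) if term != ""]


def matches_all(text: str, terms: List[str]) -> bool:
    """True if every term occurs in text as a literal substring.

    Case is ignored; quotes, stars, backslashes and the like have no
    special meaning, and there is no notion of word boundaries.
    """
    haystack = text.casefold()
    return all(term.casefold() in haystack for term in terms)


if __name__ == "__main__":
    print(split_terms("IDE"))         # ['IDE']
    print(split_terms(" IDE "))       # ['IDE']  (empty pieces dropped)
    print(split_terms('" IDE "'))     # ['"', 'IDE', '"']
    print(split_terms(" IDE ", "|"))  # [' IDE ']  (spaces kept in the term)
```

Because matching is plain substring search, "IDE" also hits "IDEs" or "guide", which is what tjh saw; the only way to demand the surrounding spaces is to keep them inside one term by choosing a different separator.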
As I've already said, some help pages need to be added.
I also hope to add support for some limited regular expressions. MySQL supports regular expressions and I had initially just assumed that they were rather minimal regular expressions that would be quick to run (this being a database server and all), but they are full-blown regular expressions from Henry Spencer. This means that entering a (perhaps intentionally) "bad" regular expression could burn a lot of SQL daemon CPU time (the strings being matched against are node contents which can be quite long leaving lots of room for pathological backtracking to get way out of hand).
So to support regular expressions I'm going to have to pick a subset of regular expression features (or perhaps some other wildcarding scheme such as what glob uses, ...) that I can translate to MySQL regular expressions so that I can be sure that no one can enter a "pattern" that locks up the site.
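One way to do the kind of translation described here is to allow only glob-style `*` and `?`, escape every other POSIX ERE metacharacter, and cap the number of wildcards. The Python sketch below is an assumption, not the site's implementation; the cap of 3 is an arbitrary illustrative value that limits, but does not by itself prove a bound on, the regex engine's work. The result would be handed to MySQL's `REGEXP` as a bound (DBI placeholder) value, so the backslashes need no extra SQL-level quoting.

```python
# Characters that are special in POSIX extended regular expressions.
ERE_SPECIAL = set(".[]()*+?{}|^$\\")


def glob_to_ere(pattern: str, max_wildcards: int = 3) -> str:
    """Translate a tiny glob subset into a POSIX ERE for MySQL REGEXP.

    Only '*' (any run of characters) and '?' (any one character) are
    wildcards; every other ERE metacharacter is backslash-escaped so it
    matches literally. The wildcard cap is an illustrative guard that
    bounds how many '.*' runs the regex engine has to juggle.
    """
    wildcards = sum(1 for ch in pattern if ch in "*?")
    if wildcards > max_wildcards:
        raise ValueError(f"at most {max_wildcards} wildcards allowed")
    out = []
    for ch in pattern:
        if ch == "*":
            out.append(".*")
        elif ch == "?":
            out.append(".")
        elif ch in ERE_SPECIAL:
            out.append("\\" + ch)  # literal match for a metacharacter
        else:
            out.append(ch)
    return "".join(out)
```

For example, `glob_to_ere("foo*bar")` yields `foo.*bar`, while `glob_to_ere("(a+b)")` yields `\(a\+b\)`, so Perl-ish punctuation still matches literally.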
I don't know anything about Everything :) so don't know what it looks like in the query
But that is easy enough to find out. Simply "view source" on the results page and search for "SELECT"; there you'll see (one of) the queries spelled out in full SQL. :)
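For readers curious what such a query might look like, here is a hedged Python sketch of building one from already-split terms: one `LIKE` per term, ANDed together, with `%` and `_` escaped so that nothing but the separator is special. It uses `sqlite3` as a stand-in for MySQL, and the `nodes(node_id, doctext)` table is hypothetical. `!` is used as the `LIKE` escape character to sidestep dialect differences in backslash quoting. SQLite's `LIKE` ignores ASCII case by default; in MySQL that depends on the column's collation. The optional `newest_first` flag just flips `ORDER BY` and says nothing about whether that would be cheap on the real database.

```python
import sqlite3
from typing import List, Tuple

ESC = "!"  # LIKE escape character; avoids dialect quirks with backslash


def escape_like(term: str) -> str:
    """Make %, _ and the escape character itself match literally."""
    return (
        term.replace(ESC, ESC + ESC)  # must run first
        .replace("%", ESC + "%")
        .replace("_", ESC + "_")
    )


def build_query(terms: List[str], newest_first: bool = False) -> Tuple[str, List[str]]:
    """Return (sql, params): every term must appear somewhere in doctext.

    Values travel as bound parameters, never spliced into the SQL text.
    The nodes(node_id, doctext) schema is hypothetical.
    """
    if not terms:
        raise ValueError("need at least one search term")
    where = " AND ".join(f"doctext LIKE ? ESCAPE '{ESC}'" for _ in terms)
    order = "DESC" if newest_first else "ASC"
    sql = f"SELECT node_id FROM nodes WHERE {where} ORDER BY node_id {order}"
    params = ["%" + escape_like(t) + "%" for t in terms]
    return sql, params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real MySQL server
    conn.execute("CREATE TABLE nodes (node_id INTEGER PRIMARY KEY, doctext TEXT)")
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [(1, "100% pure Perl"), (2, "100 percent pure"), (3, "use my_var here"),
         (4, "use myXvar here"), (5, "PURE perl 100% again")],
    )
    sql, params = build_query(["100%", "perl"], newest_first=True)
    print(sql)
    print([row[0] for row in conn.execute(sql, params)])  # [5, 1]
```

Without the escaping, a search for `my_var` would also match `myXvar`, because `_` is a single-character wildcard in `LIKE`.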
(Minor updates applied.)
- tye (but my friends call me "Tye")
1 Leading and trailing spaces are stripped from the separator string (but not from the search string), and an empty separator string is interpreted as a single space. This is because spaces don't really "show" on the web page.
So you can't set your separator to be, for example ", " nor " + ", but you can search on those strings so long as your separator string doesn't conflict.
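The note's rules are easy to state as code. This Python sketch (hypothetical names, not the posted Perl) strips spaces from the separator only, falls back to a single space when nothing is left, and then splits the untouched search string.

```python
from typing import List


def normalize_separator(raw: str) -> str:
    """Strip leading/trailing spaces; an empty result means one space.

    Spaces don't really "show" on the form, so " + " becomes "+" and
    a blank field falls back to the default separator.
    """
    sep = raw.strip(" ")
    return sep if sep else " "


def split_query(query: str, raw_separator: str) -> List[str]:
    """Split the (unstripped) search string on the normalized separator.

    Empty pieces are dropped, as with the default space separator.
    """
    sep = normalize_separator(raw_separator)
    return [term for term in query.split(sep) if term]
```

So a separator typed as ", " behaves like ",", leaving a leading space on each following term, while the string ", " can still be searched for as long as the separator is something else, such as "|".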
Perfect. I changed the separator to something else, searched for the <space>IDE<space> string, and got quick and accurate responses. Thanks some more.
Re: Newest Super Search
by blakem (Monsignor) on Oct 12, 2002 at 08:56 UTC
I didn't find this documented anywhere, so....
In Super Search you can specify authors by their homenode id using the id://83485 form. This is the only way I found to distinguish between io and I0.
-Blake
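The thread doesn't show how Super Search parses the author field. As a hypothetical illustration only, here is a Python sketch of telling the `id://NNN` form apart from a plain name; a numeric home-node ID is unambiguous even when two names look alike.

```python
import re
from typing import Tuple, Union

# Matches the whole field, e.g. "id://83485"; ASCII digits only.
_ID_FORM = re.compile(r"id://([0-9]+)")


def parse_author(field: str) -> Tuple[str, Union[int, str]]:
    """Classify an author field as a numeric node ID or a plain name.

    A numeric home-node ID is unambiguous, which helps with look-alike
    names such as "io" and "I0".
    """
    field = field.strip()
    m = _ID_FORM.fullmatch(field)
    if m:
        return ("id", int(m.group(1)))
    return ("name", field)
```

A lookup would then compare node IDs for the `"id"` case and names for the `"name"` case; anything that merely resembles the form, like `id://abc`, is treated as a name.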
Re: Newest Super Search
by schumi (Hermit) on Aug 15, 2002 at 09:40 UTC
I bow before thy work!
I have a humble question, though. I just searched for something, got the first results, decided they looked promising, but wanted to keep searching, so I pressed "Next". For the next two search parts (legs? attempts? What do you call these smaller batches?) I didn't get any results; I only got results on the third and last attempt. Those results looked promising as well, but my initial results were gone.
Is there an easy way to keep the results of the partial searches? If not, one can always work around it by opening the results in new browser windows/tabs before hitting "Next" again.
--cs
There are nights when the wolves are silent and only the moon howls. - George Carlin