
As part of the More HTML Escaping roll-out, the simple search (at the top of each page or via the "node" CGI parameter) was switched away from using MySQL's "full text search" feature. This meant that you could once again search for 3-letter and 2-letter words in node titles. That version avoided the "worst case" situations where the servers sorted through far too many matches, but it found no matches at all unless every word you entered matched.

I've just rolled some more improvements into the simple search. The current implementation works like this:

If an exact title match is found (after ignoring nodes that you don't have permission to read unless you have changed your user settings), then no further searching is done.

Otherwise, your search string is split on whitespace, resulting in a list of "words". We look for nodes whose titles contain the greatest number of your "words" as simple substrings. Titles that match this maximal number of words are listed, newest first. That is, if you specify 5 words and no title includes 4 or more of them, but some title contains 3 of them, then you will be shown only titles that contain 3 of your words.
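The two steps above can be sketched roughly as follows. This is a minimal Python illustration, not the site's actual code: the `Node` class and `simple_search` function are hypothetical names, a larger `node_id` is assumed to mean a newer node, matching is assumed to be case-insensitive, and permission checks and the special rule for 1-character words are left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int  # assumption: a larger id means a newer node
    title: str


def simple_search(query, nodes):
    """Return matching nodes, newest first (hypothetical sketch)."""
    wanted = query.strip().lower()

    # Step 1: an exact title match ends the search.
    exact = [n for n in nodes if n.title.lower() == wanted]
    if exact:
        return sorted(exact, key=lambda n: n.node_id, reverse=True)

    # Step 2: split on whitespace and count how many distinct words
    # appear in each title as plain substrings.
    words = list(dict.fromkeys(wanted.split()))
    counts = {
        n.node_id: sum(w in n.title.lower() for w in words) for n in nodes
    }

    # Keep only the titles that reach the maximal count.
    best = max(counts.values(), default=0)
    if best == 0:
        return []
    hits = [n for n in nodes if counts[n.node_id] == best]
    return sorted(hits, key=lambda n: n.node_id, reverse=True)


nodes = [
    Node(1, "Perl regex question"),
    Node(2, "Sorting a hash by value"),
    Node(3, "regex for parsing dates in Perl"),
    Node(4, "Python regex"),
]
# Four words are given, but no title contains more than two of them,
# so only the two-word matches are returned.
print([n.node_id for n in simple_search("perl regex golf tips", nodes)])
```

Node 4 contains one of the words ("regex") but is not listed, because other titles match two.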

If there are more than 500 such matches, then the oldest 500 are listed (newest first). In future it should change to showing the newest 500 matches but that requires a database change to work around a subtle bug in the MySQL optimizer.
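To make the difference between the current and the planned cut-off concrete, here is a small Python sketch. The function names are hypothetical, and it assumes node ids increase with time so that sorting by id is the same as sorting by age.

```python
def cap_oldest(match_ids, limit=500):
    """Current behaviour as described: keep the oldest `limit`
    matches, then list them newest first."""
    oldest_first = sorted(match_ids)  # assumption: smaller id = older
    kept = oldest_first[:limit]
    return kept[::-1]


def cap_newest(match_ids, limit=500):
    """Planned behaviour as described: keep the newest `limit`
    matches, listed newest first."""
    newest_first = sorted(match_ids, reverse=True)
    return newest_first[:limit]


# With 7 matches and a limit of 3, the two policies show different nodes.
ids = [5, 1, 7, 3, 2, 6, 4]
print(cap_oldest(ids, limit=3))
print(cap_newest(ids, limit=3))
```

Both lists are ordered newest first; only the set of nodes that survives the cut-off differs.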

Future changes to the Search results display code will probably reduce clutter by hiding most of the information about replies if a large list of matches was found.

Note that 1-character words must be surrounded by whitespace in the node title for them to match (so / c finds C Client / Perl Server incompatibility and its replies but little else -- note that the ends of titles count as whitespace).
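The 1-character rule can be illustrated with a short Python sketch. The `word_matches` helper is a hypothetical name, and the regular expression is just one way to express "surrounded by whitespace, with the ends of the title counting as whitespace"; the real search presumably does this in SQL. Case-insensitive matching is assumed, in line with the example above where c finds a title containing "C".

```python
import re


def word_matches(word, title):
    """Return True if `word` matches `title` under the rules described
    above (hypothetical helper, case-insensitive by assumption)."""
    if len(word) == 1:
        # No non-whitespace character may touch the word on either side;
        # the start and end of the title satisfy this automatically.
        pattern = r"(?<!\S)" + re.escape(word) + r"(?!\S)"
        return re.search(pattern, title, re.IGNORECASE) is not None
    # Longer words match as plain substrings.
    return word.lower() in title.lower()


title = "C Client / Perl Server incompatibility"
print(word_matches("c", title))          # standalone "C" at the start
print(word_matches("/", title))          # "/" with a space on each side
print(word_matches("c", "Perl Client"))  # "C" only occurs inside "Client"
```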

Also, there are no "stop words". A search for perl script takes about the same time as a search for something much more specific.

More flexibility will be available via Super Search when it gets rewritten (hopefully RSN).

        - tye (but my friends call me "Tye")

In reply to New simple search by tye
