PerlMonks

I am so impressed with this nifty module. It is awesome for finding "similar" documents, the way the old Excite for Web Servers search engine did (the "More Like This One" button).

The unusual aspect of this search technique is that results get more accurate the larger the query is: you can feed in the entire text of a document, and the search engine returns a list of documents similar to it.

I modified it so that the command-line list of results shows a document ID alongside each filename. I can then enter that ID as the next query, which in effect supplies every term in that document as the new query. The results are awesomely accurate.
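To make the idea concrete, here is a minimal Perl sketch of how a vector space search typically ranks documents; it is not the module's actual code. Each document becomes a term-frequency vector, and documents are ranked by the cosine of the angle between their vector and the query vector. A longer query shares more terms with relevant documents, which is one intuition for why whole-document queries work so well. `more_like_this` mimics the document-ID trick by reusing a stored vector as the query and leaving that document out of its own results. The tokenizer (lowercase, split on non-word characters, no stemming or stop words), the sub names and the sample corpus are all hypothetical simplifications.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: lowercase, split on non-word characters.
# No stemming or stop-word list, which the real module may well use.
sub make_vector {
    my ($text) = @_;
    my %vec;
    $vec{$_}++ for grep { length } split /\W+/, lc $text;
    return \%vec;
}

# Cosine similarity of two sparse term-frequency vectors (hashrefs).
sub cosine {
    my ($u, $v) = @_;
    my ($dot, $nu, $nv) = (0, 0, 0);
    for my $term (keys %$u) {
        $dot += $u->{$term} * ($v->{$term} // 0);
        $nu  += $u->{$term}**2;
    }
    $nv += $_**2 for values %$v;
    return 0 unless $nu && $nv;
    return $dot / sqrt($nu * $nv);
}

# Score every document against the query vector, best match first;
# $skip_id (optional) leaves one document out of the results.
sub rank {
    my ($docs, $qvec, $skip_id) = @_;
    my @hits = map  { [ $_, cosine($qvec, $docs->{$_}) ] }
               grep { !defined $skip_id || $_ ne $skip_id } keys %$docs;
    return sort { $b->[1] <=> $a->[1] || $a->[0] cmp $b->[0] } @hits;
}

# Free-text query: a few words or the full text of a document.
sub search_text {
    my ($docs, $text) = @_;
    return rank($docs, make_vector($text));
}

# "More like this": reuse a stored document's vector as the query.
sub more_like_this {
    my ($docs, $id) = @_;
    die "unknown document ID '$id'\n" unless exists $docs->{$id};
    return rank($docs, $docs->{$id}, $id);
}

# Usage: a tiny hypothetical corpus keyed by document ID.
my %docs = (
    1 => make_vector('perl vector space search engine'),
    2 => make_vector('vector space model for document search'),
    3 => make_vector('cooking pasta with tomato sauce'),
);
printf "%s\t%.3f\n", @$_ for more_like_this(\%docs, 1);
# prints "2\t0.548" then "3\t0.000"
```

Document 2 shares three of document 1's five terms, so it ranks first; the cooking document shares none and scores zero.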

I'd love to put a web interface on this module and give it a try on a real site. I guess the first big obstacle is turning the module into a daemon (a long-running process), so that once all the document vectors are built they can "hang around" in memory instead of being rebuilt every time someone runs a search. Has anybody done any work in that direction?
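Here is a rough sketch of the daemon idea using only the core IO::Socket::INET module; it is one possible structure, not anything taken from the module itself. The process builds its index once at startup and then answers one query per TCP connection, so the vectors stay in memory between searches. A web front end (a CGI script, say) would connect, send the query as a single line, and read back `id<TAB>score` lines. `handle_query`, `serve`, the `--serve` flag, port 8765 and the shared-word placeholder scorer are all hypothetical; the real vector-building and ranking code would replace the placeholder.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Format ranked hits ([id, score] pairs) as "id<TAB>score" lines.
sub format_hits {
    return join '', map { sprintf("%s\t%.3f\n", @$_) } @_;
}

# Answer one query line using whatever index is already in memory.
# $search is a code ref standing in for the module's own search routine;
# it takes the query text and returns [id, score] pairs, best first.
sub handle_query {
    my ($search, $query) = @_;
    $query =~ s/\s+\z//;    # drop the trailing newline (or CRLF)
    return "ERROR empty query\n" unless length $query;
    return format_hits($search->($query));
}

# Listen on localhost and answer one query per connection, forever.
# Single-process, so clients are served one at a time.
sub serve {
    my ($search, $port) = @_;
    my $server = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => $port,
        Proto     => 'tcp',
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "cannot listen on port $port: $!\n";
    while (my $client = $server->accept) {
        my $line = <$client>;
        print {$client} handle_query($search, $line) if defined $line;
        close $client;
    }
}

# Start only when asked, e.g.:  perl search_daemon.pl --serve 8765
if (@ARGV && $ARGV[0] eq '--serve') {
    # PLACEHOLDER index and scorer (shared-word count, not vector space),
    # built once at startup; swap in the module's real vector code here.
    my %index = (
        'a.txt' => { map { $_ => 1 } qw(perl vector search) },
        'b.txt' => { map { $_ => 1 } qw(pasta tomato sauce) },
    );
    my $search = sub {
        my @words = grep { length } split /\W+/, lc shift;
        my @hits = map {
            my $doc = $_;
            [ $doc, scalar grep { $index{$doc}{$_} } @words ];
        } keys %index;
        return sort { $b->[1] <=> $a->[1] || $a->[0] cmp $b->[0] } @hits;
    };
    serve($search, $ARGV[1] // 8765);
}
```

Because this server handles one client at a time, a busier site would probably want to fork per connection or use an existing server framework. Binding to 127.0.0.1 keeps the raw socket off the public network, so only the local web front end can reach it.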


In reply to Turning this module into a persistent web app by davebaker
in thread Refining a 'vector space search'. by Seumas
