PerlMonks  

The problem you are trying to solve is traditional word-based search, which is probably better suited to a search engine than to a SQL database. A typical search engine lets you index content so that an efficient word index is built alongside a relational-style table that stores the metadata for each document.

The search engine approach has two phases: indexing the files as a preparation step, then searching. Indexing requires that the files are read (spidered) and filtered to extract their content and metadata. Once the index is prepared, the search engine's query interface can be used to get results.
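
To make the two phases concrete, here is a minimal sketch in Python of a word index kept next to a per-document metadata table. It is an illustration only, not how any particular search engine is implemented: the names `TinyIndex`, `tokenize`, `add` and `search` are made up for this example, the tokeniser simply splits on non-alphanumeric characters, and a query returns only the documents that contain every query word.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Hypothetical in-memory index: word postings plus a metadata table."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of document ids
        self.metadata = {}                # document id -> metadata dict

    def add(self, doc_id, text, **meta):
        """Indexing phase: record metadata and post every word of the text."""
        self.metadata[doc_id] = meta
        for word in tokenize(text):
            self.postings[word].add(doc_id)

    def search(self, query):
        """Search phase: return sorted ids of documents with all query words."""
        words = tokenize(query)
        if not words:
            return []
        # .get() avoids creating empty postings for unknown words
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)


index = TinyIndex()
index.add(1, "Perl is a practical extraction language", title="perl.txt")
index.add(2, "A search engine builds a word index", title="search.txt")
index.add(3, "Perl can build a search index", title="both.txt")

print(index.search("search index"))  # [2, 3]
print(index.search("perl index"))    # [3]
```

A real engine would add things this sketch leaves out, such as stemming, stop words, ranking and an on-disk index, but the split between a word-to-document mapping and a separate metadata store is the same idea.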

Links to a couple of sites:
http://www.searchtools.com/index.html
http://searchenginewatch.com/


In reply to Re: Help on building a search engine (design / algorithms / parsing / tokenising / database design) by inman
in thread Help on building a search engine (design / algorithms / parsing / tokenising / database design) by bobtfish
