PerlMonks
Re^2: What DB style to use with search engine
by halfcountplus (Hermit)
on Nov 10, 2009 at 23:02 UTC ( [id://806358]=note )
When concatenating your files, remove all the newlines so that each file becomes a single line prefixed by the path information.
This makes searching for phrases that span lines much simpler and much, much faster.
Excellent point, thanks for that. As for optimizing on the server, I do not have root access, so that may be a hassle, but I will keep it in mind once everything is finalized. I am sure the regexp approach is feasible on this scale -- as it is now, most of the work is using regexps to parse the tags out, which must be done, and it still performs usably fast. However I end up storing the data, searching should be far quicker with the tags preprocessed out.
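For anyone following along, here is a minimal sketch of the flattening idea quoted above, combined with stripping the tags ahead of time. It is not my actual code: the helper names flatten_record and search_phrase, the tab separator between path and text, and the sample paths are all made up for illustration, and the tag-stripping regexp is deliberately naive (it assumes no ">" inside attribute values and does not handle comments or scripts).

```perl
use strict;
use warnings;

# Hypothetical helper: turn one file's text into a single index line of
# the form "path<TAB>text". Assumes the path itself contains no tab.
sub flatten_record {
    my ($path, $text) = @_;
    $text =~ s/<[^>]*>/ /g;    # naive tag removal (an assumption, not a real parser)
    $text =~ s/\s+/ /g;        # newlines and other whitespace runs become one space
    $text =~ s/^ | $//g;       # trim a leading or trailing space
    return "$path\t$text";
}

# Hypothetical helper: return the paths of all records containing the phrase.
# Words in the phrase may be separated by any amount of whitespace.
sub search_phrase {
    my ($phrase, @records) = @_;
    my $re = join '\s+', map { quotemeta($_) } split ' ', $phrase;
    my @hits;
    for my $rec (@records) {
        my ($path, $text) = split /\t/, $rec, 2;
        push @hits, $path if $text =~ /$re/i;
    }
    return @hits;
}

# Two made-up documents; the phrase in the first one spans a line break.
my @index = (
    flatten_record('docs/a.html', "<p>phrases that span\nlines</p>"),
    flatten_record('docs/b.html', "<p>nothing\nrelevant here</p>"),
);

print "$_\n" for search_phrase('span lines', @index);
```

Because each file ends up as one line, a phrase match never has to look across a line boundary, and the tag removal is paid once when the index is built rather than on every search.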