When concatenating your files, remove all the newlines so that each file becomes a single line prefixed by the path information.
This makes searching for phrases that span lines much simpler and much, much faster.
Excellent point, thanks for that. Vis optimizing on the server, I do not have root access and that may be a hassle, but I will keep this in mind once everything is finalized.
I am sure the regexp is feasible on this scale -- as it is now, most of the work is using regexps to parse the tags out, which must be done, and it still performs usably fast. No matter how I database the data, it should be way, way, way quicker with the tags preprocessed out.
-
Are you posting in the right place? Check out Where do I post X? to know for sure.
-
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
<u> <ul>
-
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
intervention).
-
Want more info? How to link
or How to display code and escape characters
are good places to start.
|