Actually, that is exactly what I want it for. I've got code
that parses web pages to extract data, and I want it to
know when it isn't extracting the data correctly.
The prototype regex code lets the parser generate a
'signature' (a regex) that describes the data to be
extracted, based on a set of examples. It can then use
that signature to validate that data extracted later
matches the same pattern.
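
To make the 'signature' idea concrete, here is a rough sketch, written in Python purely for illustration. It is not the prototype code: the names char_class, signature, build_signature and matches are made up, and the character classes are a deliberately coarse assumption. Each example is reduced to runs of digits, letters and whitespace, with everything else kept as a literal, and the signatures of all examples are combined as alternatives.

```python
import re
import string

def char_class(ch):
    """Map one character to (regex fragment, repeatable?). Coarse, assumed classes."""
    if ch in string.digits:
        return r"\d", True
    if ch in string.ascii_letters:
        return r"[A-Za-z]", True
    if ch in string.whitespace:
        return r"\s", True
    # Anything else (punctuation, non-ASCII) is kept as an exact literal.
    return re.escape(ch), False

def signature(example):
    """Collapse each run of same-class characters into 'class+'."""
    frags = []
    for ch in example:
        frag, repeatable = char_class(ch)
        if repeatable:
            frag += "+"
            if frags and frags[-1] == frag:
                continue  # still inside the same run
        frags.append(frag)
    return "".join(frags)

def build_signature(examples):
    """Combine the signatures of all examples into one compiled regex."""
    if not examples:
        raise ValueError("need at least one example")
    alternatives = sorted({signature(e) for e in examples})
    return re.compile("(?:" + "|".join(alternatives) + ")")

def matches(sig, value):
    """True if the whole value fits the signature."""
    return sig.fullmatch(value) is not None

node_ids = build_signature(["12345", "987"])
print(matches(node_ids, "42"))         # True
print(matches(node_ids, "<a href="))   # False
```

A value that fails the check is a hint that the parser has drifted off the data it was meant to extract. How tight or loose the classes should be depends on the data, so treat the ones above as placeholders.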
As for the parsing logic, we're using landmark-based
location identification: move forward past 'New Questions',
move forward past 'lastnode_id',
move forward past '>', then extract up to '<'. And so on...
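
The landmark steps above could be sketched roughly like this (again in Python for illustration only). The function name extract_after and the sample page string are hypothetical; the real pages and parser are not shown here.

```python
def extract_after(page, landmarks, stop):
    """Skip past each landmark in order, then return the text up to `stop`.

    Returns None if any landmark or the stop marker is missing, which is
    itself a sign that the page layout has changed.
    """
    pos = 0
    for mark in landmarks:
        found = page.find(mark, pos)
        if found == -1:
            return None
        pos = found + len(mark)  # move forward past the landmark
    end = page.find(stop, pos)
    if end == -1:
        return None
    return page[pos:end]

# Hypothetical page fragment, made up for this example.
page = '<h2>New Questions</h2><a href="?lastnode_id=479">Regex help</a>'
title = extract_after(page, ["New Questions", "lastnode_id", ">"], "<")
print(title)  # Regex help
```

Returning None on a missing landmark catches gross layout changes. Validating the extracted value against a signature is meant to catch the quieter case where every landmark is found but the wrong text comes back.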
Kyle