<?xml version="1.0" encoding="windows-1252"?>
<node id="898183" title="Re^2: Spliting a delimited string into variables" created="2011-04-07 16:03:15" updated="2011-04-07 16:03:15">
<type id="11">
note</type>
<author id="146981">
Popcorn Dave</author>
<data>
<field name="doctext">
I've got to second the vote for HTML::Parser or a similar parsing engine.
&lt;P&gt;
A long time ago, before RSS feeds, I wrote a program to parse various newspaper websites and wrote the regexes by hand. I had 24 different rules for 90+ papers. When I rewrote it using a parsing engine, I got it down to 9 rules, based mainly on web page design.
&lt;P&gt;
You're going to save yourself a ton of work, because with hand-written regexes you have to rewrite them every time the data changes.
&lt;div class="pmsig"&gt;&lt;div class="pmsig-146981"&gt;
&lt;hr&gt;
&lt;font size = "-1"&gt;To disagree, one doesn't have to be disagreeable - Barry Goldwater&lt;/font&gt;
&lt;P&gt;

&lt;/div&gt;&lt;/div&gt;</field>
<field name="root_node">
898109</field>
<field name="parent_node">
898132</field>
</data>
</node>
