Hello, Monks...
I am writing a small script to turn an HTML table into a hash. The table is large but completely homogeneous (thank goodness). So, without further ado, here is the HTML:
<tr><td><b><a
href=i386/zh-xcin-2.3.04.tgz-long.html>zh-xcin-2.3.04.tgz</a></b></td><td>   
<i>chinese input utility for X
</i></td><td>[ <a href=ftp://ftp.openbsd.org/pub/OpenBSD/2.8/packages/i386/zh-xcin-2.3.04.tgz>FTP Site
1</a> ]</td><td>
[ <a href=ftp://ftp1.usa.openbsd.org/pub/OpenBSD/2.8/packages/i386/zh-xcin-2.3.04.tgz>FTP Site 2</a> ]</td></tr>
So, for simplicity, I zapped the \r\n characters that were lurking in there and ended up with one big brick of HTML (which I will spare you all; nobody ever said HTML was pretty). So I have the following code:
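If the line breaks in the HTML above are really in the page (an assumption), deleting them outright would glue `<a` to `href`. A minimal sketch of the flattening step, run on a made-up two-line sample with CRLF endings, replaces each run of CR/LF characters with a single space instead:

```perl
use strict;
use warnings;

# Hypothetical sample: a shortened row split over two physical lines, CRLF endings.
my $input = "<tr><td><b><a\r\nhref=x.html>x.tgz</a></b></td><td>   \r\n<i>desc</i></td></tr>\r\n";

# Replace each run of CR/LF characters with one space rather than deleting it,
# so "<a" and "href" stay separate.
$input =~ s/[\r\n]+/ /g;

print "$input\n";
```

Note that in this sample the three spaces after `<td>` become four, so a pattern that counts spaces exactly would stop lining up; `\s*` is more forgiving.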
my %names;
my @fields = split '<tr><td><b>', $input;
foreach my $field (@fields) {
    # What I really wanted to do was...
    # (undef, $names{$1}) =~ m// but that didn't work either,
    # so I added $foo and $bar.
    my ($foo, $bar) = $field =~
        m!^<a href=.*>(.*)</a></b></td><td> {3}<i>(.*)</i>.*$!x;
    $names{$foo} = $bar;
    print "$foo == $bar\n";
}
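The comment in the loop asks for a way to fill %names without the $foo/$bar temporaries. One option, sketched here on a made-up two-row sample (the file names and descriptions are hypothetical), is a global match in list context: m//g then returns the captures of every match in order, which pair up naturally as name => description and can be assigned straight to the hash, skipping the split entirely.

```perl
use strict;
use warnings;

# Hypothetical flattened input with two rows; real data would come from the page.
my $input = '<tr><td><b><a href=a.html>a.tgz</a></b></td><td>   <i>first tool</i></td></tr>'
          . '<tr><td><b><a href=b.html>b.tgz</a></b></td><td>   <i>second tool</i></td></tr>';

# In list context, m//g returns ($1, $2) from every match, one after another,
# so the flat list becomes name => description pairs in the hash.
my %names = $input =~ m{<a\s+href=[^>]*>([^<]*)</a></b></td><td>\s*<i>([^<]*)</i>}g;

print "$_ == $names{$_}\n" for sort keys %names;
```

The pattern uses `\s` and negated character classes (`[^<]*`) rather than literal spaces and greedy `.*`, so each capture stops at the next tag.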
If I print $field, I do get my HTML, so I know $field is okay. I think the problem is the regex; in fact, I'm 90% sure it's the regex. But where is it wrong, given the data? It looks fine to me.
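A likely culprit is the /x modifier. Under /x, Perl ignores literal whitespace in the pattern, so `<a href=` is read as `<ahref=` and `<td> {3}` as `<td>{3}`, which means `<td` followed by three `>` characters. The sketch below runs the original pattern against one row from the post, hand-flattened and trimmed after the description cell (an assumption about what $field holds), and then a variant that spells whitespace as `\s`, which /x does not discard.

```perl
use strict;
use warnings;

# One row as split() would leave it (leading "<tr><td><b>" removed),
# flattened by hand and trimmed after the description cell.
my $field = '<a href=i386/zh-xcin-2.3.04.tgz-long.html>zh-xcin-2.3.04.tgz</a>'
          . '</b></td><td>   <i>chinese input utility for X</i></td></tr>';

# The pattern from the post: under /x both literal spaces vanish,
# so it looks for "<ahref=" and "<td>>>" and never matches.
my @broken = $field =~ m!^<a href=.*>(.*)</a></b></td><td> {3}<i>(.*)</i>.*$!x;

# Whitespace written as \s, and [^<]* so each capture stops at the next tag.
my ($name, $desc) = $field =~ m!^<a\s+href=[^>]*>([^<]*)</a></b></td><td>\s*<i>([^<]*)</i>!;

printf "original pattern: %d captures\n", scalar @broken;
print "$name == $desc\n";
```

Escaping the spaces (`\ ` or `[ ]`) or simply dropping /x would also make the original pattern match this sample; `\s` has the advantage of tolerating whatever spacing the newline-zapping left behind.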
Thanks
brother dep.
--
transcending "coolness" is what makes us cool.