Category: | Web Stuff |
Author/Contact Info | Juerd |
Description: | Because the popular gnuvd is broken, I wrote this quick hack to look words up on the Van Dale website. Don't expect production quality ;) And please don't bother me about Getopt or HTML::Parser: I don't use Getopt because I don't like it, and I can't use HTML::Parser because http://www.vandale.nl/ serves a lot of broken HTML. Regexes are also easier, and I can't live without a Dutch dictionary. This probably isn't of much use to foreigners :) Update (200306081719+0200): works with the updated vandale.nl HTML now. |
#!/usr/bin/perl -w
use strict;
use LWP::Simple;

# Split the command line into switches and words to look up.
my (@switches, @woorden);
while (@ARGV) {
    $_ = shift;
    if (/^--$/) {
        # Everything after "--" is a word, never a switch.
        push @woorden, @ARGV;
        last;
    } elsif (/^-/) {
        push @switches, $_;
    } else {
        push @woorden, $_;
    }
}

my $all = grep /^(?:-\w*a|--all)$/, @switches;

if (grep /^(?:-\w*h|--help)$/, @switches) {
    print qq{
Usage: $0 [options] word ...

options:
    -a --all    List all matches
    -h --help   Display usage information
\n};
    exit 0;
}

for my $woord (@woorden) {
    # Percent-encode every non-word character for the query string.
    $woord =~ s/(\W)/sprintf '%%%02x', ord $1/ge;
    my $page = get "http://www.vandale.nl/opzoeken/woordenboek/?zoekwoord=$woord";
    next unless defined $page;    # get returns undef on failure

    # Consume one entry per iteration: headword, then its <DD> definitions.
    while ($page =~ s{<B><BIG>(.*?)</font>.*?((?:<DD>.*?</DD>)+)}{}si) {
        my ($woord, $betekenis) = ($1, $2);
        for ($woord, $betekenis) {
            s[</dd>][\n]gi;          # one definition per line
            s/<.*?>//g;              # strip the remaining tags
            s/´/'/g;
            s/&#(\d+);/chr $1/ge;    # decode numeric entities
        }
        $betekenis =~ s/^/ /gm;      # indent the definitions
        print "$woord\n$betekenis\n";
        last if not $all;
    }
}
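To try the two text transformations without hitting the website, they can be pulled out into small subroutines. This is only a sketch: the names `encode_word` and `clean_html` are made up for illustration, and the sample markup is hypothetical, not real vandale.nl output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode every non-word character, as the lookup loop does
# before building the query URL. (Hypothetical helper name.)
sub encode_word {
    my ($word) = @_;
    $word =~ s/(\W)/sprintf '%%%02x', ord $1/ge;
    return $word;
}

# Turn a run of <DD> elements into plain text, mirroring the cleanup
# loop in the script. (Hypothetical helper name.)
sub clean_html {
    my ($text) = @_;
    $text =~ s[</dd>][\n]gi;          # each definition ends a line
    $text =~ s/<.*?>//g;              # drop the remaining tags
    $text =~ s/&#(\d+);/chr $1/ge;    # decode numeric entities
    return $text;
}

# Made-up sample input, assumed to resemble the <DD> blocks the script matches.
my $sample = '<DD>1 <i>&#39;s avonds</i></DD><DD>2 bar</DD>';

print encode_word('ijs koud'), "\n";    # prints "ijs%20koud"
print clean_html($sample);              # prints "1 's avonds" and "2 bar" on separate lines
```

Keeping these steps in subroutines makes it easier to adjust the regexes when the site's HTML changes again, since they can be checked against a saved page instead of a live request.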