<?xml version="1.0" encoding="windows-1252"?>
<node id="283543" title="Typoxy: A Pox on Typos by Proxy" created="2003-08-13 09:41:57" updated="2005-08-12 22:04:38">
<type id="1042">
CUFP</type>
<author id="249603">
halley</author>
<data>
<field name="doctext">
I admit it: I'm one of those annoying people who will interrupt a conversation to point out a spelling error.  I try to be discreet and mean no offense; my parents raised me to believe that one friendly correction is better than repeating the same mistake in a more important setting.

&lt;p&gt;Blogs and other online forums, however, are full of egregious, repetitive, and predictable errors.  If only I could hide those errors in my browser, I could stay mellow and calm while the decline of the world's grammar went unchecked.

&lt;p&gt;I wondered aloud to some friends about the best way to add a search-and-replace filter to my favorite web browser, and somebody suggested [cpan://HTTP::Proxy].

&lt;p&gt;This is scratch code without documentation, and I've only tested it on Linux.  The filter is a plain regex search-and-replace over the raw HTML, so in rare cases it rewrites a typo inside a tag attribute, which can break the page.  I built the typo list by filtering a word processor's auto-correction file, added a few errors that are common on blogs, and stripped out any non-ASCII fixes for simplicity.  To use it, run this &lt;code&gt;typoxy&lt;/code&gt; proxy in the background and point your browser's HTTP proxy setting at it (port 8080, set by &lt;code&gt;$Port&lt;/code&gt;).
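
&lt;p&gt;One way around the tag-attribute problem is to split the page into tags and text and apply corrections only to the text.  The sketch below uses a hypothetical &lt;code&gt;fix_outside_tags()&lt;/code&gt; helper that is not part of typoxy.  It is still naive: a literal greater-than sign inside an attribute value, an HTML comment, or a script block would fool it, and a real parser such as [cpan://HTML::Parser] would be more robust.
&lt;code&gt;
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of typoxy): apply one correction only to
# the text between HTML tags, leaving tag names and attributes untouched.
sub fix_outside_tags
{
    my ($html, $pattern, $right) = @_;

    # A capturing group in split keeps each tag as its own list item.
    my @parts = split /(&lt;[^&gt;]*&gt;)/, $html;

    foreach my $part (@parts)
    {
        next if $part =~ /^&lt;/;         # tags pass through as-is
        $part =~ s/$pattern/$right/g;   # only text runs are corrected
    }
    return join '', @parts;
}

my $html = '&lt;a href="/teh-page"&gt;see teh page&lt;/a&gt;';
print fix_outside_tags($html, qr/\bteh\b/, 'the'), "\n";
# prints: &lt;a href="/teh-page"&gt;see the page&lt;/a&gt;
&lt;/code&gt;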

&lt;readmore&gt;
&lt;code&gt;
#!/usr/bin/perl

use strict;
use warnings;
#use Data::Dumper;

my $Port = 8080;
my $Highlight = 1;

#----------------------------------------------------------

my $Pre = $Highlight?
    '&lt;span style="background: #ffffcc; color: #800000"&gt;' : '';
my $Post = $Highlight?
    '&lt;/span&gt;' : '';

my @Typos = ();

if (open(my $typo_fh, '&lt;', "$ENV{HOME}/.typo"))
{
    while (&lt;$typo_fh&gt;)
    {
        chomp;
        my ($wrong, $right) = split /\t+/;
        next if not $right;
        next if length($wrong) &lt; 2;
        push(@Typos,
             [ $wrong, $right ]);
        push(@Typos,
             [ ucfirst($wrong), ucfirst($right) ])
            if ucfirst($wrong) ne $wrong;
    }
    close($typo_fh);
}

die "No typos loaded from ~/.typo.\n" if not @Typos;
print STDERR "$0: ", scalar @Typos, " typos filtered on port $Port.\n";

# Longest typos first, so a multi-word phrase is corrected before a
# shorter entry can rewrite part of it.
@Typos =
    map { $_-&gt;[1] }
    sort { $b-&gt;[0] &lt;=&gt; $a-&gt;[0] }
    map { [ length($_-&gt;[0]), $_ ] }
    @Typos;

# Match each word literally (quotemeta), but let spaces be lenient.
$_-&gt;[0] = join('\s+', map { quotemeta } split(' ', $_-&gt;[0]))
    foreach @Typos;

# Precompile the correction patterns.  The lookbehind skips a word that
# directly follows '&lt;' (such as a tag name) or '&gt;'.
$_-&gt;[0] = qr/ (?&lt;! [&lt;&gt;] ) \b ( $_-&gt;[0] ) \b/x
    foreach @Typos;

#print Dumper $Typos[0], $Typos[-1];

#----------------------------------------------------------

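use HTTP::Proxy;
use HTTP::Proxy::BodyFilter::simple;
use HTTP::Proxy::BodyFilter::complete;

my $proxy = HTTP::Proxy-&gt;new(port =&gt; $Port);

# Only touch HTML, and buffer the whole page first so that a typo
# split across two network chunks is still seen in one piece.
$proxy-&gt;push_filter(
    mime     =&gt; 'text/html',
    response =&gt; HTTP::Proxy::BodyFilter::complete-&gt;new(),
    response =&gt; HTTP::Proxy::BodyFilter::simple-&gt;new(\&amp;typo_filter),
);
$proxy-&gt;start();

#----------------------------------------------------------

# Called as ($self, $dataref, $message, $protocol, $buffer);
# the page body is rewritten in place through $dataref.
sub typo_filter
{
    my (undef, $dataref) = @_;
    foreach (@Typos)
    {
        $$dataref =~ s|$_-&gt;[0]|$Pre$_-&gt;[1]$Post|g;
    }
}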
&lt;/code&gt;
&lt;/readmore&gt;
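
&lt;p&gt;The two pattern-preparation steps in the script are easiest to see in isolation.  This standalone sketch uses a made-up pair of entries (not from the real typo list) to show why the longest wrong string must be tried first.  If &lt;code&gt;sould&lt;/code&gt; were fixed before &lt;code&gt;sould of&lt;/code&gt;, the phrase would come out as "should of" and the longer rule would never match.  It also runs each word through &lt;code&gt;quotemeta&lt;/code&gt;, so that any punctuation in an entry is matched literally rather than as a regex metacharacter, while any run of whitespace may separate the words.
&lt;code&gt;
#!/usr/bin/perl
use strict;
use warnings;

# Made-up entries in the same [ wrong, right ] shape that typoxy uses.
my @typos = ( [ 'sould', 'should' ], [ 'sould of', 'should have' ] );

# Longest wrong string first, so a phrase is fixed before its first word.
@typos = sort { length($b-&gt;[0]) &lt;=&gt; length($a-&gt;[0]) } @typos;

# Quote each word literally, but let any run of whitespace separate words.
my @rules = map {
    my $re = join '\s+', map { quotemeta } split ' ', $_-&gt;[0];
    [ qr/\b$re\b/, $_-&gt;[1] ]
} @typos;

my $text = "you sould of asked; you sould ask";
$text =~ s/$_-&gt;[0]/$_-&gt;[1]/g foreach @rules;
print "$text\n";    # prints: you should have asked; you should ask
&lt;/code&gt;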

&lt;p&gt;I haven't benchmarked it, but it seems to slow down connecting more than it slows down actual rendering, even with 900+ typos in the &lt;code&gt;~/.typo&lt;/code&gt; configuration file.  A sample typo list is at &lt;a href="http://www.halley.cc/.typo"&gt;http://www.halley.cc/.typo&lt;/a&gt;; it's just a list of tab-delimited lines such as &lt;code&gt;"definatly\tdefinitely\n"&lt;/code&gt;.  By default each correction is highlighted in dark red on pale yellow (so you can see it working); set &lt;code&gt;$Highlight = 0&lt;/code&gt; to turn that off.
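
&lt;p&gt;For a rough number on the filtering cost, one pass of the filter can be timed offline.  This sketch uses &lt;code&gt;Time::HiRes&lt;/code&gt; and purely synthetic data: 900 made-up patterns and a page of about 100 KB, taken neither from the real typo list nor from any real site.  It uses the same loop shape as &lt;code&gt;typo_filter()&lt;/code&gt;.  The result depends entirely on the machine and the data, and it measures only the substitutions, not network or rendering time.
&lt;code&gt;
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Synthetic stand-ins only: 900 made-up patterns and a ~100 KB body.
my @rules = map { [ qr/\bzqx$_\b/, "fix$_" ] } 1 .. 900;
my $body  = "some ordinary words with zqx42 in them " x 2500;

# One full filtering pass, the same loop shape as typo_filter().
my $start = time();
foreach my $rule (@rules)
{
    my ($re, $fix) = @$rule;
    $body =~ s/$re/$fix/g;
}
my $elapsed = time() - $start;

printf "%d patterns over %d bytes: %.3f s\n",
    scalar @rules, length $body, $elapsed;
&lt;/code&gt;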

&lt;p&gt;--&lt;br&gt;&lt;tt&gt;&amp;#91; e d @ h a l l e y . c c &amp;#93;&lt;/tt&gt;</field>
</data>
</node>
