Accent-insensitive case conversion

by echo (Pilgrim)
on Aug 22, 2001 at 17:20 UTC
Description: In French (and other European languages) we often need to make accent-insensitive comparisons. For example, users of a web site will type andromede in a search box and expect that to match andromède.

This code creates accent-insensitive lc and uc functions. They work like the builtins but get rid of any accents.


print iso_8859_1_lc("andromède"); # prints "andromede"
print iso_8859_1_uc("andromède"); # prints "ANDROMEDE"
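use strict;

# NOTE: save this file as ISO-8859-1 so that each accented
# character below is a single byte.
my %iso_8859_1_accents = (
    a => [ qw( à á â ã ä å À Á Â Ã Ä Å ) ],
    c => [ qw( ç Ç ) ],
    e => [ qw( è é ê ë È É Ê Ë ) ],
    i => [ qw( ì í î ï Ì Í Î Ï ) ],
    n => [ qw( ñ Ñ ) ],
    o => [ qw( ò ó ô õ ö ø Ò Ó Ô Õ Ö Ø ) ],
    u => [ qw( ù ú û ü Ù Ú Û Ü ) ],
    y => [ qw( ý ÿ Ý ) ],
);

# build translation strings
my (%in, %out);
for my $letter ('a'..'z') {
    my $uletter = CORE::uc $letter;
    # translate non-accented letters
    $in{uc}  .= $letter;
    $out{lc} .= $letter;
    $in{lc}  .= $uletter;
    $out{uc} .= $uletter;
    # translate accented letters to their plain base letter
    if (my $ra_accented = $iso_8859_1_accents{$letter}) {
        my $in = join '', @$ra_accented;
        $in{lc} .= $in;
        $in{uc} .= $in;
        $out{lc} .= $letter  x @$ra_accented;
        $out{uc} .= $uletter x @$ra_accented;
    }
}

# build translation subroutines
for my $type (qw(lc uc)) {
    # tr/// lists are fixed at compile time, so generate the code as a string
    my $sub = qq!
        sub iso_8859_1_$type {
           (my \$s = shift) =~ tr/$in{$type}/$out{$type}/;
           return \$s;
        }
    !;
    eval $sub;
    die $@ if $@;
}
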
Re: Accent-insensitive case conversion
by Hanameki (Chaplain) on Aug 22, 2001 at 22:06 UTC
    Nice idea!
    I think writing these characters with the 8th bit set is kind of dangerous. E.g., some FTP clients or text editors can do really strange things with these accented characters.
    When I write this kind of program I use hex in my hashes. Instead of "ä" I would use "\xe4" and put "ä" etc. in a comment on the same line.
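
A minimal sketch of that hex-escape idea, assuming ISO-8859-1 byte values. The names %accents_hex and strip_accents_hex are made up for illustration, and the table covers only a few letters:

```perl
use strict;
use warnings;

# Hypothetical table: same shape as the snippet's hash, but every
# ISO-8859-1 accented character is written as a hex escape, with the
# character itself only in a comment.
my %accents_hex = (
    a => [ "\xe0", "\xe1", "\xe2", "\xe3", "\xe4", "\xe5" ],   # à á â ã ä å
    c => [ "\xe7" ],                                           # ç
    e => [ "\xe8", "\xe9", "\xea", "\xeb" ],                   # è é ê ë
);

# Replace each accented byte with its base letter, one letter at a time.
sub strip_accents_hex {
    my ($s) = @_;
    for my $letter (keys %accents_hex) {
        my $class = join '', @{ $accents_hex{$letter} };
        $s =~ s/[$class]/$letter/g;
    }
    return $s;
}

print strip_accents_hex("androm\x{e8}de"), "\n";   # prints "andromede"
```

Since the executable part of the source is plain ASCII, an editor or transfer tool that mangles 8-bit characters can only damage the comments.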

      Nice idea, as I said, but still a little bit longish for something you can do with just tr/PUT-IN-ALL-ACCENTED CHARACTERS/PUT-IN-NORMALIZED-FORM/.
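
A rough sketch of the plain tr approach, again assuming ISO-8859-1 bytes. The name fold_lc is hypothetical and only a, c and e are covered. It uses one tr per base letter and relies on tr repeating the last replacement character when the replacement list is shorter than the search list:

```perl
use strict;
use warnings;

# Lowercase a string and strip a few ISO-8859-1 accents.
sub fold_lc {
    my ($s) = @_;
    $s =~ tr/A-Z/a-z/;                 # plain ASCII letters
    $s =~ tr/\xc0-\xc5\xe0-\xe5/a/;    # À-Å and à-å become a
    $s =~ tr/\xc7\xe7/c/;              # Ç and ç become c
    $s =~ tr/\xc8-\xcb\xe8-\xeb/e/;    # È-Ë and è-ë become e
    return $s;
}

# Accent-insensitive match, as in the search box example.
my $query = "andromede";
my $title = "Androm\x{e8}de";
print "match\n" if fold_lc($query) eq fold_lc($title);   # prints "match"
```

This is shorter than generating the subs with eval, at the price of hard-coding the character lists.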

Node Type: snippet
