PerlMonks  

Accent-insensitive case conversion

by echo (Pilgrim)
on Aug 22, 2001 at 17:20 UTC (#106938, snippet)
Description: In French (and other European languages) we often need to make accent-insensitive comparisons. For example, users of a web site will type andromede in a search box and expect that to match andromède.

This code creates accent-insensitive lc and uc functions. They work like the builtins but get rid of any accents.

Usage:

print iso_8859_1_lc("andromède"); # prints "andromede"
print iso_8859_1_uc("andromède"); # prints "ANDROMEDE"
use strict;
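# accented ISO-8859-1 letters, grouped by base letter
# (save this file as ISO-8859-1 so that each one is a single byte)
my %iso_8859_1_accents = (
    a => [ qw(À Á Â Ã Ä Å à á â ã ä å) ],
    c => [ qw(Ç ç) ],
    e => [ qw(È É Ê Ë è é ê ë) ],
    i => [ qw(Ì Í Î Ï ì í î ï) ],
    n => [ qw(Ñ ñ) ],
    o => [ qw(Ò Ó Ô Õ Ö Ø ò ó ô õ ö ø) ],
    u => [ qw(Ù Ú Û Ü ù ú û ü) ],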
    y => [ qw(Ý ý) ],
);
# build translation strings
my (%in, %out);
for my $letter ('a'..'z') {
    my $uletter = CORE::uc $letter;
    # translate non-accented letters
    $in{uc}  .= $letter;
    $out{lc} .= $letter;
    $in{lc}  .= $uletter;
    $out{uc} .= $uletter;
    if (my $ra_accented = $iso_8859_1_accents{$letter}) {
        my $in = join '', @$ra_accented;
        $in{lc} .= $in;
        $in{uc} .= $in;
        $out{lc} .= $letter  x @$ra_accented;
        $out{uc} .= $uletter x @$ra_accented;
    }
}
# build translation subroutines
# (tr/// does not interpolate variables, so each sub is generated with a string eval)
for my $type (qw(lc uc)) {
    my $sub = qq!
        sub iso_8859_1_$type {
           (my \$s = shift) =~ tr/$in{$type}/$out{$type}/;
           \$s
        }
    !;
    eval $sub;
    die $@ if $@;   # report a failed compile instead of silently skipping it
}
Replies are listed 'Best First'.
Re: Accent-insensitive case conversion
by Hanamaki (Chaplain) on Aug 22, 2001 at 22:06 UTC
    Nice idea!
    I think writing these characters with the 8th bit set is somewhat dangerous. For example, some FTP clients or text editors can do really strange things with these accented characters.
    When I write this kind of program I use hex in my hashes. Instead of "ä" I would use "\xe4" and put "ä" etc. in a comment on the same line.

    Hanamaki
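
A minimal sketch of the hex-escape idea, not the original author's code. The table below is deliberately incomplete, and the names `%accent_bytes`, `%plain_for` and `hex_fold` are made up for illustration. It also swaps the generated tr/// for a hash lookup inside s///, which avoids the string eval; input is assumed to be an ISO-8859-1 byte string.

```perl
use strict;
use warnings;

# Hypothetical table: accented ISO-8859-1 bytes written as hex escapes,
# with the readable character kept in a comment (not a complete list).
my %accent_bytes = (
    a => "\xe0\xe2\xe4",      # à â ä
    c => "\xe7",              # ç
    e => "\xe8\xe9\xea\xeb",  # è é ê ë
);

# Invert the table: accented byte => plain letter.
my %plain_for;
for my $letter (keys %accent_bytes) {
    $plain_for{$_} = $letter for split //, $accent_bytes{$letter};
}

# hex_fold is an illustrative name, not part of the original snippet.
# Bytes that are not in the table are left untouched.
sub hex_fold {
    my ($s) = @_;
    $s =~ s/([\x80-\xff])/exists $plain_for{$1} ? $plain_for{$1} : $1/ge;
    return $s;
}

print hex_fold("androm\xe8de"), "\n";   # prints "andromede"
```

The source file stays pure 7-bit ASCII apart from the comments, so an editor or transfer tool that mangles high bytes can damage the comments but not the behaviour.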
      Nice idea, as I said, but still a bit long-winded for something you can do with a single tr/PUT-IN-ALL-ACCENTED-CHARACTERS/PUT-IN-NORMALIZED-FORM/.
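
The direct tr approach could look roughly like the sketch below. It covers the lowercase direction only, assumes ISO-8859-1 byte strings, and uses one tr per base letter rather than a single long one, purely for readability. `plain_lc` is a hypothetical name. Each replacement list is shorter than its search list, so tr repeats the final replacement character for the whole range.

```perl
use strict;
use warnings;

# plain_lc is an illustrative name; byte ranges follow the ISO-8859-1 table.
sub plain_lc {
    my ($s) = @_;
    $s =~ tr/A-Z/a-z/;
    $s =~ tr/\xc0-\xc5\xe0-\xe5/a/;          # À-Å à-å
    $s =~ tr/\xc7\xe7/c/;                    # Ç ç
    $s =~ tr/\xc8-\xcb\xe8-\xeb/e/;          # È-Ë è-ë
    $s =~ tr/\xcc-\xcf\xec-\xef/i/;          # Ì-Ï ì-ï
    $s =~ tr/\xd1\xf1/n/;                    # Ñ ñ
    $s =~ tr/\xd2-\xd6\xd8\xf2-\xf6\xf8/o/;  # Ò-Ö Ø ò-ö ø
    $s =~ tr/\xd9-\xdc\xf9-\xfc/u/;          # Ù-Ü ù-ü
    $s =~ tr/\xdd\xfd\xff/y/;                # Ý ý ÿ
    return $s;
}

print plain_lc("Androm\xe8de"), "\n";   # prints "andromede"
```

Because the character lists are fixed at compile time, no string eval is needed; the cost is that the lists are written out by hand for each direction.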
