You need to look at how your MySQL database table is defined, and how the text field contents are actually stored in it. Use some client other than a cgi script to investigate this, if the nature of the storage is not clear to you. For example:
#!/usr/bin/perl
use strict;
use DBI;
use Encode;
# set up connection parameters $dbname, $host, $user, $password, then:
my $dbh = DBI->connect( "DBI::mysql:database=$dbname;host=$host", $use
+r, $password );
my $sth = $dbh->prepare( "select item_name from table1" );
$sth->execute;
my ( $is_wide_utf8, $is_not_utf8, $is_ascii );
while ( my ( $item ) = $sth->fetchrow_array )
{
my $nbytes = length( $item );
my $nchars = eval "length( decode( 'utf8', $item, Encode::FB_CROAK
+ ))";
if ( $@ ) { # decode croaked -- text string is not utf8
$is_not_utf8++;
# figure out something else to try (maybe shiftjis?)
}
elsif ( $nchars == $nbytes ) {
$is_ascii++;
}
else {
$is_wide_utf8++;
}
}
print "$is_ascii entries found to be ascii-only\n" if ( $is_ascii );
print "$is_wide_utf8 entries found to be utf8\n" if ( $is_wide_utf8);
print "$is_not_uft8 entries found to be non-utf8\n" if ( $is_not_utf8
+ );
If the results show that your database contains only ascii characters, it probably means that the numeric character entities (like "ஸ") are being stored in the table (you should print some of the entries to confirm this, and you should probably look into fixing things so that you store actual utf8 characters instead).
Next, in your cgi script, you need to figure out what sort of data is coming from your clients -- numeric entities? actual utf8 characters?
As for doing table searches for unicode strings based on input from a cgi client, the best setup is:
- The database table stores utf8 strings in the relevant text fields
- The cgi script converts the client input to a utf8 string
- The query is prepared using a placeholder in the search condition
- The query is executed with the utf8 string as the parameter value for the placeholder
In other words, the cgi script might do something like this:
my $srchString = $cgi-param( ... );
$srchString =~ s/\&#(\d+);/chr($1)/ge; # convert numeric entities to
+characters
$srchString .= '%' unless $srchString =~ /%$/; # add a wildcard for t
+he search
my $sth = $dbh->prepare( "select item_name,blah from table1 where item
+_name like ?"
$sth->execute( $srchString );
my $rows = $sth->fetchall_arrayref;
for my $row ( @$rows ) {
my ( $item, $blah ) = @$row;
$item = decode( 'utf8', $item ); # make sure perl knows this is u
+tf8 data
# ... do other stuff...
}