http://www.perlmonks.org?node_id=11159309

Bod has asked for the wisdom of the Perl Monks concerning the following question:

After a server change, we are getting lots of strange characters from an encoding issue.
Double spaces and emojis are displayed as Â

I think this issue is related to the change of Perl version from 5.16.3 to 5.36.0. From the Perl Delta, I note there have been some changes to the way Perl handles UTF encoding, but I don't understand the implications of this.

We've also upgraded MariaDB from 10.5 to 10.11, but both the character set and the collation are the same: utf8mb4 and utf8mb4_general_ci respectively.

This issue is not just about data that was created prior to the change. Although emojis created after the change are not mutilated, double spaces are.

All web output is UTF8 encoded using:
Content-Type: text/html; charset=UTF-8

Any suggestions where I should look to solve this issue?

Replies are listed 'Best First'.
Re: Encoding issue after upgrade
by ikegami (Patriarch) on May 06, 2024 at 22:20 UTC

    I think this issue is related to the change of Perl version from 5.16.3 to 5.36.0.

    Unlikely.

    Any suggestions where I should look to solve this issue?

    Provide a demonstration, such as a minimal program that exhibits the problem.

      Provide a demonstration, such as a minimal program that exhibits the problem.

      If I knew how to reproduce the problem, I wouldn't be asking for places to look for the problem!

      In changing from one server setup to another, transferring the data from one instance of MariaDB to another via SQL dumpfiles, and running the same Perl scripts albeit under a different version of Perl, we have gone from correctly rendering webpages to webpages containing numerous Â characters.

      An example blog post on our test site. The Â characters did not appear prior to the change.

        Compare the database data bytewise (old versus new). Log each function's input arguments and compare the logs on the old and new systems. Run each function on the old and new systems and compare the returned data (this should have been covered by tests).
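
        A minimal sketch of what a bytewise comparison could look like. It assumes the "double spaces" contain a no-break space (U+00A0), which is only a guess from the symptom. The helper name hex_bytes and the sample string are made up for illustration; in practice the byte strings would come from the old and the new database.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Return the bytes of a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# Assumption: a no-break space (U+00A0) between two letters.
my $text = "a\x{A0}b";

# Encoded once as UTF-8, as it should be stored and sent.
my $good = encode('UTF-8', $text);

# Encoded twice: the UTF-8 bytes are mistaken for Latin-1 characters
# and encoded again, which is what makes a visible "Â" appear.
my $bad = encode('UTF-8', decode('ISO-8859-1', $good));

print hex_bytes($good), "\n";   # 61 c2 a0 62
print hex_bytes($bad),  "\n";   # 61 c3 82 c2 a0 62
```

        If the same row dumps as c2 a0 on one system and c3 82 c2 a0 on the other, the data was encoded one time too many somewhere between the two.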

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]

        If I knew how to reproduce the problem

        You didn't say anything about the problem being intermittent. You made it sound like it was the opposite, that you always got the junk characters with previously-inserted data. Is this not the case? That would make the problem reproducible.

        And since it is reproducible, the request is very straightforward. Simply remove everything that's not relevant. You can cut down on huge swaths of code by determining if it's a problem with the data coming from the DB, or if it's a problem with the output.

        If it truly isn't reproducible, then please re-explain the problem more clearly.
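
        One way to split the problem into "DB" versus "output" is to run the same check on the raw bytes at both points. The sketch below is a heuristic, not a definitive test: classify_bytes is a hypothetical helper, and the hard-coded samples stand in for a value fetched from the database and for the bytes about to be printed.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Classify a byte string as 'not-utf8', 'utf8' or 'double-encoded'.
sub classify_bytes {
    my ($bytes) = @_;

    # Strict decode of a copy, since a CHECK value may modify its argument.
    my $copy  = $bytes;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
    return 'not-utf8' unless defined $chars;

    # Pure ASCII looks the same however often it is encoded.
    return 'utf8' unless $chars =~ /[^\x00-\x7F]/;

    # If every character fits in one byte, and those bytes are
    # themselves valid UTF-8, the text was probably encoded twice.
    if ($chars !~ /[^\x00-\xFF]/) {
        my $inner = encode('ISO-8859-1', $chars);
        my $again = eval { decode('UTF-8', $inner, Encode::FB_CROAK()) };
        return 'double-encoded' if defined $again;
    }
    return 'utf8';
}

print classify_bytes("a\xC2\xA0b"), "\n";           # utf8
print classify_bytes("a\xC3\x82\xC2\xA0b"), "\n";   # double-encoded
print classify_bytes("a\xA0b"), "\n";               # not-utf8
```

        If the value straight from the DB already reports double-encoded, the dump or import is the place to look. If it reports utf8 there but double-encoded in the final output, the problem is in the Perl layer.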