PerlMonks
Re: Remove u200b unicode From String
by Corion (Patriarch) on Jul 25, 2024 at 07:28 UTC (id://11160757)
I think your problem is that it is unclear which encoding your strings have at each stage of your program.
In the end, everything is octets, but Perl regular expressions only treat a string as Unicode if it has been properly decoded. The main goal is consistency, and the ideal is to Encode::decode the data when you read it (from a file, from the database, ...) and Encode::encode it to UTF-8 when you write it to HTML.

On the way there, you should inspect each string, for example using Data::Dumper or Data::Dump, to see which octets it contains and what Perl thinks it contains. Ideally, Perl should report that it sees \x{200b} in the string. If it reports the three bytes \xE2\x80\x8B instead, you have the right data, but Perl does not know that the string should be treated as Unicode, so you need to decode it from UTF-8. Do this inspection at every step of the pipeline.
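Here is a minimal sketch of that decode / inspect / encode cycle. The input string is a hypothetical stand-in for the UTF-8 bytes of "foo", U+200B ZERO WIDTH SPACE, "bar" as they might arrive from a handle without a decoding layer, and `ordinals` is a hypothetical helper, not part of any module. The exact escape style Data::Dumper prints can differ between versions, which is why the sketch also lists the ordinals explicitly.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);
use Data::Dumper;

# Print escapes instead of raw bytes, and drop the "$VAR1 = " prefix.
$Data::Dumper::Useqq = 1;
$Data::Dumper::Terse = 1;

# Hypothetical helper: list each element of a string as a hex ordinal.
# For an undecoded string these are octets; for a decoded one, code points.
sub ordinals {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

# Hypothetical input: UTF-8 octets, not yet decoded.
my $octets = "foo\xE2\x80\x8Bbar";
print 'before decode: ', Dumper($octets);
print 'ordinals:      ', ordinals($octets), "\n";   # 66 6F 6F E2 80 8B 62 61 72

# Decode once, where the data enters the program.
my $text = decode('UTF-8', $octets);
print 'after decode:  ', Dumper($text);
print 'ordinals:      ', ordinals($text), "\n";     # 66 6F 6F 200B 62 61 72

# Only on the decoded string does the regex see U+200B as one character.
(my $clean = $text) =~ s/\x{200B}//g;

# Encode back to UTF-8 octets where the data leaves, e.g. when writing HTML.
my $out = encode('UTF-8', $clean);
print "output octets: $out\n";                      # foobar
```

If the "before decode" ordinals already show `200B`, the string was decoded earlier in the pipeline, and decoding it a second time would be the bug to look for.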
In Section: Seekers of Perl Wisdom