
Re: Remove u200b unicode From String

by Corion (Patriarch)
on Jul 25, 2024 at 07:28 UTC ( [id://11160757]=note )


in reply to Remove u200b unicode From String

I think your problem is that it is unclear which encoding your strings have in each of these places:

  • the database
  • the driver handing the query results to Perl
  • your code
  • the HTML you output

In the end, everything is octets, but Perl regular expressions treat a string as Unicode characters only if it has been properly decoded.
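To illustrate the difference, here is a minimal sketch. It assumes the data arrives as raw UTF-8 octets, and the sample string is made up for the example. The same pattern is tried against the undecoded octets and against the decoded character string:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the data arrives as raw UTF-8 octets (not yet decoded).
my $octets = "foo\xE2\x80\x8Bbar";

# After decoding, the three octets E2 80 8B become the single character U+200B.
my $chars = decode('UTF-8', $octets);

for my $case ( [ octets => $octets ], [ chars => $chars ] ) {
    my ($label, $string) = @$case;
    printf "%-6s length=%d matches \\x{200B}: %s\n",
        $label, length($string), ( $string =~ /\x{200B}/ ? 'yes' : 'no' );
}
```

Only the decoded string should match, because the undecoded one contains three separate characters with code points below 256 and no U+200B.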

The main goal is consistency. Ideally, you Encode::decode the data as soon as you read it (from a file, from the database, ...) and Encode::encode it to UTF-8 when you write it out as HTML.
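A rough sketch of that decode-early, encode-late pattern might look like the following. The helper name strip_zero_width_space is hypothetical, and the code assumes that both the incoming data and the HTML output are UTF-8, which you would need to verify for your database and driver:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not from the original post.
# Assumption: input and output are both UTF-8 encoded octets.
sub strip_zero_width_space {
    my ($octets_in) = @_;

    # Decode at the input boundary; croak on malformed UTF-8 instead of guessing.
    my $text = decode('UTF-8', $octets_in, Encode::FB_CROAK() | Encode::LEAVE_SRC());

    # Work on characters: U+200B is now a single character.
    $text =~ s/\x{200B}//g;

    # Encode at the output boundary.
    return encode('UTF-8', $text);
}

# Sample input: "café", a zero width space, then " menu", all as UTF-8 octets.
my $clean = strip_zero_width_space("caf\xC3\xA9\xE2\x80\x8B menu");
```

If your database driver already returns decoded strings, the decode step would be wrong (it would decode twice), which is why it is worth checking each stage first.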

On the way there, you should inspect the string, for example with Data::Dumper or Data::Dump, to see which octets it contains and what Perl thinks it contains. Ideally, Perl should report that it sees \x{200b} in the string. If it reports the three bytes \xE2\x80\x8B instead, you have the right data, but Perl does not know that the string should be seen as Unicode. You should then decode it from UTF-8.

You should do this inspection for every step of the pipeline.
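Such an inspection could be sketched as follows with Data::Dumper. The variable $from_db and its contents are assumptions standing in for whatever your driver returns. Depending on the dumper and its settings, the undecoded octets may be shown as octal escapes such as \342\200\213 rather than as \xE2\x80\x8B:

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(decode);

$Data::Dumper::Useqq = 1;   # escape non-printable and non-ASCII characters
$Data::Dumper::Terse = 1;   # print just the value, without "$VAR1 = "

# Assumption: this is what the database driver handed back.
my $from_db = "foo\xE2\x80\x8Bbar";
my $decoded = decode('UTF-8', $from_db);

my $dump_raw     = Dumper($from_db);
my $dump_decoded = Dumper($decoded);

print "as received: $dump_raw";
print "decoded:     $dump_decoded";
```

Running the same two print statements after the database read, after any processing, and just before the HTML is written shows at which step the string stops (or starts) being a decoded character string.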
