PerlMonks  

Re^3: \b in Unicode regex

by Anonymous Monk
on May 23, 2017 at 09:23 UTC ( [id://1190946] )


in reply to Re^2: \b in Unicode regex
in thread \b in Unicode regex

Thanks a lot, Monks.

Knowing that there's no issue with \b, I kept investigating. It turned out that one of the strings wasn't really utf8 (for some reason, my terminal insisted on printing it as utf8, though). utf8::decode solved the problem.

Replies are listed 'Best First'.
Re^4: \b in Unicode regex
by ikegami (Patriarch) on May 23, 2017 at 14:08 UTC

    You actually had the opposite problem: You had UTF-8, but the regex engine expects a string of Unicode Code Points[1]. utf8::decode provides the latter from the former.


    1. More specifically, it's \w, \b, \d, etc. that are defined in terms of Unicode Code Points.
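
To illustrate the point about UTF-8 bytes versus Unicode Code Points, here is a minimal sketch. The sample string (the Cyrillic word "да" written out as raw UTF-8 bytes) and the helper name is_single_word are made up for the illustration and are not from the original question. It assumes no locale, /u modifier or unicode_strings feature is in effect.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the whole string is a single run of
# word characters with a word boundary (\b) on each side.
sub is_single_word {
    my ($s) = @_;
    return $s =~ /\A\b\w+\b\z/ ? 1 : 0;
}

# Hypothetical input: "да" (U+0434 U+0430) as raw, undecoded UTF-8 bytes.
my $str = "\xD0\xB4\xD0\xB0";

my $len_before   = length $str;            # 4: one "character" per byte
my $match_before = is_single_word($str);   # 0: these bytes are not word characters

# Decode in place: the UTF-8 bytes become Unicode code points.
utf8::decode($str) or die "input was not valid UTF-8";

my $len_after   = length $str;             # 2: U+0434 and U+0430
my $match_after = is_single_word($str);    # 1: both code points match \w

print "before: length=$len_before match=$match_before\n";
print "after:  length=$len_after match=$match_after\n";
```

In a real program it is usually tidier to decode once where the data enters the program, for example with Encode::decode('UTF-8', $bytes) or an :encoding(UTF-8) layer on the filehandle, rather than calling utf8::decode at each point of use.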
