in reply to Re^5: UTF8 versus \w in pattern matching (basic test)
in thread UTF8 versus \w in pattern matching

That's not strange. You're seeing Unicode codepoints, which for the characters in question happen to be identical to their ISO-8859-1 encodings. Add "\N{EURO SIGN}" to the string and you get "\x{20ac}": again, that is the codepoint, not the UTF-8 encoding (which would be the three bytes \xE2\x82\xAC).
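To make the distinction concrete, here is a minimal sketch in Python rather than Perl. This is an assumption for illustration only: Python's `str`, like a Perl character string, holds codepoints, not bytes. The helper name `encodings` is hypothetical. For each character it shows the codepoint, the ISO-8859-1 byte (if one exists), and the UTF-8 bytes.

```python
from typing import Optional, Tuple


def encodings(ch: str) -> Tuple[str, Optional[str], str]:
    """Hypothetical helper: return (codepoint, ISO-8859-1 hex or None, UTF-8 hex)."""
    codepoint = f"U+{ord(ch):04X}"
    try:
        # Only codepoints below 0x100 have an ISO-8859-1 byte.
        latin1: Optional[str] = ch.encode("latin-1").hex()
    except UnicodeEncodeError:
        latin1 = None
    # UTF-8 encodes every codepoint, using 1 to 4 bytes.
    utf8 = ch.encode("utf-8").hex()
    return codepoint, latin1, utf8


if __name__ == "__main__":
    for ch in "\u00e4\u00e9\N{EURO SIGN}":
        cp, latin1, utf8 = encodings(ch)
        print(f"{cp}  ISO-8859-1: {latin1 or 'none'}  UTF-8: {utf8}")
```

For "ä" the codepoint U+00E4 matches the single ISO-8859-1 byte e4, while UTF-8 needs the two bytes c3a4. The euro sign has no ISO-8859-1 byte at all, and its UTF-8 form is e282ac. On the Perl side, getting the UTF-8 bytes from a character string is an explicit step, for example `Encode::encode('UTF-8', $str)`.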

"Everything is UTF-8" is one of the most frequent false assumptions I encounter when dealing with non-ASCII characters.


Re^7: UTF8 versus \w in pattern matching (basic test)
by jo37 (Hermit) on Jul 06, 2021 at 18:03 UTC

    Thanks for the clarification.

    Greetings,
    -jo

    $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$