PerlMonks  

sed character codes

by kettle (Beadle)
on Mar 30, 2006 at 01:27 UTC (#540089)

kettle has asked for the wisdom of the Perl Monks concerning the following question:

I want to match a UTF-8 character in a regex using its code.
Doing something similar with a Unicode code point works fine:
s/\x{00BF}//g; # delete the upside-down, sentence-initial
               # question mark used in Spanish

However, if I do something similar with the corresponding UTF-8 byte sequence, i.e.,
s/\x{C0BF}//g; # also tried \x{C0 BF} which resulted, as expected,
               # in an 'illegal hexadecimal digit...' error message

nothing happens. How can I match these codes in a regex? Thanks! **that UTF-8 code should be C2 BF...

Replies are listed 'Best First'.
Re: sed character codes
by ikegami (Pope) on Mar 30, 2006 at 01:35 UTC

    To search for or replace a UTF-8 byte sequence, the string itself must hold UTF-8 encoded bytes. Encode is the module to use to convert a string to UTF-8. You can then search for the bytes using /\xC2\xBF/.

    Of course, if the string was read in as raw bytes (with no decoding applied) and the data was UTF-8 to begin with, it already holds the UTF-8 bytes, so you should be able to use /\xC2\xBF/ as is.

    At least, that's how I understand things. I don't have much experience in this area.
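
A minimal sketch of the two approaches described above, assuming the input arrives as undecoded UTF-8 bytes. The sample string and variable names are illustrative, not taken from the original post.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for an illustrative Spanish string, as they would arrive
# from a filehandle with no decoding layer. C2 BF is the UTF-8 encoding of
# U+00BF (inverted question mark); C3 A9 is U+00E9 (e with acute).
my $bytes = "\xC2\xBFQu\xC3\xA9?";

# Option 1: match the two-byte sequence directly in the undecoded string.
(my $stripped_bytes = $bytes) =~ s/\xC2\xBF//g;

# Option 2: decode to characters first, then match the single code point.
my $chars = decode('UTF-8', $bytes);
(my $stripped_chars = $chars) =~ s/\x{00BF}//g;

# Both routes should agree once the character string is re-encoded.
my $same = encode('UTF-8', $stripped_chars) eq $stripped_bytes;
print $same ? "same\n" : "different\n";
```

Which option fits depends on whether the rest of the program works with bytes or with decoded characters; mixing the two in one string is the usual reason a pattern silently fails to match.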

      Thanks! That was exactly what I was looking for. I just needed the formatting convention, which appears to be:

      \x[0-9A-F]{2}\x[0-9A-F]{2}

      More generally, do you (or does anyone else) happen to know where I could find this information for other character encodings? joe
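
One way to look up the byte sequence for other encodings is to let the Encode module produce it. The sketch below is only an illustration: `byte_escapes` is a hypothetical helper name, and the three encodings are arbitrary examples.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the regex-style byte escapes for one code
# point in the named encoding, e.g. the text \xC2\xBF for U+00BF in UTF-8.
sub byte_escapes {
    my ($codepoint, $encoding) = @_;
    my $bytes = encode($encoding, chr($codepoint));
    return join '', map { sprintf('\\x%02X', ord($_)) } split //, $bytes;
}

# Print the escapes for the inverted question mark in a few encodings.
for my $encoding (qw(UTF-8 ISO-8859-1 UTF-16BE)) {
    printf "%-10s %s\n", $encoding, byte_escapes(0x00BF, $encoding);
}
```

The printed escapes can be pasted into a pattern that is applied to undecoded bytes in the same encoding.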

Node Type: perlquestion [id://540089]
Approved by Moriarty