Re: replacing special characters in file

by polypompholyx (Chaplain)
on Aug 05, 2005 at 10:38 UTC

in reply to replacing special characters in file

It depends on exactly what you want to do, but if you just want to strip out certain characters, you can do:

s/[^\w]//g; # strip everything but 'word' characters (letters, digits, underscore); whitespace and punctuation go too

s/[^[:ascii:]]//g; # strip everything but ASCII characters
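
To make the difference between those two concrete, here is a minimal sketch run on a made-up sample string (the string and variable names are just for illustration). The first substitution throws away punctuation and whitespace as well as the non-ASCII character; the second only removes the non-ASCII character.

```perl
use strict;
use warnings;

# Hypothetical sample: ASCII text plus one non-ASCII character (U+2022, a bullet).
my $sample = "a*b \x{2022} c_1";

# [^\w] matches anything that is not a letter, digit, or underscore,
# so the '*', the spaces, and the bullet are all removed.
(my $words_only = $sample) =~ s/[^\w]//g;

# [^[:ascii:]] matches only code points above 0x7F,
# so just the bullet is removed and the '*' survives.
(my $ascii_only = $sample) =~ s/[^[:ascii:]]//g;

print "<", $words_only, ">\n";   # <abc_1>
print "<", $ascii_only, ">\n";   # <a*b  c_1>
```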

If you want to substitute specific characters or character sequences that you can't type directly in your text editor, you can use hex escapes in the regex:

s/\x{00A1}\x{00DC}/st/g; # replace upside-down-bang capital-u-umlaut with 'st'.

You can look up the (Unicode) hex values for capital-u-umlaut and friends in Unibook.
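
If there are several characters to replace, one way to do it (a sketch only, with a hypothetical lookup table of curly quotes that I picked for illustration) is to keep the hex escapes in a hash and build a character class from its keys:

```perl
use strict;
use warnings;

# Hypothetical lookup table: curly quotes mapped to plain ASCII quotes.
my %replacement = (
    "\x{2018}" => "'",   # left single quotation mark
    "\x{2019}" => "'",   # right single quotation mark
    "\x{201C}" => '"',   # left double quotation mark
    "\x{201D}" => '"',   # right double quotation mark
);

# Build a character class from the keys; none of them is special inside [...].
my $class = join '', keys %replacement;

my $text = "\x{201C}it\x{2019}s fine\x{201D}";
$text =~ s/([$class])/$replacement{$1}/g;

print "$text\n";   # "it's fine"
```

This assumes $text already holds characters rather than raw encoded bytes.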

Bear in mind that the text you are editing may not be encoded in Unicode, and that even if it is, some characters may display differently in a terminal (particularly a DOS box) compared to how they will in a text file. Welcome to the inconsistent mess of character encoding standards.
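
As an illustration of why the encoding matters, here is a rough sketch that assumes the input bytes are UTF-8 (that is an assumption; your file may well be Latin-1 or something else). The \x{...} escapes match code points, so the bytes need to be decoded with the core Encode module before the substitution will find anything.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumed input: raw UTF-8 bytes. C2 A1 is U+00A1 and C3 9C is U+00DC.
my $bytes = "te\xC2\xA1\xC3\x9Cing";

# Applied to the raw bytes, the pattern finds nothing to replace.
(my $undecoded = $bytes) =~ s/\x{00A1}\x{00DC}/st/g;

# Decode to characters, substitute, then encode again for output.
my $chars = decode('UTF-8', $bytes);
$chars =~ s/\x{00A1}\x{00DC}/st/g;
my $out = encode('UTF-8', $chars);

print "$out\n";   # testing
```

When reading from a file, opening it with an encoding layer such as '<:encoding(UTF-8)' does the decoding step for you.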

Re^2: replacing special characters in file
on Jun 01, 2012 at 18:08 UTC

    I have no idea if perlmonks prefers replying to the original thread or starting a new one if a certain amount of time has passed since the last reply, but I guess I'll find out soon enough. I too had odd characters like and , that were removed flawlessly, thanks to...


    ... but it also removed every * in the file, which is an ASCII character. Any reason why?

