how to remove é

ZWcarp has asked for the wisdom of the Perl Monks concerning the following question:

Hello All, I was wondering if there is a way to remove an é symbol (or special character or whatever) from a file using a one-liner similar to

perl -i -pe 'tr/\015//d' yourprogram.pl

which I have used to remove the special character ^r. Also, is there a list of numbers and their corresponding special characters somewhere, so that you can edit any special character you might come across? Sorry if this is a dumb question, I'm pretty new. Any help would be greatly appreciated!! Thanks


Replies are listed 'Best First'.
Re: Special Characters list by planetscape (Chancellor) on Jun 15, 2011 at 21:39 UTC
See also: The Unicode Sliderule. HTH, planetscape
Re^2: Special Characters list by ZWcarp (Beadle) on Jun 15, 2011 at 22:38 UTC
OK, I looked up the code, but the following doesn't seem to be working: `perl -i -pe 'tr/\xe9/FIX/d' test.txt` or `perl -i -pe 'tr/\00233/FIX/d' test.txt`. What am I doing wrong? Thanks so much for the help, you guys are great.
Re^3: Special Characters list by ikegami (Patriarch) on Jun 15, 2011 at 23:06 UTC
`tr/\xe9/FIX/d` is the same as `tr/\xE9/F/` and converts chr(0xE9) to "F". `tr/\00233/FIX/d` is the same as `tr/3\002/IF/` and converts chr(002) to "F" and "3" to "I". You asked "What am I doing wrong?" Well, what are you trying to do? You previously indicated that you wanted to remove "é" characters that were encoded as E9: `perl -i -pe 'tr/\xE9//d' yourprogram.pl` But the "FIX" in the new code seems to indicate you want something else.
Re: Special Characters list by ikegami (Patriarch) on Jun 15, 2011 at 21:35 UTC
Wikipedia has charts for many encodings. Your "é" was probably encoded using iso-8859-1, iso-8859-15 or Windows-1252. If the text had been decoded, you'd be dealing with Unicode codepoints instead. You can find the numbers of Unicode codepoints (in hex) on the Unicode code charts. Another example, Unicode codepoint 20AC ("€"), is encoded as follows. iso-8859-15: `A4`; Windows-1252: `80`; UTF-8: `E2 82 AC`; UTF-16le: `AC 20` (at an even offset).
Re: how to remove é by 7stud (Deacon) on Jun 15, 2011 at 23:11 UTC
You've opened a can of worms. You need to read up on Unicode. The bottom line is that you need to know the "encoding" of any data you read from a file. An "encoding" tells perl, among other things, how many bytes each integer in your file occupies. Remember, computers store characters as integers. Here's an example. Suppose these bytes are in your file: `0000 0001 0000 1000` If you tell perl that your file is encoded in such a way that the first integer occupies 1 byte, then perl will read the following for the first integer: `0000 0001` which is equivalent to 1 in decimal. However, if you tell perl that your file is encoded in such a way that the first integer occupies 2 bytes, then perl will read the following for the first integer: `0000 0001 0000 1000` which, taking the first byte as the high byte, is equivalent to 8 + 256 = 264 in decimal. So depending on what encoding you specify, perl will read in a different integer (and again, remember that the integers are just codes for characters). By the way, \015 is not the special character ^r (up arrow+r). \015 is the octal syntax for the decimal integer 13, which is the ASCII code for a carriage return. The fact that you tried to remove them from a file is very suspect. Please explain why you were doing that.
Re^2: how to remove é by ZWcarp (Beadle) on Aug 07, 2013 at 20:10 UTC
`perl -i -pe 'tr/\015/\n/d'` Quick fix for getting rid of Excel or other Windows markup when reading in a Unix or Linux environment. This one just changes over the carriage returns, as you said.
