PerlMonks
Removing Non-Ascii chars from text file by MonkPaul (Friar)
on Jun 07, 2007 at 12:09 UTC (#619792=perlquestion)
MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:
Hello,
I have a text file that contains text extracted from a PDF document. This text contains non-ASCII characters that I have to remove before I can run it through some text-mining software. I have looked at using the ord function to find and remove characters whose values fall outside the basic ASCII table (0-127), but I am not sure how to apply this to the whole text file. I thought of reading the file line by line and then checking each character in turn. I have also searched previous threads on text cleaning, but those cover stripping out letters and other wanted content, not non-ASCII characters. Does anybody have any recommendations for removing these characters?
Many thanks,
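A minimal Perl sketch of the approach described above (not taken from any reply in the thread). It shows the per-character ord() check the question mentions, plus the more common tr/// idiom that does the same job in one operation. The sub names strip_non_ascii_ord and strip_non_ascii_tr are hypothetical, and the sketch assumes "non-ASCII" means any character with a value above 127.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approach 1: the per-character ord() check described in the question.
# split // breaks the string into single characters; grep keeps only
# those whose code point is in the 7-bit ASCII range (0-127).
sub strip_non_ascii_ord {
    my ($text) = @_;
    return join '', grep { ord($_) < 128 } split //, $text;
}

# Approach 2: the usual Perl idiom. tr/// with /c complements the search
# list (everything outside \x00-\x7F) and /d deletes those characters,
# so no explicit loop over characters is needed.
sub strip_non_ascii_tr {
    my ($text) = @_;
    (my $clean = $text) =~ tr/\x00-\x7F//cd;
    return $clean;
}

# When given file names, read them line by line and print the cleaned
# text to STDOUT. No decoding layer is applied, so each byte of a
# multi-byte (e.g. UTF-8) character is >= 0x80 and the whole character
# is dropped. Newlines (\x0A) are ASCII and are kept.
if (@ARGV) {
    while (my $line = <>) {
        print strip_non_ascii_tr($line);
    }
}
```

Hypothetical usage: `perl strip_non_ascii.pl extracted.txt > clean.txt`, or the equivalent one-liner `perl -pe 'tr/\x00-\x7F//cd' extracted.txt > clean.txt`. Note that both approaches delete accented letters outright (café becomes caf) rather than transliterating them, which may or may not suit the text-mining step.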