MonkPaul has asked for the wisdom of the Perl Monks concerning the following question:


I have a text file that contains text stripped from a PDF document. This text contains non-ASCII characters that I have to remove before I can run it through some text-mining software.

I have looked at the ord function as a way to find the characters that fall outside the basic ASCII table, but I am not sure how to apply it over the whole text file. I thought of reading the file line by line, then looking at each character in turn. I have also looked through previous threads on text cleaning, but these are just about stripping out letters and desired content, not non-ASCII characters.
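To make the per-character idea concrete, here is a rough sketch of what I was picturing. The sub names and the sample lines are made up for illustration, and I am assuming "basic ASCII" means code points 0 to 127. I suspect the tr/// version is the simpler way to get the same result, but I would welcome corrections.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the characters whose ord() value is in
# the basic ASCII range (0-127), checking one character at a time.
sub strip_non_ascii {
    my ($line) = @_;
    return join '', grep { ord($_) < 128 } split //, $line;
}

# Same result in one step: /c complements the range, /d deletes the
# matches, so everything outside \x00-\x7F is removed.
sub strip_non_ascii_tr {
    my ($line) = @_;
    (my $clean = $line) =~ tr/\x00-\x7F//cd;
    return $clean;
}

# Made-up sample lines standing in for the text taken from the PDF.
my @sample = ("caf\x{e9} au lait\n", "plain text\n");
print strip_non_ascii($_) for @sample;
```

For the real file I would presumably open it and run the same sub inside a `while (my $line = <$fh>)` loop. If the file is read as raw bytes, every byte of a multi-byte UTF-8 character is 128 or above, so the whole character should be dropped either way.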

Does anybody have any recommendations for removing these characters?

Many thanks,