onegative has asked for the wisdom of the Perl Monks concerning the following question:

Good day Monks,

I have a problem and am not sure how to handle it. It seems I am finding ASCII characters embedded within text strings, most likely introduced by cut/paste into application fields during configuration. They are invisible within the application fields, typically showing up as a space, and are not easily identified visually.

The problem is that the data, once retrieved from the configurations, is passed into my code, and when I build the XML (even with CDATA) it craters the XML parsers that later process the file.

My question is: how would I globally detect that an ASCII character is embedded in the string and either remove or convert it accordingly? Not knowing what may be there is the challenge for me, as is how to address it through some function that eliminates or converts it.

Any ideas or best practice would be GREATLY appreciated.

Thanks,
Danny

Replies are listed 'Best First'.
Re: Detecting ASCII Characters embedded within text string
by Anonyrnous Monk (Hermit) on Jan 28, 2011 at 16:32 UTC

    You probably don't really mean ASCII when you say ASCII, but rather "control characters", or some such.

    As for replacing certain characters, or ranges of characters, see tr or s (update: here are maybe more useful links, as the rather large section "Quote and Quote-like Operators" that the respective entries in perlfunc refer you to might be somewhat distracting for the uninitiated: tr, s).
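A minimal sketch of both operators, offered as an illustration rather than as part of the original reply. It assumes the unwanted characters are the C0 control characters (other than tab, LF and CR) plus DEL; the variable names `$deleted` and `$escaped` are made up for this example.

```perl
use strict;
use warnings;

my $s = "foo \x03 bar \x05 baz";

# tr with the /d modifier deletes every character in the search list.
# Listed here: control characters except tab (09), LF (0a), CR (0d), plus DEL (7f).
(my $deleted = $s) =~ tr/\x00-\x08\x0b\x0c\x0e-\x1f\x7f//d;

# s///ge converts instead of deleting: each character outside printable
# ASCII becomes a visible hex escape such as \x03.
(my $escaped = $s) =~ s/([^\x20-\x7e])/sprintf '\\x%02x', ord $1/ge;

print "$deleted\n";   # foo  bar  baz
print "$escaped\n";   # foo \x03 bar \x05 baz
```

Deleting is simplest when the characters carry no meaning; converting to an escape keeps a trace of what was there, which may help when tracking down where the bad input came from.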

    For detecting them, maybe something like this (the set [^\x20-\x7e] denotes characters not in the range hex 20-7e, i.e. decimal 32-126):

        my $s = "foo \x03 bar \x05 baz";

        printf "detected strange char: 0x%x\n", ord for $s =~ /[^\x20-\x7e]/g;

        __END__
        detected strange char: 0x3
        detected strange char: 0x5
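Since the original question is about XML parsers choking, here is a hedged sketch that removes only what XML 1.0 forbids outright, rather than everything outside printable ASCII. The helper name `strip_xml_illegal` is hypothetical, and the sketch assumes the input is an already decoded character string (not raw bytes). The character class follows the `Char` production of the XML 1.0 specification: tab, LF, CR, and the ranges 0x20-0xD7FF, 0xE000-0xFFFD and 0x10000-0x10FFFF.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every character that XML 1.0 does not allow,
# even inside CDATA. Assumes $text holds decoded characters, not bytes.
sub strip_xml_illegal {
    my ($text) = @_;
    $text =~ s/[^\x09\x0A\x0D\x20-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]//g;
    return $text;
}

my $clean = strip_xml_illegal("foo \x03 bar \x05 baz\n");
print $clean;   # foo  bar  baz (the trailing newline is kept)
```

Unlike the [^\x20-\x7e] test, this leaves tabs, newlines and legitimate non-ASCII text untouched. Characters that are legal in XML but still need escaping (&, < and >) are a separate matter and are not handled here.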