PerlMonks  

Puzzler - filtering characters

by rashley (Scribe)
on Oct 22, 2007 at 18:40 UTC

rashley has asked for the wisdom of the Perl Monks concerning the following question:

Once upon a time I wrote a very simple sub to filter out non-printable characters on web-form input:
sub filterCharacters {
    my $text = shift;
    $text =~ s/[\000-\037]/ /g;
    $text =~ s/[\177-\777]/ /g;
    $text =~ s/\s+/ /g;
    return $text;
}
Then one day another developer came along and changed this sub to allow a character he thought was line-feed, but was actually Backspace (he was looking at the decimal value instead of octal):
sub filterCharacters {
    my $text = shift;
    $text =~ s/[\000-\009]/ /g;
    $text =~ s/[\011-\037]/ /g;
    $text =~ s/[\177-\777]/ /g;
    $text =~ s/\s+/ /g;
    return $text;
}
Here's the weird part: this manifested itself as every instance of the character '9' getting changed to a space.

We've fixed the problem, but I can't for the life of me figure out how allowing Backspace characters resulted in the 9s getting whacked.
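
For anyone who wants to see the symptom first-hand, here is a minimal reproduction. The sub body is the modified one quoted above; the name filterCharactersBroken and the sample string are made up for illustration. A recent perl may also print a warning about the non-octal digit.

```perl
use strict;
use warnings;

# The modified sub from the question, renamed (hypothetical name)
# so it cannot be confused with the working version.
sub filterCharactersBroken {
    my $text = shift;
    $text =~ s/[\000-\009]/ /g;   # the class under suspicion
    $text =~ s/[\011-\037]/ /g;
    $text =~ s/[\177-\777]/ /g;
    $text =~ s/\s+/ /g;           # collapse runs of whitespace
    return $text;
}

# Brackets make the trailing space visible.
print "[", filterCharactersBroken("Room 1999"), "]\n";
```

The three 9s become spaces, and the final whitespace collapse squeezes them into one, so the script should print [Room 1 ].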

Oh wise Monks, for the sake of my sanity and education, please enlighten me! Thanks.

Replies are listed 'Best First'.
Re: Puzzler - filtering characters
by Joost (Canon) on Oct 22, 2007 at 18:50 UTC
Re: Puzzler - filtering characters
by FunkyMonk (Bishop) on Oct 22, 2007 at 19:31 UTC
    Linefeed is 10 in decimal => 12 in octal, and you can combine three of your substitutes into a single character class:

    $text =~ s/[\000-\011\013-\037\177-\377]/ /g;

    Or, using a POSIX character class, the much more readable

    s/[^[:print:]\n]/ /g;

    See perlre for a full list of the POSIX character classes
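
A rough sketch of the two substitutions above side by side. The sub names filter_octal_ranges and filter_posix are made up for illustration, and the sample string is kept ASCII-only on purpose, since how [:print:] treats bytes above \177 can depend on locale and Unicode settings.

```perl
use strict;
use warnings;

# Hypothetical helper names; the regexes are the ones quoted above.
sub filter_octal_ranges {
    my $text = shift;
    # Control characters except linefeed (\012), plus DEL and the high bytes.
    $text =~ s/[\000-\011\013-\037\177-\377]/ /g;
    return $text;
}

sub filter_posix {
    my $text = shift;
    # Anything that is neither printable nor a linefeed.
    $text =~ s/[^[:print:]\n]/ /g;
    return $text;
}

# Decimal 10 written as a three-digit octal escape.
printf "linefeed is \\%03o in octal\n", ord("\n");

# Tab, bell and backspace should go; the linefeed and the 9s should stay.
my $input = "tab\there\nbell\007 1999\010";
print filter_octal_ranges($input) eq filter_posix($input)
    ? "both filters agree\n"
    : "filters differ\n";
```

On this input both versions replace the tab, bell and backspace with spaces and leave the linefeed and the digits alone.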

Re: Puzzler - filtering characters
by FunkyMonk (Bishop) on Oct 22, 2007 at 18:42 UTC
    9 isn't an octal digit!

      I realize that. So it just used the ASCII value?

        You meant

        ... $text =~ s/[\000-\009]/ /g; ... }

        But it does something like

        ...
        $text =~ s/[\000-\000]/ /g; # only touches NUL characters
        $text =~ s/[9]/ /g;
        ...
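
A small check of that reading, as a sketch: the octal escape stops at the first non-octal digit, so \009 is taken as \00 followed by a literal 9. The helper name hits_broken_class is invented for this example, and a recent perl may print a warning about the non-octal digit.

```perl
use strict;
use warnings;

# The class from the modified sub, compiled once.
my $broken = qr/[\000-\009]/;

# Hypothetical helper: true if a single character is in the class.
sub hits_broken_class {
    my $char = shift;
    return $char =~ $broken ? 1 : 0;
}

# NUL, backspace (\010), and the digits 8 and 9.
for my $char ("\000", "\010", "8", "9") {
    printf "chr(%d) %s\n", ord($char),
        hits_broken_class($char) ? "matches" : "does not match";
}
```

Only chr(0) and chr(57), the digit 9, should be reported as matching. Backspace is left alone, which is the one part that went as the other developer intended.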

        BTW, the tr/\000-\011/ / operator may sometimes be much faster, but that depends on the case.

        Regards

        mwa
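
For comparison, a sketch of the tr/// route. The sub name filter_with_tr is hypothetical, and the ranges are borrowed from the combined character class given elsewhere in this thread rather than from the snippet above. Whether it is actually faster is not measured here; the core Benchmark module would be one way to find out.

```perl
use strict;
use warnings;

# Hypothetical helper: same ranges as the combined regex class, spelled for tr///.
sub filter_with_tr {
    my $text = shift;
    # The replacement list is shorter than the search list, so its last
    # character (a space) is reused for every character in the ranges.
    $text =~ tr/\000-\011\013-\037\177-\377/ /;
    return $text;
}

# Brackets make the trailing space visible.
print "[", filter_with_tr("a\tb\n9\010"), "]\n";
```

The tab and the backspace turn into spaces, while the linefeed and the 9 pass through untouched.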

Re: Puzzler - filtering characters
by andyford (Curate) on Oct 22, 2007 at 18:49 UTC

    I looked up an ASCII code table and it says that 008 & 009 are not codes for anything, so your bad character class was actually hitting the number nine. 'Works' for 008 => 8 too.

    Update: Never mind, should have looked up 'octal'.

    non-Perl: Andy Ford
