
Regex for weird characters

by Anonymous Monk
on Sep 27, 2004 at 14:14 UTC ( #394173=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

What's the regular expression to see if a variable contains funny characters? By funny, I mean foreign characters with dashes above them and whatever makes those square boxes when people type them in. Anything a-zA-Z0-9 and all the shift+# keys are okay; it's just those other ones that I need to see if they exist.

Replies are listed 'Best First'.
Re: Regex for weird characters
by sgifford (Prior) on Sep 27, 2004 at 15:43 UTC
    If what you want to allow is all printable ASCII characters, you can use regex character classes to specify that a string containing any non-ASCII characters or control characters is "funny":
    print "funny" if ($c =~ /[[:^ascii:][:cntrl:]]/);
    For byte strings (no characters above \xff), this is equivalent to checking whether any character has an integer value less than or equal to 31 (1f hex), or greater than or equal to 127 (7f hex):
    print "funny" if ($c =~ /[\x00-\x1f\x7f-\xff]/s);

    See perlre(1) for more details.

    Both of these solutions disallow newlines and tabs; you can allow them with the [:space:] character class.
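    One way to allow tabs and newlines is to list them explicitly next to the printable ASCII range, which avoids any dependence on how POSIX classes treat non-ASCII input. This is only a sketch: the helper name is_funny and the choice of tab, newline, and carriage return as the permitted whitespace are assumptions, not part of the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $str contains anything other than
# printable ASCII (0x20-0x7E), tab, newline, or carriage return.
sub is_funny {
    my ($str) = @_;
    return $str =~ /[^\x20-\x7E\t\n\r]/ ? 1 : 0;
}

# "\xe9" is e-acute in Latin-1; "\x07" is the BEL control character.
for my $sample ("plain text\twith tab\n", "caf\xe9", "bell\x07") {
    print is_funny($sample) ? "funny\n" : "ok\n";
}
```

    Writing the class as a negated whitelist means characters above \xff are flagged as well, without having to enumerate them.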

Re: Regex for weird characters
by DrHyde (Prior) on Sep 27, 2004 at 14:56 UTC
    The easiest way is not to try to look for all the weird characters, but instead to look for only the permitted characters and invert the test. If, for example, you only permit letters, numbers and commas, you would do something like ...
    if($text !~ /^[a-z0-9,]*$/i) { die "bad characters\n"; }
    Doing it this way ensures that you only permit what you want. This is good practice, because you won't accidentally let bad stuff through. If you try to think of all the weird characters that someone can type you're *bound* to forget some of them. For example, you're probably thinking of banning é, è, ñ and ç. But had you remembered æ, œ, ß, ð, ø, å, ł and þ?
      The idea is okay, but why match the whole string if it's sufficient to fail as soon as one character fails? So instead of
      $text !~ /^[a-z0-9,]*$/i
      this should give the same result:
      $text =~ /[^a-z0-9,]/i
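      A quick comparison of the two tests, as a sketch (the sub names bad_anchored and bad_negated are made up for illustration). They agree on ordinary input, but differ on a string that ends in a newline, because $ also matches just before a trailing newline; anchoring with \z instead of $ would make the first test stricter.

```perl
use strict;
use warnings;

# Whole-string test: true if the string is rejected.
sub bad_anchored {
    my ($text) = @_;
    return $text !~ /^[a-z0-9,]*$/i ? 1 : 0;
}

# Negated-class test: true as soon as one disallowed character is found.
sub bad_negated {
    my ($text) = @_;
    return $text =~ /[^a-z0-9,]/i ? 1 : 0;
}

for my $text ("abc,123", "abc def", "abc\n") {
    (my $shown = $text) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-8s anchored=%d negated=%d\n",
        $shown, bad_anchored($text), bad_negated($text);
}
```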


        You could shorten Skeeve's regex further using character classes; actually it's the same length 'cause I added spaces ;)

        $text =~ /[^\w\d\s,]/;

        Update: switched /w for \w (++ to graff for pointing out my typo)
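        Note that \w already includes digits, so the \d is redundant, and that \w also accepts the underscore and, under Unicode rules, accented letters. The sketch below makes the difference visible with the /a and /u modifiers, which assume Perl 5.14 or later; the sub names are hypothetical.

```perl
use strict;
use warnings;

# /a restricts \w and \s to ASCII; /u applies Unicode rules.
sub funny_ascii   { my ($t) = @_; return $t =~ /[^\w\s,]/a  ? 1 : 0 }
sub funny_unicode { my ($t) = @_; return $t =~ /[^\w\s,]/u  ? 1 : 0 }

# The explicit class from the parent reply, for comparison.
sub funny_explicit { my ($t) = @_; return $t =~ /[^a-z0-9,]/i ? 1 : 0 }

for my $text ("caf\x{E9}", "not_funny", "a b") {
    printf "ascii=%d unicode=%d explicit=%d\n",
        funny_ascii($text), funny_unicode($text), funny_explicit($text);
}
```

        So the shorthand version is not a drop-in replacement: it lets through underscores and whitespace, and with Unicode semantics it lets through exactly the accented letters the original question wants to catch.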

        "Cogito cogito ergo cogito sum - I think that I think, therefore I think that I am." Ambrose Bierce

      This goes with what DrHyde was showing, with a bunch of the normal characters you're looking for. It might take more characters depending on what else you want to add, but here you go.
      if($text !~ m/^[a-z0-9,!,@,#,$,%,^,&,*,9,\,,\.\?,\~,:,\+,\-, ,\"]*$/i)

      "Age is nothing more than an inaccurate number bestowed upon us at birth as just another means for others to judge and classify us"

        Why do you have so many commas in your character class?
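        For reference, members of a character class are simply listed one after another, with no separators. A sketch of the same idea without the commas, assuming the intended punctuation is the set below (the variable and sub names are made up, and quotemeta is used so that none of the characters is treated specially or interpolated):

```perl
use strict;
use warnings;

# Assumed list of allowed punctuation; q{} keeps it uninterpolated and
# quotemeta backslash-escapes every non-word character in it.
my $allowed_punct = quotemeta(q{,!@#$%^&*.?~:+- "});

# Anything outside letters, digits, and the listed punctuation is "bad".
my $bad_char = qr/[^a-z0-9$allowed_punct]/i;

sub has_bad_chars {
    my ($text) = @_;
    return $text =~ $bad_char ? 1 : 0;
}

for my $text ("Hello, world!", "50% off: a+b?", "caf\xe9", "a<b") {
    print has_bad_chars($text) ? "bad characters\n" : "ok\n";
}
```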
Re: Regex for weird characters
by Skeeve (Parson) on Sep 27, 2004 at 14:32 UTC
    Just to get you started...
    /[^a-zA-Z0-9_]/ && print "funny things appear in $_\n" for qw/not_funny funny1-/;
      /[^\w\x21-\x26\x28-\x2A\x5E\x40]/ && print "funny things appear in $_\n" for qw/not_funny funny1-/;

      But that still doesn't account for things like ? or > or <. I'm taking you too literally when you say shift-# keys... :-)
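      To see exactly which punctuation that class lets through, one can run every printable ASCII character past it. A small sketch (the variable names are made up for illustration):

```perl
use strict;
use warnings;

# The "funny character" pattern from the reply above.
my $funny = qr/[^\w\x21-\x26\x28-\x2A\x5E\x40]/;

# Collect the non-word printable ASCII characters the pattern accepts.
my @allowed_punct =
    grep { !/\w/ && $_ !~ $funny }
    map  { chr($_) } 0x21 .. 0x7E;

print join('', @allowed_punct), "\n";   # prints !"#$%&()*@^
```

      Everything else, including ?, <, >, the hyphen, and the space, is reported as funny.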

      May the Force be with you

Node Type: perlquestion [id://394173]
Approved by Happy-the-monk
Front-paged by DrHyde