PerlMonks  

Untainting name data from form

by Popcorn Dave (Abbot)
on Sep 24, 2002 at 23:36 UTC ( [id://200507]=perlquestion )

Popcorn Dave has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I'm in need of a bit of help with this.

I'm trying to work out a regular expression for untainting data passed through a form.

Here's what I have:

    #!/usr/bin/perl -w
    use strict;

    my $x = 'Ms a smith jr';
    print "Name valid\n"
        if $x =~ m/^(\s+)?(M(s|rs?|iss)\.?)?\s+([A-Z]+)(\ +.)?(\s+)([A-Z]+)(\s+)?([A-Z]+)?$/i;

The part that has me wondering is checking for Mr/Mrs/Ms/Miss. Originally I had something along the lines of (M(r?s?|iss)), which worked for all four salutation cases but also matched 'M' by itself.
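The difference between the two alternations is easy to check in isolation. The following is only a sketch: the variable names `$loose` and `$strict` are made up for illustration, and the anchors and optional trailing period are assumptions added so that each title can be tested on its own.

```perl
use strict;
use warnings;

# Original attempt: r? and s? are both optional, so a bare 'M' matches.
my $loose  = qr/^M(r?s?|iss)\.?$/i;

# Each alternative requires at least one letter after the 'M'.
# (?:...) groups without capturing, since the title is not reused here.
my $strict = qr/^M(?:rs?|s|iss)\.?$/i;

for my $title ('Mr', 'Mrs', 'Ms', 'Miss', 'Mr.', 'M') {
    printf "%-5s loose=%-3s strict=%s\n", $title,
        ($title =~ $loose  ? 'yes' : 'no'),
        ($title =~ $strict ? 'yes' : 'no');
}
```

Only the bare 'M' should be treated differently by the two patterns.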

My question is this: is there a way to check for this that is cleaner than the mess I have there now? I think I've taken into account all (American) salutations ( if anyone does even use those on web forms ) and Jr's, III's, etc...

Thanks in advance!

There is no emoticon for what I'm feeling now.

Replies are listed 'Best First'.
Re: Untainting name data from form
by chromatic (Archbishop) on Sep 24, 2002 at 23:56 UTC

    Would it help if they were in separate form fields? You could have a text entry box for title, first name, middle initial, and last name. Of course, your pattern also disallows hyphenated names and names with apostrophes.

    I am curious why the tainting is even an issue, though. What are you doing with names that makes Perl holler?

Re: Untainting name data from form
by sauoq (Abbot) on Sep 25, 2002 at 00:45 UTC

    Any regex you try to use for this is bound to break. Just wait until Lt. Col. J. Random von Perl-Hacker III Ph.D. visits your site. Or his brother, Rev. Prof. 1st Lt. Jim Bob Q. von Perl-Hacker Sr. LL.D. Ret.

    It's better to either provide a select box input for the prefix and text boxes for first, middle, and last names or to allow freeform entry but use the input as a single piece of information. Don't try to parse it.
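With a select box, the prefix can be checked against a fixed list instead of a pattern. A minimal sketch, assuming the form sends the prefix as a plain string; the hash `%ALLOWED_PREFIX`, its contents, and the function `valid_prefix` are hypothetical names chosen for this example.

```perl
use strict;
use warnings;

# Hypothetical list of the values offered by the select box.
# The empty string stands for "no prefix selected".
my %ALLOWED_PREFIX = map { $_ => 1 } ('', 'Mr', 'Mrs', 'Ms', 'Miss', 'Dr');

# Returns 1 if the submitted prefix is one of the offered values, else 0.
sub valid_prefix {
    my ($prefix) = @_;
    return 0 unless defined $prefix;
    return exists $ALLOWED_PREFIX{$prefix} ? 1 : 0;
}

for my $submitted ('Ms', 'Lord High Admiral') {
    printf "%-18s %s\n", $submitted,
        valid_prefix($submitted) ? 'accepted' : 'rejected';
}
```

The name fields themselves are then stored and displayed as entered, without any attempt to parse them.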

    If you just need to untaint the data you could use a much simpler regex that just scrubs potentially unsafe characters like backticks, ampersands, pipes and such.

    -sauoq
    "My two cents aren't worth a dime.";
    
      If you just need to untaint the data you could use a much simpler regex that just scrubs potentially unsafe characters like backticks, ampersands, pipes and such.

      I am fairly strict about adherence to some basic pragmas with regard to the handling of data and taint mode. While I am fairly sure you are aware of this already, sauoq, it is worth pointing out for other readers that the regular expression should match only allowed characters and reject everything else, rather than attempting to match and scrub nasty characters. This approach gives you a tighter regime for accepting user-supplied information, and it lets your code catch nasty input that was not anticipated when the code was written.
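A minimal sketch of that allow-list approach follows. The function name `untaint_name`, the permitted character set, and the 100-character limit are all assumptions made for illustration, not anything posted in this thread. The class is ASCII-only, so accented names would be rejected unless it is widened. Under taint mode (perl -T), the text captured in $1 is what Perl treats as untainted.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured (untainted) name on success,
# or undef if the input contains anything outside the allowed set.
sub untaint_name {
    my ($raw) = @_;
    return undef unless defined $raw;

    # Allow only letters, spaces, periods, apostrophes and hyphens.
    # The first character must be a letter; total length is 1 to 100.
    if ($raw =~ /\A([A-Za-z][A-Za-z .'-]{0,99})\z/) {
        return $1;
    }
    return undef;
}

for my $input ("O'Reilly", 'Steele-Stubbings', 'Dave; rm -rf /', '`id`') {
    my $clean = untaint_name($input);
    printf "%-18s %s\n", $input, defined $clean ? 'accepted' : 'rejected';
}
```

Anything not explicitly permitted is rejected, so a character nobody thought of when the code was written fails the match instead of slipping through.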

       

Re: Untainting name data from form
by dws (Chancellor) on Sep 25, 2002 at 00:49 UTC
    My question is this: is there a way to check for this that is cleaner than the mess I have there now? I think I've taken into account all (American) salutations ( if anyone does even use those on web forms ) and Jr's, III's, etc...

    If you're trying to untaint names, you might have better luck adopting a strategy of exclusion. Strip out characters that should never appear in names (e.g., ';<>&'). If you try to work up patterns that accept valid names, you're going to face a sequence of surprises. Your pattern, for example, won't match "O'Reilly". That's an easy fix, but then you'll run into last names like "Steele-Stubbings" or "St. Dennis", requiring more punctuation within names.

    Better, I think, to exclude obvious garbage.
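The exclusion strategy can be sketched with tr///. The function name `strip_unsafe` and the exact set of stripped characters are assumptions for this example (the set combines the characters mentioned in this thread). Note that deleting characters with tr/// does not by itself clear Perl's taint flag; under -T the result still has to pass through a regex capture before Perl considers it untainted.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of the name with characters
# that should never appear in a name removed.
sub strip_unsafe {
    my ($name) = @_;
    # Work on a copy so the caller's value is left unchanged.
    (my $clean = $name) =~ tr/;<>&`|//d;
    return $clean;
}

for my $input ("O'Reilly", 'Steele-Stubbings', 'St. Dennis', 'Dave <b>|ls') {
    printf "%-18s => %s\n", $input, strip_unsafe($input);
}
```

Names with apostrophes, hyphens, and periods pass through unchanged, which is the point of excluding rather than accepting.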

Re: Untainting name data from form
by Enlil (Parson) on Sep 25, 2002 at 00:40 UTC
    You could use the /x modifier to make your code more readable.

        use strict;

        my $x = 'Mr. Popcorn Dave';
        print "Name valid\n" if $x =~ m/
            ^\s*                      # ZERO OR MORE SPACES
            (M(s|rs?|iss)\.?\s+)?     # POSS SALUTATION
            [A-Z]+\.?\s+              # NAME AFTER SAL.
            [A-Z]+                    # 2nd NAME
            \s*([A-Z]+\s*)?$          # SUFFIX
        /ix;

    I have changed a couple of things, but mainly I think it is clearer if you use \s* instead of (\s+)? and use the /x modifier, so that when you look back at this later you have an idea of what each part of the regex does.
Re: Untainting name data from form
by Popcorn Dave (Abbot) on Sep 25, 2002 at 01:22 UTC
    Thanks to all! Now that I see the answers, I realize I was looking at this completely the wrong way.

    Thanks again!

    p.s. If the good Lt. Col or his brother did visit, I'd be happy. ;)

    There is no emoticon for what I'm feeling now.

Re: Untainting name data from form
by Fastolfe (Vicar) on Sep 25, 2002 at 16:25 UTC

Node Type: perlquestion [id://200507]
Approved by jarich