PerlMonks  

perl typecasting

by perl_junkie (Acolyte)
on Feb 04, 2008 at 20:32 UTC ( [id://666093]=perlquestion )

perl_junkie has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, This is my first post here. I have benefited a lot from the information on this site and thank all you folks out there for your valuable and timely assistance.

I have been working with Perl for 4 weeks now. I have written a script that performs data type validation. It takes in integers/decimals/dates and makes sure that they are within the boundaries of their data types. But now, if an integer field comes in with a special character ($ or #), Perl still accepts it as a valid integer value. How can I ensure that if a variable that is defined as an integer comes in with special characters, it is rejected?

Hope I hear from one of you. Thanks..!!!

Replies are listed 'Best First'.
Re: perl typecasting
by toolic (Bishop) on Feb 04, 2008 at 20:42 UTC
    You could try using a regular expression:
    #!/usr/bin/env perl
    use warnings;
    use strict;

    check_int('55');
    check_int(0xff);
    check_int(123);
    check_int('#123');
    check_int('$123');

    sub check_int {
        my $num = shift;
        if ($num =~ /\D/) {
            print "Not an integer: $num\n";
        }
        else {
            print "An integer: $num\n";
        }
    }

    prints out:

    An integer: 55
    An integer: 255
    An integer: 123
    Not an integer: #123
    Not an integer: $123

    Can you show some examples of integers with $ and #, along with some code that you've tried?
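One caveat with the /\D/ test above: it also rejects signed values such as -5, and it accepts an empty string because an empty string contains no non-digit. A minimal sketch of an anchored alternative follows. The name is_integer is hypothetical, and allowing a leading sign is an assumption about what the original poster wants.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true only if the whole
# string is an optional sign followed by one or more ASCII digits.
sub is_integer {
    my ($text) = @_;
    return 0 unless defined $text;
    return $text =~ /\A[+-]?[0-9]+\z/ ? 1 : 0;
}

for my $candidate ('55', '-5', '#123', '$123', '') {
    my $verdict = is_integer($candidate) ? 'integer' : 'not an integer';
    print "<", $candidate, "> is ", $verdict, "\n";
}
```

\A and \z anchor the match to the very start and end of the string, so a trailing newline or a stray character anywhere makes the check fail.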

Re: perl typecasting
by GrandFather (Saint) on Feb 04, 2008 at 21:41 UTC

    Something that will help you is to use strictures (use strict; use warnings;). They pick up errors and questionable code early.

    When Perl is shown a non-integer string in a context where a number is expected, use warnings; will tell you that something odd is happening:

    use strict;
    use warnings;

    my $notAnInt = '#10';

    print "It's 10\n" if $notAnInt == 10;
    print "It's 0\n" if $notAnInt == 0;
    print "It's #10\n" if $notAnInt eq '#10';

    prints:

    Argument "#10" isn't numeric in numeric eq (==) at noname1.pl line 6.
    It's 0
    It's #10

    Note that a non-numeric string tends to be numified as 0.
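The numification behaviour described above can be observed directly. The sketch below is an illustration only: numify_with_warning is a hypothetical helper that converts a string to a number and reports whether Perl emitted a warning while doing so. It relies on the "numeric" warning that use warnings enables.

```perl
use strict;
use warnings;

# Hypothetical helper: numify a string and report whether Perl warned.
# Returns a two-element list: (numeric value, 1 if a warning fired else 0).
sub numify_with_warning {
    my ($text) = @_;
    my $warned = 0;
    # Catch the warning instead of letting it print to STDERR.
    local $SIG{__WARN__} = sub { $warned = 1 };
    my $value = $text + 0;
    return ($value, $warned);
}

for my $candidate ('10', '#10', '10#') {
    my ($value, $warned) = numify_with_warning($candidate);
    print "<", $candidate, "> numifies to ", $value,
        ($warned ? ' (with a warning)' : ''), "\n";
}
```

A string that starts with digits, such as '10#', numifies to the leading number (10) rather than 0, which is why a range check alone cannot catch trailing junk.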


    Perl is environmentally friendly - it saves trees
      Thanks guys..!! This helps..!!!

      But I don't have 'strict' and 'warnings' at my workplace. I tried using them and it says that the module could not be found. This is turning out to be quite a problem...!!

      But I think I can use the \D matching given above. This should enable me to confirm that all characters are valid digits.

      Again.. Thanks for your help.. And sorry I didn't post code. I was not aware of the norms of the forums here and also the code slice was kinda large....!!!

        Not having strict and warnings seems quite odd, as I believe they're part of core Perl. As for the code you've been requested to post, it would be the code that is being used to perform the validations, which I would expect to be fairly short: a single regexp to validate integers (untested: $integer =~ /^ *[+-]?[0-9]+$/;, i.e., any number of blanks, an optional sign, and at least one digit before the end-of-string) and a more complex regexp to validate floats (in my other language, I'd just try to read a string as float or as an integer and handle the cases where there's a non-zero error code returned).
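As a rough illustration of the "more complex regexp to validate floats" mentioned above, here is one possible sketch. The name is_decimal is hypothetical, and the accepted syntax (optional sign, digits, optional fractional part, no exponent, no surrounding blanks) is an assumption rather than anything specified in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: accepts an optional sign, then digits with an
# optional fractional part ("12", "12.5", "-0.75", ".5"), nothing else.
sub is_decimal {
    my ($text) = @_;
    return 0 unless defined $text;
    return $text =~ /\A[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/ ? 1 : 0;
}

for my $candidate ('123.45', '-0.75', '.5', '12. 3', '$123.45') {
    my $verdict = is_decimal($candidate) ? 'decimal' : 'not a decimal';
    print "<", $candidate, "> is ", $verdict, "\n";
}
```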


        emc

        Information about American English usage here and here.

        Floating point issues? Read this before posting: http://docs.sun.com/source/806-3568/ncg_goldberg.html

        The pragmata strict and warnings are part of the core distribution and have been for many years. Your perl installation may be damaged or your PERL5LIB environment variable may be wrong. Check the output from perl -V.

        In particular, note the version of perl and the values of PERL5LIB and @INC.
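The same information can be gathered from inside a script. This is a sketch only: find_in_inc is a hypothetical helper that looks for a module file in each @INC directory. If use strict itself fails on the affected machine, remove the first two use lines and declare nothing else; the rest still runs.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first path in @INC that holds the
# given module file (e.g. 'strict.pm'), or nothing if none does.
sub find_in_inc {
    my ($relative_path) = @_;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks that may sit in @INC
        my $candidate = File::Spec->catfile($dir, $relative_path);
        return $candidate if -f $candidate;
    }
    return;
}

print "Perl version: ", $], "\n";
print "PERL5LIB: ", (defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)'), "\n";
print "\@INC entry: ", $_, "\n" for @INC;

for my $pragma ('strict.pm', 'warnings.pm') {
    my $path = find_in_inc($pragma);
    print $pragma, " => ", (defined $path ? $path : 'NOT FOUND'), "\n";
}
```

If the reported version and the @INC directories belong to different Perl releases, that mismatch is a likely reason core pragmas cannot be found.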


        TGI says moo

        You can turn on warnings, at least, from the command line when you invoke perl, e.g.,

        perl -w dubious_script.pl

        You can also turn on warnings from within a script with the statement

        BEGIN { $^W = 1; }

        Without access to the strict module, you will have trouble using many, if not most, of the modules referred to by others. A very strange situation, as has been mentioned; are you sure it's not available?

Re: perl typecasting
by ww (Archbishop) on Feb 04, 2008 at 21:14 UTC
    The information you provide makes it tough to know how to advise. Generally, we can provide better answers and more help when you provide sample code and output (including exact error messages, when relevant). See How do I post a question effectively? (and, perhaps, Writeup Formatting Tips).

    So, on to your question

    If you're using regexen to accept, for example, only the numeric portion of '$123.45', you might want to "validate" by (grossly oversimplified)

    use strict;
    use warnings;

    my $input = "\$123.45";
    if ( $input =~ /\D*(\d{0,3}\.\d{0,2}).*/ ) {
        my $valid = $1;
        print $valid;
    }
    else {
        print "not valid\n";
    }

    prints: 123.45

    The regex above (sequentially) looks for (and if found, discards) any leading non-digit character(s), then for 0 to 3 digits, a literal dot and 0 to 2 more digits, capturing the digits and the dot.

    For dates, you'd be well-advised to convert the user's input to some standard form, and test that.
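A minimal sketch of that idea for dates, assuming the standard form chosen is YYYY-MM-DD (the thread does not say which form the data uses). The name is_valid_date is hypothetical; the check confirms that the day actually exists in the given month, leap years included.

```perl
use strict;
use warnings;

# Hypothetical helper: parse 'YYYY-MM-DD' (an assumed standard form) and
# confirm the day exists in that month, including leap years.
sub is_valid_date {
    my ($text) = @_;
    return 0 unless defined $text;
    my ($year, $month, $day) = $text =~ /\A([0-9]{4})-([0-9]{2})-([0-9]{2})\z/
        or return 0;
    return 0 if $month < 1 || $month > 12;

    my @days_in_month = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $is_leap = ($year % 4 == 0 && $year % 100 != 0) || $year % 400 == 0;
    my $max_day = $days_in_month[$month - 1];
    $max_day = 29 if $month == 2 && $is_leap;

    return ($day >= 1 && $day <= $max_day) ? 1 : 0;
}

for my $candidate ('2008-02-29', '2007-02-29', '2008-13-01', '02/04/2008') {
    my $verdict = is_valid_date($candidate) ? 'valid' : 'invalid';
    print "<", $candidate, "> is ", $verdict, "\n";
}
```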

    And, generically, you may wish to read about untainting (Question about untainting data, Untainted done right! (esp. tye's reply), and similar nodes found with search or supersearch) because much of what's discussed will be applicable to your data validation needs.

    BTW: the notion of ...a variable that is defined as an integer.... may be tripping you up if you're accustomed to a strongly typed language. Perl isn't. OTOH, if you mean that YOU want that $var to be an integer, see the likes of numify....

    Update: minor formatting, closed <tt> tag

Re: perl typecasting
by Narveson (Chaplain) on Feb 04, 2008 at 22:52 UTC
    use Scalar::Util 'looks_like_number';

    while (<>) {
        chomp;
        print ( "$_ ", looks_like_number($_)? 'looks' : 'does not look', " like a number.\n" );
    }

    42
    42 looks like a number.
    foo
    foo does not look like a number.
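Note that looks_like_number accepts anything Perl would treat as a number, including decimals and exponent notation such as 1e5, so an integer field needs one more check on top of it. A small sketch, with classify_input as a hypothetical helper name:

```perl
use strict;
use warnings;
use Scalar::Util 'looks_like_number';

# Hypothetical helper: label a string as 'integer', 'number' (numeric but
# not written as a plain integer), or 'not a number'.
sub classify_input {
    my ($text) = @_;
    return 'not a number' unless defined $text && looks_like_number($text);
    return $text =~ /\A[+-]?[0-9]+\z/ ? 'integer' : 'number';
}

for my $candidate ('42', '1.5', '1e5', 'foo', '#123') {
    print "<", $candidate, "> is: ", classify_input($candidate), "\n";
}
```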
Re: perl typecasting
by adamk (Chaplain) on Feb 04, 2008 at 23:47 UTC
    If by "integer" you mean "positive integer" then the simplest way might be the _POSINT(...) function in Params::Util.

    If you mean in a stricter sense (allowing zero and/or negatives) then take a look at the bulkier-but-far-more-complete Regexp::Common module.

    It almost certainly has more format testers than you can possibly use.

    http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common.pm
      These are the 2 validation subroutines I have used for integer and decimal data type checks. I am preparing to load the data into DB2 database after this process, so the data types are defined accordingly. I am not too good with regular expressions, so have tried to code to the best of my knowledge using other functions.

      ############# SUB-ROUTINE TO CHECK FOR INTEGER TYPE ##############
      #check for small int, big int and int data types and return a 0 if correct
      # else return a 1
      sub int_chk {
          my $input = shift;
          $input = $input+0;
          if ($data_type eq 'SMALLINT') {
              if ($input > -32768 && $input < 32768) {
                  return $positive;
              }
              else {
                  return $negative;
              }
          }
          elsif ($data_type eq 'INTEGER') {
              if ($input > -2147483648 && $input < 2147483647) {
                  return $positive;
              }
              else {
                  return $negative;
              }
          }
          elsif ($data_type eq 'BIGINT') {
              if ($input > -9223372036854770000 && $input < 9223372036854770000) {
                  return $positive;
              }
              else {
                  return $negative;
              }
          }
          else {
              return $negative;
          }
      }

      ############ SUB-ROUTINE TO CHECK FOR DECIMAL TYPE ##############
      sub decimal_chk {
          #Here, I get a decimal input and separate it into 2 separate integers
          # say 123.45 split into 123 and 45
          # I need to check if -999 < 123 < 999 and -99 < 45 < 99 .
          my $data=shift;
          my $index=index($data,".")+1;
          $main=substr($data,0,$index-1);
          #print "Integer part is $main\n";
          if ($index > 0) {
              $prec=substr($data,$index,length($data)-($index));
          }
          else {
              $prec='';
          }
          $u_len=length($main);
          $l_len=length($prec);
          $upper=9;
          $lower=9;
          for ($u_count = 1; $u_count < $u_len; $u_count++) {
              $upper=$upper."9";
          }
          for ($l_count = 1; $l_count < $l_len; $l_count++) {
              $lower=$lower."9";
          }
          # 123.45 will be declared as decimal(5,2), 2 being the decimal length
          if ($main > (-1*$upper) && $main < $upper) {
              if ($prec > (-1*$lower) && $prec < $lower) {
                  if (($u_len <= ($data_precision-$data_dec_len)) && ($l_len <= $data_dec_len)) # Final length check with specified lengths
                  {
                      return $positive;
                  }
                  else {
                      return $negative;
                  }
              }
              else {
                  return $negative;
              }
          }
          else {
              return $negative;
          }
      }

        I carefully read your code and made a few tweaks to it. You are making a good start. Read and reread the docs, ask questions and keep working at it. I strongly recommend that you take the time to understand regexes. They are a very, very, useful and powerful tool.

        I noticed some problems with your code as I read it.

        • You have inconsistent use of inclusive vs exclusive comparisons.
        • Your algorithm for floating point validation would allow numbers like '12. 3' (that's a space between the . and the 3) to pass.
        • There are a number of variables that seem to come from nowhere. Are they globals?
        • You are using C style for loops.

        For loops in perl are usually done as follows:

        # You had
        for ($l_count = 1; $l_count < $l_len; $l_count++) {
            $lower=$lower."9";
        }

        # Try this instead:
        for my $l_count ( 1..$l_len ) {
            $lower=$lower."9";
        }

        # To build up a string like this you should be using the 'x' operator
        my $lower = '9' x $l_len;
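To illustrate the 'x' operator on its own: it repeats a string a given number of times, so the largest value with N decimal digits can be built in one step. In the sketch below, max_for_digits is a hypothetical helper, and returning 0 when no digits are allowed is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: largest value that fits in $digits decimal digits,
# e.g. 3 gives 999. Returns 0 when no digits are allowed.
sub max_for_digits {
    my ($digits) = @_;
    return 0 if $digits < 1;
    my $nines = '9' x $digits;    # string repetition: '9' x 3 is '999'
    return $nines + 0;            # numify the string of nines
}

print "Largest 3-digit value: ", max_for_digits(3), "\n";
print "Largest 2-digit value: ", max_for_digits(2), "\n";
```

Building the whole string at once also avoids having to reason about where the loop counter starts relative to the initial value of the string.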

        Here's your code with my tweaks applied. What follows is untested, but should work.

        use constant MATCH => 1;  # Return value for good match
        use constant FAIL  => 0;  # Return value for failed match

        use constant MIN => 0;    # Index of minimum value for integer checks
        use constant MAX => 1;    # Index of maximum value for integer checks

        ############# SUB-ROUTINE TO CHECK FOR INTEGER TYPE ##############
        # Check for small int, big int and int data types; return MATCH if
        # the value is in range, else return FAIL
        sub int_chk {
            my $input     = shift;
            my $data_type = shift;  # Where was this coming from in your code?

            $input = $input+0;

            # Test for integerness
            if ( $input != int( $input ) ) {
                return FAIL;
            }

            # Use a hash lookup to simplify your code.
            my %check = (
                # type        MIN                    MAX
                SMALLINT => [ -32768,                32767 ],
                INTEGER  => [ -2147483648,           2147483647 ],
                BIGINT   => [ -9223372036854770000,  9223372036854770000 ],
            );

            # Test for range
            if ( exists $check{$data_type} ) {
                my $match = (
                        $input >= $check{$data_type}[MIN]   # I used inclusive ranges here
                    and $input <= $check{$data_type}[MAX]
                ) ? MATCH : FAIL;

                return $match;
            }
            else {
                return FAIL;
            }

            die "Unreachable code executed";
        }

        sub decimal_chk {
            my $input          = shift;
            my $whole_places   = shift || 3;  # Set a default value
            my $decimal_places = shift || 2;  # for each of these args

            return $input =~ /
                ^                            # Start of input
                -?                           # Optional minus sign
                \d{0,$whole_places}          # Up to the allowed number of whole digits
                (
                    \.                       # Decimal point (required within this group)
                    \d{0,$decimal_places}    # Up to the allowed number of decimal places
                )?                           # decimal section is optional
                $                            # End of input
            /x ? MATCH : FAIL ;  # x modifier allows comments and whitespace
        }
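One point the code above carries over from the original: the BIGINT bounds are written as 9223372036854770000, a rounded figure, and very large values can lose digits when Perl stores them as floating point. A possible alternative, sketched here under the assumption that DB2's BIGINT is a 64-bit signed integer, compares the values as arbitrary-precision integers with the core Math::BigInt module. The name fits_bigint is hypothetical.

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: true if $text is an integer literal within the
# 64-bit signed range. Bounds are given as strings so that no digit is
# rounded away by floating-point conversion.
sub fits_bigint {
    my ($text) = @_;
    return 0 unless defined $text && $text =~ /\A[+-]?[0-9]+\z/;

    my $min   = Math::BigInt->new('-9223372036854775808');
    my $max   = Math::BigInt->new('9223372036854775807');
    my $value = Math::BigInt->new($text);

    # bcmp returns -1, 0 or 1, like the <=> operator.
    return ( $value->bcmp($min) >= 0 && $value->bcmp($max) <= 0 ) ? 1 : 0;
}

for my $candidate ('9223372036854775807', '9223372036854775808', '12x') {
    my $verdict = fits_bigint($candidate) ? 'fits' : 'does not fit';
    print "<", $candidate, "> ", $verdict, "\n";
}
```

The input is checked with a regex first, so only well-formed integer strings ever reach Math::BigInt.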


        TGI says moo

Re: perl typecasting
by jrtayloriv (Pilgrim) on Feb 05, 2008 at 07:08 UTC
    Regexp::Common does this nicely:

    use Regexp::Common;

    # get your $data ...

    # Anchored so that the whole value, not just part of it, must be an integer
    if ($data =~ /^$RE{num}{int}$/) {
        print "It's an integer\n";
    }
      TGI, I really appreciate you taking time off to review my code. Thanks..!!!

      I am working on my regex skills now. I should get better at it in the coming weeks.

      The variables I have not defined are globals.. sorry I should have mentioned that... Your code looks much cleaner and easier to debug than mine...!!!! Thanks a lot...!!!!!

      A lot of people have told me about using modules. Can anyone give me info on how I can check whether they are available? Every time I try to use one, I get this error message.

      "Can't locate Regexp/Common.pm in @INC (@INC contains: /et/pkgs/perl/5.8.0_crm/lib/perl5 /et/pkgs/perl/5.8.0_crm/lib/perl5/site_perl /et/pkgs/perl/5.8.0_crm/lib/site_perl /usr/perl5/5.00503/sun4-solaris /usr/perl5/5.00503 /usr/perl5/site_perl/5.005/sun4-solaris /usr/perl5/site_perl/5.005 .) at dm.pl line 3."

        Junkie,

        Regular expressions may seem awkward at first, but keep at it! Your efforts will pay off in the long run. Perl is a far more powerful language for having them, and you'll be able to do many more things, easier, for being able to use them.

        The Camel (Programming Perl, Wall/Christiansen/Orwant) has a good chapter on Pattern Matching; you can also use 'perldoc perlre' to get an overview. Neither one is probably ideal for a total regexp newbie, however. I learned RE syntax from years of using sed, awk, and vi, so I can't recommend any specific introductory books (although many exist).

        -dave

        Did your use line look like this?

        use Regexp::Common;

        Note the double colon -- not the slash you used in your note.

        ...and did your error end with a line like this?

        BEGIN failed--compilation aborted at 40.pl line 3.

        I mention the first because the first thing that occurs to me is that you tried to

        use Regexp/Common;

        If you used the correct form (as in the first sample above), then IMO, the problem is likely that Regexp::Common is not installed.
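For what it is worth, Perl always reports a missing module by its file path, so "Regexp/Common.pm" appears in the error even when the use line is written correctly with a double colon. A quick way to test whether a module is installed is sketched below; module_available is a hypothetical helper, not part of any library.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the named module can be loaded from @INC.
sub module_available {
    my ($module) = @_;
    # Only accept names shaped like Foo or Foo::Bar.
    return 0 unless $module =~ /\A[A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z0-9_]+)*\z/;
    # require with a string needs the path form, e.g. Regexp/Common.pm
    (my $file = $module . '.pm') =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('strict', 'Scalar::Util', 'Regexp::Common') {
    my $status = module_available($module) ? 'installed' : 'not found';
    print $module, " => ", $status, "\n";
}
```

If a module turns out to be missing and cannot be installed system-wide, one common approach is to install it into a private directory and add that directory to the search path with use lib at the top of the script.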

Node Type: perlquestion [id://666093]
Approved by Corion
Front-paged by Corion