Efficient way to do field validation

govindkailas has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Efficient way to do field validation by tobyink (Canon) on Jul 31, 2013 at 12:56 UTC
How about something like this...

    use Type::Params qw(compile);
    use Types::XSD qw(String Decimal Date Integer);
    use Text::CSV_XS;
    use Data::Dumper;

    my $validator = compile(                  # one type constraint per column
        Integer,
        Decimal[totalDigits => 8, fractionDigits => 3],
        String[maxLength => 5],
        Date->plus_coercions(                 # coerce YYYYMMDD to YYYY-MM-DD
            Integer[totalDigits => 8],
            q{ substr($_, 0, 4)."-".substr($_, 4, 2)."-".substr($_, 6, 2) }
        ),
        String[maxLength => 8],
        Decimal[totalDigits => 17, fractionDigits => 3],
    );

    my $csv = 'Text::CSV_XS'->new({ sep_char => '|' });

    while (my $row = $csv->getline(\*DATA)) {
        my @fields = $validator->(@$row);     # dies on the first invalid field
        print Dumper \@fields;
    }

    __DATA__
    12|11.00|BILL|20130131|asd123q|1234.45
    14|12.0|MONKEY|20120228|gkhkg|1.2

Produces the following output:

    $VAR1 = [
              '12',
              '11.00',
              'BILL',
              '2013-01-31',
              'asd123q',
              '1234.45'
            ];
    Value "MONKEY" did not pass type constraint "String[maxLength=>"5"]" (in $_[2]) at validate-csv.pl line 21.

If you've got big files, then you're unlikely to find a faster solution than pairing Text::CSV_XS and Type::Params.

`package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name`
Re: Efficient way to do field validation by ww (Archbishop) on Jul 31, 2013 at 12:42 UTC
What have you tried? We expect you to tell us what you've been doing, so we can help you learn. This is not `code-for-free.com`; if you're merely looking for someone to provide code, you may want to see if they're up. And you definitely have to tell us why the first decimal value, "`11.00`", fails your notion of validation -- apparently because the length doesn't match the max digit or decimal digit counts in the spec "`decimal(5,3)`" -- while the name "`BILL`" passes but clearly isn't the max of "`varchar(5)`". OTOH, it looks to me as though you can solve most of the rest of your problem by reading `perldoc -f length` and/or `perldoc perlre`, with specific reference to quantifiers. My apologies to all those electrons which were inconvenienced by the creation of this post.
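A minimal sketch of what the `length` and regex suggestion above could look like in practice. The helper names `is_decimal` and `is_varchar` are made up for illustration, and the reading of `decimal(P,S)` as "at most P digits in total, at most S of them after the point" is an assumption, since the thread never pins down the validation rules.

```perl
use strict;
use warnings;

# Hypothetical helper: decimal(P,S) is ASSUMED to mean at most P digits
# in total, with at most S of them after the decimal point.
sub is_decimal {
    my ($value, $precision, $scale) = @_;
    # Capture the integer part and the optional fractional part.
    my ($int, $frac) = $value =~ /\A[+-]?(\d+)(?:\.(\d+))?\z/
        or return 0;
    $frac //= '';
    return (length($int) <= $precision - $scale
         && length($frac) <= $scale) ? 1 : 0;
}

# Hypothetical helper: varchar(N) is ASSUMED to mean at most N characters.
sub is_varchar {
    my ($value, $max) = @_;
    return length($value) <= $max ? 1 : 0;
}

for my $check (
    [ 'decimal(5,3)', '11.00',  is_decimal('11.00', 5, 3) ],
    [ 'decimal(5,3)', '123.4',  is_decimal('123.4', 5, 3) ],
    [ 'varchar(5)',   'BILL',   is_varchar('BILL', 5)     ],
    [ 'varchar(5)',   'MONKEY', is_varchar('MONKEY', 5)   ],
) {
    my ($spec, $value, $ok) = @$check;
    printf "%-13s %-7s %s\n", $spec, $value, $ok ? 'ok' : 'invalid';
}
```

Under this reading "`11.00`" passes `decimal(5,3)`; a stricter rule, such as requiring exactly three decimal places, would need a different check.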
Re^2: Efficient way to do field validation by govindkailas (Acolyte) on Jul 31, 2013 at 14:00 UTC
I am not here to ask how to validate the fields, nor am I expecting `code-for-free`. As I mentioned in the original post, I am selecting and validating each field using an appropriate regex. What I am looking for is a better method to do the validation, something similar to a C++ class definition. Can we have a hash defined with specific regex keys and check if the value matches?
Re^3: Efficient way to do field validation by ww (Archbishop) on Jul 31, 2013 at 16:56 UTC
Yes. Update (after keeping my peace long enough to reach a slow burn): What you said about validating in the OP was "Now how should I validate each fields ?", which I don't read as congruent with "(a)s I mentioned in the original post I am selecting and validating each field using appropriate regex", as you're now asserting. Yes, you stated that you were splitting the record into "columns and taking it to variables." -- again, a statement at some remove from your new version. So yes, I'm taking offense at your reply, as you did at my reply -- an attempt to point out two obvious ways to do some form of validation (and a request that you provide your criteria for determining if an entry is valid). It's not, IMO, a gracious response to an attempt to help with what appeared to be a noob question... posed in the manner of someone who hasn't read "On asking for help" and "How do I post a question effectively?". (Without your code, it's hard to guess if one can provide a more efficient way to do field validation.)
Re^4: Efficient way to do field validation by govindkailas (Acolyte) on Aug 01, 2013 at 05:25 UTC
Re: Efficient way to do field validation by Laurent_R (Canon) on Jul 31, 2013 at 22:37 UTC
You could build a hash of regexes, something like this (the regexes are just given as quick simplistic examples, I haven't thought very carefully about them):

    my %validate = (
        INT => qr/^[+-]?\d+$/,
        DEC => qr/^[+-]?\d+\.?\d*$/,
        # etc.
    );

and then use it to validate your individual fields. You might actually take it one step further and build a dispatch table, something like this:

    my %actions = (
        INT          => sub { return 1 if $_[0] =~ /^[+-]?\d+$/ },
        DEC          => sub { return 1 if $_[0] =~ /^[+-]?\d+\.?\d*$/ },
        'VARCHAR(5)' => \&validate_varchar_5,   # reference to a sub defined elsewhere
        # etc.
    );

This is a very rough untested example; I just want to convey the general idea. You don't say enough about what you have done for me to figure out whether these techniques will be beneficial.
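To make the "use it to validate your individual fields" step concrete, here is one way the hash of regexes might be driven by a per-column schema. The `@schema` layout, the `DATE` pattern and the `invalid_columns` helper are assumptions made for illustration, not something taken from the original question.

```perl
use strict;
use warnings;

# Same idea as the hash of regexes above, with stricter anchors.
my %validate = (
    INT  => qr/\A[+-]?\d+\z/,
    DEC  => qr/\A[+-]?\d+(?:\.\d+)?\z/,
    DATE => qr/\A\d{8}\z/,        # YYYYMMDD digits only; no calendar check
);

# Hypothetical column layout: one type name per column, in order.
my @schema = qw(INT DEC DATE);

# Returns the zero-based indexes of the columns that fail validation.
sub invalid_columns {
    my ($schema, $row) = @_;
    return grep {
        my $type = $schema->[$_];
        !defined $row->[$_] || $row->[$_] !~ $validate{$type};
    } 0 .. $#$schema;
}

# The -1 limit keeps trailing empty fields instead of dropping them.
my @row = split /\|/, '12|11.00|2013013', -1;
my @bad = invalid_columns(\@schema, \@row);
print @bad ? "invalid columns: @bad\n" : "record ok\n";
```

Keeping the schema as data means a new file layout only needs a new `@schema` list, which is close in spirit to the class-like definition asked about earlier in the thread.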
Re^2: Efficient way to do field validation by govindkailas (Acolyte) on Aug 01, 2013 at 05:20 UTC
Thanks a lot, I was thinking about something similar to this. This made things clear to me.
Re: Efficient way to do field validation by zork42 (Monk) on Jul 31, 2013 at 12:10 UTC
You need to define what you mean by "validate" before anyone can help :) Also some more example input and expected output would help. UPDATE: more detail was added to OP after I wrote this.
Re: Efficient way to do field validation by sundialsvc4 (Abbot) on Aug 01, 2013 at 01:14 UTC
When faced with tasks like this, I usually write a subroutine which, given a string (or whatever the input may be), is tasked with returning either “falsehood,” or, if any error is encountered, an appropriate error-message string. I normally wrap the entire body of such a function in an `eval {}` block, which will trap any errors that may occur. If an exception is thrown (via `die` or otherwise), the content of that exception string is returned; if not, falsehood. I also often define a `$doing_what` variable that I set to appropriate strings as I run through the subroutine from top to bottom. This value can be used to augment the messages.

And then... what can I say... you just go for it. `split()` the string into an array, then check the number of entries in the array: `die()` if the count is wrong. Then, on to the next test. And you simply run through them, one after another after another.

Now, one more thing: welcome to the world of Test::More and Test::Exception! You must not assume that your validation routine is, indeed, correct. You need to write a very comprehensive test suite that throws everything but Lincoln’s Gettysburg Address(*) at it. This test suite should verify that the routine traps every error that it is supposed to, and that it validates every good string that it is supposed to. This is a complex but vitally important routine, and you need to test it rigorously.

(*) Yes, there’s a story here... apocryphal or otherwise, I don’t know. Legend has it that an early “error-correcting” COBOL(?) compiler, when given a copy of the aforesaid document, “compiled it” with no errors.
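A rough sketch of the routine described above, under stated assumptions: the three-field layout (integer, decimal, name of at most five characters) and the name `record_error` are invented for illustration. It returns an empty string (false) for a good record and a message built from `$doing_what` otherwise, followed by a few Test::More checks.

```perl
use strict;
use warnings;
use Test::More;

# Returns '' (false) when the record is valid, otherwise an error message.
# The INT|DEC|NAME layout is a made-up example, not the poster's format.
sub record_error {
    my ($line) = @_;
    my $doing_what = 'starting';

    my $ok = eval {
        $doing_what = 'splitting the record';
        my @f = split /\|/, $line, -1;
        die "expected 3 fields, got " . scalar(@f) . "\n" unless @f == 3;

        $doing_what = 'checking field 1 (integer)';
        die "'$f[0]' is not an integer\n"
            unless $f[0] =~ /\A[+-]?\d+\z/;

        $doing_what = 'checking field 2 (decimal)';
        die "'$f[1]' is not a decimal\n"
            unless $f[1] =~ /\A[+-]?\d+(?:\.\d+)?\z/;

        $doing_what = 'checking field 3 (name)';
        die "'$f[2]' is longer than 5 characters\n" if length($f[2]) > 5;

        1;    # reached only when every check passed
    };
    return '' if $ok;

    my $err = $@ || "unknown error\n";
    chomp $err;
    return "while $doing_what: $err";
}

is(record_error('12|11.00|BILL'), '', 'valid record returns falsehood');
like(record_error('12|11.00'), qr/expected 3 fields, got 2/,
    'wrong field count is reported');
like(record_error('14|12.0|MONKEY'), qr/checking field 3.*longer than 5/,
    'over-long name is reported');
done_testing;
```

Returning the message rather than rethrowing keeps the caller's loop simple: a false result means "keep going", anything else can be logged next to the line number.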


	PerlMonks