PerlMonks  

Best guess for data type

by spiros (Beadle)
on Apr 22, 2013 at 16:28 UTC ( #1029919=perlquestion )
spiros has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

Is there a Perl module out there that, given an array of scalars, returns a best guess of what data type describes the array? For example, given 100 numbers it would guess that it's a numerical array, while for mixed strings and numbers it would return that the best data type is string.

I am thinking along the lines of what most statistical applications do when you load a CSV file and they try to guesstimate what data type each column is.

Your wisdom is appreciated.

Re: Best guess for data type
by InfiniteSilence (Curate) on Apr 22, 2013 at 16:51 UTC
    • Develop a set of heuristics (e.g. \d+(?:\.\d+)? for numbers or \S+ for strings, etc.)
    • Apply these to a random sampling of the data
    • Establish a confidence level that the given data are X
    • Proceed under that presumption unless proven wrong, in which case modify the definition from X to Y

    I suppose there are hundreds of other ways to go about this. The reason I chose the above is that you could have millions of pieces of data to look at and exhaustively looking at each column would be a bit absurd. Besides, you would probably only need to 'catch' an error when trying to perform an activity with a subset like obtaining a standard deviation. In that case you would check each value anyway.
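The steps above could be sketched roughly as follows. This is only an illustration: the heuristic patterns, the `guess_type` name, the sample size and the confidence cut-off are all assumptions, not something prescribed in this thread.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical heuristics, tried in order from most to least specific.
my @heuristics = (
    [ integer => qr/\A[+-]?\d+\z/ ],
    [ decimal => qr/\A[+-]?\d+(?:\.\d+)?\z/ ],
    [ string  => qr/\A.*\z/s ],
);

# Guess a type from a random sample of at most $sample_size values.
# Returns the first type whose pattern matches at least $confidence
# (a fraction between 0 and 1) of the sampled values.
sub guess_type {
    my ($values, $sample_size, $confidence) = @_;
    my @sample = shuffle @$values;
    splice @sample, $sample_size if @sample > $sample_size;
    return 'string' unless @sample;

    for my $h (@heuristics) {
        my ($name, $re) = @$h;
        my $hits = grep { defined($_) && /$re/ } @sample;
        return $name if $hits / @sample >= $confidence;
    }
    return 'string';
}

print guess_type([ 1 .. 100 ], 20, 0.95), "\n";                 # integer
print guess_type([ map { "x$_" } 1 .. 100 ], 20, 0.95), "\n";   # string
```

Because only a sample is inspected, a later full pass (for example when computing a standard deviation) may still hit a value that contradicts the guess, which is the point at which the type would be revised.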

    Celebrate Intellectual Diversity

Re: Best guess for data type
by tobyink (Abbot) on Apr 22, 2013 at 16:55 UTC
    use v5.10;
    use strict;
    use warnings;
    use Types::Standard qw( ArrayRef Str Num Int );

    my $arr = [ 1, 2, 3.3 ];

    # Test the narrowest type first; the first match wins.
    if ($arr ~~ ArrayRef[Int]) {
        say "ArrayRef of Int";
    }
    elsif ($arr ~~ ArrayRef[Num]) {
        say "ArrayRef of Num";
    }
    elsif ($arr ~~ ArrayRef[Str]) {
        say "ArrayRef of Str";
    }

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: Best guess for data type
by LanX (Canon) on Apr 22, 2013 at 22:52 UTC
    > For example, given 100 numbers it would guess that it's a numerical array, while for mixed strings and numbers it would return that the best data type is string.

    depends on your definition of number, but I think this helps:

    DB<160> @a=("1",0)
     => (1, 0)
    DB<161> if (grep { $_+0 ne $_ } @a) { "string" } else { "num" }
     => "num"
    DB<162> @a=("1 ",0,"")
     => ("1 ", 0, "")
    DB<163> if (grep { $_+0 ne $_ } @a) { "string" } else { "num" }
     => "string"

    > I am thinking along the lines of what most statistical applications do when you load a CSV file and they try to guesstimate what data type each column is.

    scalar grep returns the number of matching elements, so you're free to check against thresholds.

    DB<166> @a=("1 ",0,"")
     => ("1 ", 0, "")
    DB<167> scalar grep { $_+0 ne $_ } @a
     => 2
    DB<168> (grep { $_+0 ne $_ } @a) > @a/2
     => 1
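Outside the debugger, the threshold check might be wrapped up like this. It is a minimal sketch: the sub names and the idea of passing the cut-off as a fraction are assumptions made for illustration, and only the `$_+0 ne $_` test comes from the session above.

```perl
use strict;
use warnings;

# Fraction of values that do not survive a numeric round trip,
# using the same "$_+0 ne $_" test as in the debugger session.
sub non_numeric_fraction {
    my @values = @_;
    return 0 unless @values;
    no warnings 'numeric';    # "1 "+0 and ""+0 would otherwise warn
    my $misses = grep { $_ + 0 ne $_ } @values;
    return $misses / @values;
}

# Hypothetical cut-off: call the column a string if more than
# $threshold of its values fail the round trip.
sub guess_with_threshold {
    my ($threshold, @values) = @_;
    return non_numeric_fraction(@values) > $threshold ? 'string' : 'num';
}

print guess_with_threshold(0.5, "1 ", 0, ""), "\n";      # string (2 of 3 fail)
print guess_with_threshold(0.5, 1, 2, 3, "n/a"), "\n";   # num (1 of 4 fails)
```

Note that this round-trip test is strict: a string such as "1.0" numifies to 1 and stringifies back as "1", so it is counted as non-numeric.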

    update

    you could also use Scalar::Util which is core

    DB<186> use Scalar::Util qw/looks_like_number/
     => 0
    DB<187> grep { ! looks_like_number($_) } 0,"1","2 ", 3.14, " 4 ", 5e+55, "six"
     => "six"

    Please note that strings like " 4 " are now considered numbers!
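Building on `looks_like_number`, a column could be sorted into integer, general numeric, or string. This is a sketch under assumptions: the `guess_column_type` name, the three labels, and the rule that an empty list or any undefined value yields 'str' are illustrative choices, not part of Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Classify a list of scalars as 'int', 'num' or 'str'.
# Any value that does not look like a number makes the whole list 'str';
# otherwise the list is 'int' only if every value is a plain integer.
sub guess_column_type {
    my @values = @_;
    return 'str' unless @values;    # assumption: nothing to go on

    my $type = 'int';
    for my $v (@values) {
        return 'str' unless defined $v && looks_like_number($v);
        $type = 'num' unless $v =~ /\A\s*[+-]?\d+\s*\z/;
    }
    return $type;
}

print guess_column_type(1, "2", "-3"), "\n";    # int
print guess_column_type(1, 2, 3.3), "\n";       # num
print guess_column_type(1, "2", "six"), "\n";   # str
```

A single stray value turns the whole column into 'str' here; combining this with a threshold, as in the earlier grep example, would make it more tolerant of dirty data.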

    As I said depends a lot on your definition... =)

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Node Type: perlquestion [id://1029919]
Front-paged by Corion