PerlMonks  

Best guess for data type

by spiros (Beadle)
on Apr 22, 2013 at 16:28 UTC ( #1029919=perlquestion )
spiros has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

Is there a Perl module out there that, given an array of scalars, returns a best guess of what data type describes the array? For example, given 100 numbers it would guess that it's a numerical array, while mixed strings and numbers would return string as the best data type.

I am thinking along the lines of what most statistical applications do when you load a CSV file and they try to guesstimate what data type each column is.

Your wisdom is appreciated..

Re: Best guess for data type
by InfiniteSilence (Curate) on Apr 22, 2013 at 16:51 UTC
    • Develop a set of heuristics (e.g. \d+\.?\d* or \S+, etc.)
    • Apply these to a random sampling of the data
    • Establish a confidence level that the given data are X
    • Proceed under that presumption unless proven wrong, in which case modify the definition from X to Y

    I suppose there are hundreds of other ways to go about this. The reason I chose the above is that you could have millions of pieces of data to look at and exhaustively looking at each column would be a bit absurd. Besides, you would probably only need to 'catch' an error when trying to perform an activity with a subset like obtaining a standard deviation. In that case you would check each value anyway.
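The steps above could be sketched roughly as follows. This is only an illustration, not InfiniteSilence's code: the helper name `guess_type_sampled`, the default sample size of 100, and the 0.95 confidence threshold are all assumptions, and the sample is drawn with replacement to keep the code short.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: guess a column type from a random sample.
# The defaults below are illustrative assumptions, not recommendations.
sub guess_type_sampled {
    my ($values, $sample_size, $confidence) = @_;
    $sample_size //= 100;
    $confidence  //= 0.95;

    return 'string' unless @$values;    # nothing to go on

    # Use everything if the data is small, otherwise sample
    # random positions (with replacement, for simplicity).
    my @sample = @$values <= $sample_size
        ? @$values
        : map { $values->[ int(rand(scalar @$values)) ] } 1 .. $sample_size;

    # Heuristic: count the sampled values that look numeric.
    my $numeric = grep { defined($_) && looks_like_number($_) } @sample;

    return $numeric / scalar(@sample) >= $confidence ? 'number' : 'string';
}

print guess_type_sampled([ 1 .. 1000 ]), "\n";          # number
print guess_type_sampled([ ('abc') x 1000 ]), "\n";     # string
```

A sample can miss a rare bad value, so code that later computes something like a standard deviation would still need to validate each value it actually uses.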

    Celebrate Intellectual Diversity

Re: Best guess for data type
by tobyink (Abbot) on Apr 22, 2013 at 16:55 UTC
    use v5.10;
    use strict;
    use warnings;
    use Types::Standard qw( ArrayRef Str Num Int );

    my $arr = [ 1, 2, 3.3 ];

    if ($arr ~~ ArrayRef[Int]) {
        say "ArrayRef of Int"
    }
    elsif ($arr ~~ ArrayRef[Num]) {
        say "ArrayRef of Num"
    }
    elsif ($arr ~~ ArrayRef[Str]) {
        say "ArrayRef of Str"
    }
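The same "try the narrowest type first" ladder can be approximated with core modules only. This is a minimal sketch, not a replacement for Types::Standard: the helper name `narrowest_type` is hypothetical, and the integer pattern is an assumption about what should count as an integer rather than that module's definition.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: return the narrowest of 'Int', 'Num', 'Str'
# that every element satisfies. An empty list vacuously yields 'Int'.
sub narrowest_type {
    my @values = @_;

    # Assumed integer definition: optional minus sign, then digits only.
    return 'Int'
        unless grep { !defined($_) || $_ !~ /\A-?[0-9]+\z/ } @values;

    return 'Num'
        unless grep { !defined($_) || !looks_like_number($_) } @values;

    return 'Str';    # everything else falls through to string
}

print narrowest_type(1, 2, 3),       "\n";    # Int
print narrowest_type(1, 2, 3.3),     "\n";    # Num
print narrowest_type(1, 'two', 3.3), "\n";    # Str
```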
    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: Best guess for data type
by LanX (Canon) on Apr 22, 2013 at 22:52 UTC
    > For example, given 100 numbers it would guess that it's a numerical array, while mixed strings and numbers would return string as the best data type.

    depends on your definition of number, but I think this helps:

    DB<160> @a=("1",0)
     => (1, 0)
    DB<161> if (grep { $_+0 ne $_ } @a) { "string" } else { "num" }
     => "num"
    DB<162> @a=("1 ",0,"")
     => ("1 ", 0, "")
    DB<163> if (grep { $_+0 ne $_ } @a) { "string" } else { "num" }
     => "string"
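The `$_+0 ne $_` test asks whether a value survives a numeric round trip: convert it to a number, turn it back into a string, and compare with the original. A standalone sketch follows (the helper name `roundtrips_as_number` is hypothetical). Note that this definition is strict: strings such as "1.0" or "1e3" are valid numbers but do not round-trip, so they count as strings here.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the value is unchanged after being
# converted to a number and back to a string.
sub roundtrips_as_number {
    my ($value) = @_;
    no warnings 'numeric';    # "abc" + 0 would otherwise warn
    return defined($value) && $value + 0 eq $value;
}

for my $v ("1", 0, 3.14, "1 ", "", "1.0", "1e3", "abc") {
    printf "%-6s %s\n", "'$v'", roundtrips_as_number($v) ? "num" : "string";
}
```

Only the first three values ("1", 0 and 3.14) are reported as "num"; the rest change when converted and are reported as "string".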

    > I am thinking along the lines of what most statistical applications do when you load a CSV file and they try to guesstimate what data type each column is.

    grep in scalar context returns a count, so you're free to check it against thresholds.

    DB<166> @a=("1 ",0,"")
     => ("1 ", 0, "")
    DB<167> scalar grep { $_+0 ne $_ } @a
     => 2
    DB<168> (grep { $_+0 ne $_ } @a) > @a/2
     => 1
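Wrapped in a function, the threshold check might look like the sketch below. The name `column_type` and the default tolerance of 0.5 (mirroring the `@a/2` comparison above) are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: call a column 'num' unless more than $max_bad
# (a fraction between 0 and 1, default 0.5) of its values fail the
# numeric round-trip test.
sub column_type {
    my ($values, $max_bad) = @_;
    $max_bad //= 0.5;
    return 'string' unless @$values;    # empty column: assume string

    no warnings 'numeric';    # non-numeric strings are expected here
    my $bad = grep { !defined($_) || $_ + 0 ne $_ } @$values;

    return $bad / scalar(@$values) > $max_bad ? 'string' : 'num';
}

print column_type([ "1 ", 0, "" ]),        "\n";    # string (2 of 3 fail)
print column_type([ 1, 2, 3, "n/a" ]),     "\n";    # num    (1 of 4 fails)
print column_type([ 1, 2, 3, "n/a" ], 0),  "\n";    # string (no failures tolerated)
```

A tolerance above zero is what lets a mostly numeric CSV column with a few placeholders such as "n/a" still be classified as numeric.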

    update

    you could also use Scalar::Util which is core

    DB<186> use Scalar::Util qw/looks_like_number/
     => 0
    DB<187> grep { ! looks_like_number($_) } 0,"1","2 ", 3.14, " 4 ", 5e55, "six"
     => "six"

    Please note that strings like " 4 " are now considered numbers!

    As I said depends a lot on your definition... =)
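If padded strings like " 4 " should not count as numbers, one option is to combine `looks_like_number` with an explicit whitespace check. This is a minimal sketch under that assumption; the helper name `is_strict_number` is hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: like looks_like_number, but rejects values
# that carry leading or trailing whitespace. Returns 1 or 0.
sub is_strict_number {
    my ($value) = @_;
    return defined($value)
        && $value !~ /\A\s|\s\z/
        && looks_like_number($value) ? 1 : 0;
}

# The same data as in the debugger session above.
my @rejected = grep { !is_strict_number($_) }
    0, "1", "2 ", 3.14, " 4 ", 5e55, "six";

print join(", ", map { "'$_'" } @rejected), "\n";    # '2 ', ' 4 ', 'six'
```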

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Node Type: perlquestion [id://1029919]
Front-paged by Corion