garyboyd has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way in Perl to impute missing data?

I have a 2-D array where missing data is encoded as 9 and the real data takes one of the values 0, 1 or 2. If I export only the unique rows of the array I get 8 rows. However, two of these rows contain missing data, and imputing one of the three real values for the missing entry makes each of them identical to another row, so only 6 rows should be displayed.

e.g.

2 2 0 1 0
2 2 0 2 1
0 2 1 1 0
0 1 1 1 0
2 2 0 0 0
2 9 0 2 1
0 2 0 1 0
2 2 0 9 0

The rows containing 2 9 0 2 1 and 2 2 0 9 0 should be removed. Is there a Perl module that can do this easily?

Replies are listed 'Best First'.
Re: Imputation of data
by choroba (Archbishop) on Jan 25, 2016 at 15:41 UTC
    So, you just want to grep lines that don't contain 9?
    #! /usr/bin/perl
    use warnings;
    use strict;

    use Data::Dumper;
    use List::Util qw{ none };

    my @rows;
    while (<DATA>) {
        my @row = split ' ';
        push @rows, \@row;
    }
    print 'before: ', Dumper(\@rows);

    @rows = grep { none { '9' eq $_ } @$_ } @rows;

    print 'after: ', Dumper(\@rows);

    __DATA__
    2 2 0 1 0
    2 2 0 2 1
    0 2 1 1 0
    0 1 1 1 0
    2 2 0 0 0
    2 9 0 2 1
    0 2 0 1 0
    2 2 0 9 0
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
Re: Imputation of data
by Corion (Pope) on Jan 25, 2016 at 15:28 UTC

    You don't even need a module. Just use grep to remove the rows/arrayrefs with missing data.

Re: Imputation of data
by CountZero (Bishop) on Jan 25, 2016 at 17:24 UTC
    Assuming that you only want to delete those rows with "wildcards" that can match some other row (with or without wildcards), you cannot simply delete all rows that contain a "9": a row with a wildcard that matches no other row is still unique.

    To check this assumption I have added an extra row of data "2 2 9 2 2". That row cannot match any of the other rows and therefore should not be deleted.

    use Modern::Perl qw/2015/;

    my %data;

    # First we transform and load the data into a hash
    while (<DATA>) {
        chomp;
        my $data = $_;
        s/([^9 ])/[$1X]/g;
        s/9/[0129]/g;
        s/X/9/g;
        $data{$_} = $data;
    }

    # Then we check if records with missing data are unique
    for my $testrecord ( keys %data ) {
        next unless $data{$testrecord} =~ m/9/;
        for my $record ( keys %data ) {
            next if $data{$record} eq $data{$testrecord};    # don't check yourself
            if ( $data{$record} =~ m/$testrecord/ ) {
                delete $data{$testrecord};
                last;
            }
        }
    }

    say $data{$_} for keys %data;

    __DATA__
    2 2 0 1 0
    2 2 0 2 1
    0 2 1 1 0
    0 1 1 1 0
    2 2 0 0 0
    2 9 0 2 1
    0 2 0 1 0
    2 2 0 9 0
    2 2 9 2 2
    Output:
    0 1 1 1 0
    0 2 0 1 0
    2 2 0 1 0
    2 2 0 0 0
    2 2 0 2 1
    0 2 1 1 0
    2 2 9 2 2
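
The same wildcard idea can also be sketched without building regexes, by comparing rows position by position. This is only an illustration under stated assumptions: the helper names `compatible` and `drop_imputable_rows` are hypothetical, rows are array references of equal length, and rows are examined in input order (the hash-based version above visits them in hash order instead).

```perl
use strict;
use warnings;

# Hypothetical helper: two rows are compatible when, at every position,
# the values are equal or at least one of them is the missing marker 9.
sub compatible {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#$x) {
        next if $x->[$i] eq '9' or $y->[$i] eq '9';
        return 0 if $x->[$i] ne $y->[$i];
    }
    return 1;
}

# Hypothetical helper: drop every row that contains a 9 and is compatible
# with some other row that is still kept. Rows are checked in input order.
sub drop_imputable_rows {
    my @rows = @_;
    my @keep = (1) x @rows;
    for my $i (0 .. $#rows) {
        next unless grep { $_ eq '9' } @{ $rows[$i] };
        for my $j (0 .. $#rows) {
            next if $j == $i or !$keep[$j];
            if (compatible($rows[$i], $rows[$j])) {
                $keep[$i] = 0;
                last;
            }
        }
    }
    return map { $rows[$_] } grep { $keep[$_] } 0 .. $#rows;
}

my @rows = map { [ split ' ', $_ ] } (
    '2 2 0 1 0', '2 2 0 2 1', '0 2 1 1 0', '0 1 1 1 0', '2 2 0 0 0',
    '2 9 0 2 1', '0 2 0 1 0', '2 2 0 9 0', '2 2 9 2 2',
);
print "@$_\n" for drop_imputable_rows(@rows);
```

Unlike the hash version, this sketch does not remove exact duplicates among rows without a 9; it assumes, as in the question, that the rows are already unique.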

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      Thanks for all the replies. Unfortunately a simple grep won't work: as CountZero mentions, a row such as "2 2 9 2 2" would be considered unique and should not be deleted.

      The code provided by CountZero works perfectly. Thanks once again for everybody's help!

Re: Imputation of data
by GotToBTru (Prior) on Jan 25, 2016 at 17:33 UTC

    Are you sure of your terminology? I may be wrong, but I understand imputation to mean filling in values for missing data, as opposed to excluding rows with missing data.
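
For contrast, here is a minimal sketch of imputation in that filling-in sense. It is an assumption made purely for illustration, not something proposed in this thread: each 9 is replaced by the most common observed value in its column (ties go to the smaller value), and duplicate rows are dropped afterwards. The helper name `impute_column_mode` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace each missing value (9) with the most common
# observed value in the same column; ties are broken by the smaller value.
sub impute_column_mode {
    my @rows = map { [@$_] } @_;    # copy so the caller's rows stay untouched
    return @rows unless @rows;
    for my $col (0 .. $#{ $rows[0] }) {
        my %count;
        $count{ $_->[$col] }++ for grep { $_->[$col] ne '9' } @rows;
        next unless %count;         # no observed value: leave the column alone
        my ($mode) = sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count;
        $_->[$col] = $mode for grep { $_->[$col] eq '9' } @rows;
    }
    return @rows;
}

my @rows = map { [ split ' ', $_ ] } (
    '2 2 0 1 0', '2 2 0 2 1', '0 2 1 1 0', '0 1 1 1 0',
    '2 2 0 0 0', '2 9 0 2 1', '0 2 0 1 0', '2 2 0 9 0',
);

# Impute, then keep only the first occurrence of each row.
my %seen;
my @unique = grep { !$seen{"@$_"}++ } impute_column_mode(@rows);
print "@$_\n" for @unique;
```

With the data from the question, both rows containing a 9 become copies of existing rows under this rule, which is one way to arrive at the 6 rows the original poster expects. A different imputation rule could give a different result.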

    I like to filter data at the source.

    use warnings;
    use strict;

    use Data::Dumper;

    my @rows;
    while (<DATA>) {
        my @row = grep { m/[012]/ } split ' ';
        push @rows, \@row if (scalar @row == 5);
    }
    print 'Results: ', Dumper(\@rows);

    __DATA__
    2 2 0 1 0
    2 2 0 2 1
    0 2 1 1 0
    0 1 1 1 0
    2 2 0 0 0
    2 9 0 2 1
    0 2 0 1 0
    2 2 0 9 0
    But God demonstrates His own love toward us, in that while we were yet sinners, Christ died for us. Romans 5:8 (NASB)

Re: Imputation of data
by Discipulus (Abbot) on Jan 25, 2016 at 22:07 UTC
    Working at the inner-array level, something like:

    my $bad = 9;
    my @filteredAoA = map { $bad ~~ @$_ ? () : [@$_] } @AoA;
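
The smartmatch operator `~~` has been marked experimental since Perl 5.18 and warns on newer perls, so the same filter can be sketched with `none` from List::Util instead. The helper name `rows_without` and the sample `@AoA` are assumptions for illustration only.

```perl
use strict;
use warnings;
use List::Util qw(none);

# Hypothetical helper: return copies of the rows that do not contain $bad.
sub rows_without {
    my ($bad, @AoA) = @_;
    return map { [@$_] } grep { none { $_ eq $bad } @$_ } @AoA;
}

# Sample data, made up for this example.
my @AoA = ( [ 2, 2, 0, 1, 0 ], [ 2, 9, 0, 2, 1 ], [ 0, 2, 0, 1, 0 ] );
my @filteredAoA = rows_without(9, @AoA);
print "@$_\n" for @filteredAoA;
```

As with the one-liner above, `[@$_]` copies each surviving row, so later changes to `@filteredAoA` do not affect `@AoA`.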


    L*
    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Imputation of data
by u65 (Chaplain) on Jan 25, 2016 at 17:52 UTC

    Is a unique row determined by the combination or permutation of its members?

    Update: The data the OP shows are unique if they are considered either as combinations or permutations, so I don't think the solutions shown would be correct if uniqueness were to be considered as combinations. Since we don't know the definition of uniqueness, I don't think we can define a solution yet.
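
To make the two notions of uniqueness concrete, here is a small sketch. All helper names and the sample rows are hypothetical. When order matters, the key is the row as written; when order is ignored, the key is the row's values sorted.

```perl
use strict;
use warnings;

# Hypothetical helpers: one key per notion of uniqueness.
sub ordered_key   { my ($row) = @_; return join ' ', @$row }
sub unordered_key { my ($row) = @_; return join ' ', sort { $a <=> $b } @$row }

# Keep the first row seen for each key produced by $key_for.
sub unique_rows {
    my ($key_for, @rows) = @_;
    my %seen;
    return grep { !$seen{ $key_for->($_) }++ } @rows;
}

# Sample rows, made up for this example: the second is a reordering of the first.
my @rows = ( [ 2, 2, 0, 1, 0 ], [ 0, 2, 2, 0, 1 ], [ 0, 1, 1, 1, 0 ] );

my @by_order   = unique_rows(\&ordered_key,   @rows);
my @by_content = unique_rows(\&unordered_key, @rows);
printf "order matters: %d rows, order ignored: %d rows\n",
    scalar @by_order, scalar @by_content;
```

Whichever definition applies, only the key function changes; the rest of the de-duplication stays the same.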

Re: Imputation of data
by u65 (Chaplain) on Jan 25, 2016 at 23:10 UTC

    See the update to my original post.