Do you know where your variables are? PerlMonks

### Filling in missing values in an array

by dhuang90 (Initiate)
 on Jun 28, 2011 at 19:03 UTC Need Help??
dhuang90 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I have been given a large array of numbers, but there are missing values which have a value of NULL. I am to write a program which takes the NULL values and copies the closest value in the array. If a missing value is equidistant between two values, then it will take the average of the two arrays. To illustrate (let underscores represent NULL values: 4 _ _ _ 5 The first NULL would take the value 4, the third NULL takes the value 5, and the middle NULL takes the value 4.5 The code I have written is below, but it doesn't appear to change anything. Any ideas to get me moving? Thanks monks
```\$left  = 1; #measures the distance to the left to the first non-NULL v
+alue
\$right = 1; #measures the distance to the right to the first non-NULL
+value
\$q = 0; #used to iterate through the exposures for the first time
\$p = 0; #used to find a non-NULL value to the left, first iteration
\$k = 0; #used to find a non-NULL value to the right, first iteration
\$m = 0; #used to iterate through the exposures for the second time
\$n = 0; #used to find a non-NULL value to the left, second iteration
\$l = 0; #used to find a non-NULL value to the right, second iteration
\$o = 0; #used to assign values using an average, third iteration

foreach \$exp (@exposures) {
if (\$exp eq "NULL") {
while (\$exposures[\$q - \$p] eq "NULL") {
++\$left;
++\$p;
}
while(\$exposures[\$q + \$k] eq "NULL") {
++\$right;
++\$k;
}
}
if (\$left < \$right) {
\$data{\$q} = "L";
}
if (\$left > \$right) {
\$data{\$q} = "R";
}
\$left = 1;
\$right = 1;
\$p = 0;
\$k = 0;
++\$q;
}

\$end = \$#exposures;
\$number_of_exposures = \$end + 1;

while (\$m <= \$number_of_exposures) {
if (\$data{\$m} eq "L") {
while (\$exposures[\$m - \$n] eq "NULL") {
++\$n;
}
\$exposures[\$m] = \$exposures[4];
}
if    (\$data{\$m} eq "R") {
while (\$exposures[\$m + \$l] eq "NULL"){
++\$l;
}
\$exposures[\$m] = \$exposures[4];
}
\$n = 0;
\$l = 0;
++\$m;
}

foreach \$exp (@exposures) {
if (\$exp eq "NULL") {
\$exp = (\$exposures[\$o - 1] + \$exposures[\$o + 1])/2;
}
++\$o;
}

Replies are listed 'Best First'.
Re: Filling in missing values in an array
by Marshall (Abbot) on Jun 28, 2011 at 20:17 UTC
The solution below uses the flip-flop operator to see if we are in a NULL section or not. If we just started seeing NULL's that array index is noted. When we have seen the first number past the NULL's, a sub is called to go back fill the correct numbers into the array.
```#!/usr/bin/perl -w
use strict;

my \$data = '4 NULL NULL NULL 5 1 NULL NULL 2 NULL 6';

my @data = split(/\s+/,\$data);

my \$iFirstNULL;
my \$cur_index=0;

foreach (@data)
{
if (my \$num = /NULL/.../\d/)
{
\$iFirstNULL = \$cur_index if \$num == 1;
insertValues(\$iFirstNULL-1,\$cur_index,\@data) if \$num =~ /E0/;
}
\$cur_index++;
}

print "@data"; #prints 4 4 4.5 5 5 1 1 2 2 4 6

sub insertValues
{
my (\$istart, \$iend, \$aref) = @_;
my \$pos = \$istart+1;

my \$beginNum = \$aref->[\$istart];
my \$endNum   = \$aref->[\$iend];

while (\$aref->[\$pos] eq 'NULL')
{
if (\$pos-\$istart < \$iend-\$pos)   # closer to start
{
\$aref->[\$pos] = \$beginNum;
}
elsif(\$pos-\$istart > \$iend-\$pos) # closer to end
{
\$aref->[\$pos] = \$endNum;
}
else                             # directly in middle
{
\$aref->[\$pos] = (\$endNum + \$beginNum)/2;
}
\$pos++;
}
}

I was toying with splice and made an amusing variation:

```#!/usr/bin/perl -w
use strict;

my \$data = '4 NULL NULL NULL 5 1 NULL NULL 2 NULL 6';

my @data = split(/\s+/,\$data);

my \$iFirstNULL;
my \$cur_index=0;

foreach (@data)
{
if (my \$num = /NULL/.../\d/)
{
\$iFirstNULL = \$cur_index if \$num == 1;
splice @data, \$iFirstNULL, \$cur_index-\$iFirstNULL,
newVals(@data[\$iFirstNULL-1 .. \$cur_index])
if \$num =~ /EO/;
}
\$cur_index++;
}

print "@data\n"; #prints 4 4 4.5 5 5 1 1 2 2 4 6

sub newVals {
my (\$leftVal, \$rightVal, \$num) = (\$_[0], \$_[-1], scalar(@_)-2);

# Left half of the return list is \$leftVal, right half is \$rightVa
+l
my @ret = ((\$leftVal) x int(\$num/2), (\$rightVal) x int(\$num/2));

# Insert the average in the center for an odd-sized number of null
+s
splice @ret, int(\$num/2), 0, (\$leftVal+\$rightVal)/2 if \$num&1;

return @ret;
}

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re: Filling in missing values in an array
by zek152 (Pilgrim) on Jun 28, 2011 at 19:43 UTC

Here is a way to do it. I believe that I caught every possible type of way for NULL to appear except for an array with all NULLs. This will break my code.

I find a null and use the index before it as the left index and then find the next non null index and use that as the right index. Then I fill in the blanks using your logic. I have some logic in there to recognize it starting or ending with a chain of NULLs.

Hope this helps.

```@array = ("NULL","NULL",1,3,70,"NULL","NULL","NULL","NULL",
"NULL",50,1,"NULL",4,"NULL","NULL",5,"NULL","NULL","NULL");
\$max_element = @array - 1;
for \$i (0 .. @array-1)
{
if(\$array[\$i] eq "NULL")
{
\$left_i = \$i-1;
\$right_i = \$i;
#find the index of the next non null (but the array might end on a
+ null)
while(\$array[\$right_i] eq "NULL" && \$right_i<=\$max_element)
{
\$right_i++;
}
if(\$right_i > \$max_element)
{
#if the array ends with nulls then just replace them all w/ \$a
+rray[\$left_i]
for \$null_index (\$i .. \$max_element)
{
\$array[\$null_index] = \$array[\$left_i];
}
\$i = \$max_element;
}
elsif(\$left_i < 0)
{
#if the array starts with nulls then just replace all with \$ar
+ray[\$right_i]
for \$null_index (\$i .. \$right_i - 1)
{
\$array[\$null_index] = \$array[\$right_i];
}
\$i=\$right_i;
}
else
{
for \$null_index (\$i .. \$right_i-1)
{
if(\$null_index - \$left_i  == \$right_i - \$null_index)
{
\$array[\$null_index] = (\$array[\$left_i] + \$array[\$right_i])
+/2;
}
if(\$null_index - \$left_i > \$right_i - \$null_index)
{
\$array[\$null_index] = \$array[\$right_i];
}
if(\$null_index - \$left_i < \$right_i - \$null_index)
{
\$array[\$null_index] = \$array[\$left_i];
}
}
\$i=\$right_i;
}
}
}
for \$ar (@array)
{
print \$ar . " ";
}
print "\n";
#OUTPUT
#1 1 1 3 70 70 70 60 50 50 50 1 2.5 4 4 5 5 5 5 5
Re: Filling in missing values in an array
by graff (Chancellor) on Jun 29, 2011 at 01:55 UTC
The code I have written is below, but it doesn't appear to change anything.

Since there's nothing in the OP code snippet about getting input data, and nothing about outputting results, I have to ask: how do you know whether or not it's changing anything?

I'm also curious about the logic in the middle "while" loop. What's special about the \$exposures[4] ? Given the surrounding code (even though it's incomplete) and the description of the problem, it's hard to imagine why one specific element of this array should be used in this way.

Thanks everyone for the replies. It was very helpful in getting me to understand the logic behind this problem. I forgot to edit my code; \$exposures4 was a mistake and the 4 should have been replaced with counter variables I didn't include the code to read in the file or print the output of the file, but I am reading in data from a .txt Once again, big thanks to zek152, Marshall,graff, Neighbour, and FunkyMonk!
Re: Filling in missing values in an array
by FunkyMonk (Chancellor) on Jun 29, 2011 at 13:49 UTC
My take (with tests):

```#!/usr/local/bin/perl

use strict;
use warnings FATAL => 'all';

sub fill_in_the_blanks {
my @arr = @_;

die "No values found" if @arr && 0 == grep \$_ ne '_', @arr;

for (my \$left = 0; \$left < @arr; \$left++) {
next unless \$arr[\$left] eq '_';

my \$right = \$left;
\$right++ while \$right < \$#arr && \$arr[\$right+1] eq '_';

if (\$left == 0) { # NULLs at the left
\$arr[\$_] = \$arr[\$right+1] for \$left..\$right;
}
elsif (\$right == \$#arr) { # NULLs at the right
\$arr[\$_] = \$arr[\$left-1] for \$left..\$right;
}
elsif (\$left == \$right) { # single
\$arr[\$left] = (\$arr[\$left-1] + \$arr[\$left+1]) / 2;
}
else {
\$arr[\$left ] = \$arr[\$left -1];
\$arr[\$right] = \$arr[\$right+1];
}
}

return @arr;
}

use Test::Most;

is_deeply [fill_in_the_blanks(qw< >)], [qw< >], "( )";
is_deeply [fill_in_the_blanks(qw<1>)], [qw<1>], "(1)";

throws_ok { fill_in_the_blanks(qw<  _  >) } qr{No values found}, "(  _
+  )";
throws_ok { fill_in_the_blanks(qw< _ _ >) } qr{No values found}, "( _
+_ )";
throws_ok { fill_in_the_blanks(qw<_ _ _>) } qr{No values found}, "(_ _
+ _)";

is_deeply [fill_in_the_blanks(qw<1 _ 1>)], [qw<1 1 1>], "(1 _ 1)";
is_deeply [fill_in_the_blanks(qw<1 _ 3>)], [qw<1 2 3>], "(1 _ 3)";

is_deeply [fill_in_the_blanks(qw<1  _ _  3>)], [qw<1 1   3 3>], "(1  _
+ _  3)";
is_deeply [fill_in_the_blanks(qw<1 _ _ _ 3>)], [qw<1 1 2 3 3>], "(1 _
+_ _ 3)";

is_deeply [fill_in_the_blanks(qw<1   _>)], [qw<1   1>], "(1   _)";
is_deeply [fill_in_the_blanks(qw<1 _ _>)], [qw<1 1 1>], "(1 _ _)";
is_deeply [fill_in_the_blanks(qw<_   1>)], [qw<1   1>], "(_   1)";
is_deeply [fill_in_the_blanks(qw<_ _ 1>)], [qw<1 1 1>], "(_ _ 1)";

is_deeply [fill_in_the_blanks(qw<4 _ _ _ 5>)], [qw<4 4 4.5 5 5>], "dhu
+ang90";

done_testing;

The results...

```ok 1 - ( )
ok 2 - (1)
ok 3 - (  _  )
ok 4 - ( _ _ )
ok 5 - (_ _ _)
ok 6 - (1 _ 1)
ok 7 - (1 _ 3)
ok 8 - (1  _ _  3)
ok 9 - (1 _ _ _ 3)
ok 10 - (1   _)
ok 11 - (1 _ _)
ok 12 - (_   1)
ok 13 - (_ _ 1)
ok 14 - dhuang90
1..14

An interesting little problem. It has lots of edge cases and so makes a nice code kata. Thank for posting it.

Very informative...this helps me get the hang of tests :)
Though I'm not quite sure how tests 3-5 are supposed to go. If I let the sub execute die "No values found" when that is the case, the app dies instead of showing testresults.
Though I'm not quite sure how tests 3-5 are supposed to go
Neither was I! So I made it die (a.k.a throws an exception) when the array consists only of NULLs. Garbage in, garbage out, as we used to say. Exceptions prevent the garbage leaking out.

If you decide it should do something other than die, just change this line:

```die "No values found" if @arr && 0 == grep \$_ ne '_', @arr;
Re: Filling in missing values in an array
by Neighbour (Friar) on Jun 29, 2011 at 11:21 UTC
```#!/usr/bin/perl

use strict;
use warnings;

my @data = ("NULL", "NULL", 1, 3, 70, "NULL", "NULL", "NULL", "NULL",
+"NULL", 50, 1, "NULL", 4, "NULL", "NULL", 5, "NULL", "NULL", "NULL");

my @data_indices = grep { \$data[\$_] ne "NULL"; } (0 .. \$#data);

if (@data_indices) {
# Phase 1, fill the edges
if (\$data_indices[0] > 0) {
# Left edge
map { \$data[\$_] = \$data[\$data_indices[0]]; } (0 .. \$data_indic
+es[0] - 1);
}
if (\$data_indices[\$#data_indices] < \$#data) {
# Right edge
map { \$data[\$_] = \$data[\$data_indices[\$#data_indices]]; } (\$da
+ta_indices[\$#data_indices] + 1 .. \$#data);
}

# Phase 2, fill all gaps between values
for my \$index (1 .. \$#data_indices) {
my \$difference = \$data_indices[\$index] - \$data_indices[\$index
+- 1];
if (\$difference == 1) { next; }
# fill the first half with the leftmost data-element
map { \$data[\$_] = \$data[\$data_indices[\$index - 1]]; } (\$data_i
+ndices[\$index - 1] + 1 .. \$data_indices[\$index] - \$difference / 2);
+  #.. truncates, so /2 will work.
# ... and the last half with the rightmost data-element
map { \$data[\$_] = \$data[\$data_indices[\$index]]; } (\$data_indic
+es[\$index - 1] + 1 + \$difference / 2 .. \$data_indices[\$index] - 1);
+  #.. truncates, so /2 will work.
if ((\$difference % 2) == 0) {
# This gap does have an equidistant data-element (which wa
+s left-filled, so we'll need to overwrite)
my \$middle_index = \$data_indices[\$index] - \$difference / 2
+;
\$data[\$middle_index] = (\$data[\$middle_index - 1] + \$data[\$
+middle_index + 1]) / 2;
}
} ## end for my \$index (1 .. \$#data_indices)
} ## end if (@data_indices)

print (join (',', @data) . "\n");

This first indexes all non-NULL values, and then fixes the NULL-values step by step.
I'm using map a lot, but you could implement that using foreach-loops if you find map tricky to parse.

Edit: added if (@data_indices) to skip all work if @data contains only "NULL"-values or is empty.
Edit2: That was silly. Replaced @data-to-hash-to-ordered-array with a single grep.

Create A New User
Node Status?
node history
Node Type: perlquestion [id://911838]
Approved by ikegami
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (6)
As of 2018-04-19 10:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
My travels bear the most uncanny semblance to ...

Results (72 votes). Check out past polls.

Notices?