Re: Finding missing elements in a sequence (code)
by kwoff (Friar) on Nov 06, 2001 at 21:04 UTC
|
#!/usr/local/bin/perl -w
use strict;
sub find_holes {
my @list = @{ shift() };
@list = sort { $a <=> $b } @list;
my $low = $list[0];
my $high = $list[-1];
my %isthere = map { $_ => 0 } ($low..$high);
$isthere{$_} = "yes" for @list;
my @vacancies = grep { not $isthere{$_} } sort keys %isthere;
return \@vacancies;
}
my @foo = qw(1 2 5 6 9);
my $vac = find_holes(\@foo);
print "@$vac\n";
__END__
$ ./holes.pl
3 4 7 8
| [reply] [Watch: Dir/Any] [d/l] |
Re: Finding missing elements in a sequence (Rewritten code)
by demerphq (Chancellor) on Nov 06, 2001 at 22:01 UTC
|
Well, I cant see anything obviously wrong with this, except that it uses unobvious methods to accomplish its task and is therefore difficult to understand. If I understand you right this is the opposite of the 'rangification' problem that pops up in various places on a regular basis. In that problem one wishes to take a set of numbers and convert them into a format of begin-end. Your problem (if I understand it correctly) is even easier. You wish to go through a list of numbers to find missing elements in the list. So as I couldnt see what was wrong I decided Id write a quick sub to do the same thing but is a bit more obvious (and less of a memory hog, and presumably faster to boot :-) Note that there is no need for the hash nor the grep.
sub find_holes {
my $list = shift;
my @list = sort { $a <=> $b } @$list;
my $last = $list[0];
my $l = length($last);
my @vacancies;
foreach my $v (@list) {
#If the difference is larger than 1 then theres a hole
if ( $v - $last > 1 ) {
push @vacancies, sprintf( "%0${l}d", $_ ) #store whats mis
+sing
for ( $last + 1 .. $v - 1 );
}
$last = $v; #keep track of the last number we saw
}
return \@vacancies; # return the omitted elements
}
print join "\n", @{find_holes( [ '00001', '00002', '00005', '00006',
'00009', '00010', '00013', '00025' ]
+)};
__END__
(edited) Output:
00003 00004 00007 00008 00011 00012 00014 00015 00016
00017 00018 00019 00020 00021 00022 00023 00024
Apologies if Ive completely missed the point of your routine.
HTH
Yves / DeMerphq
--
Have you registered your Name Space? | [reply] [Watch: Dir/Any] [d/l] |
Re: Finding missing elements in a sequence (code)
by clemburg (Curate) on Nov 06, 2001 at 21:04 UTC
|
sub find_holes {
my @list = @{ shift() };
@list = sort { $a <=> $b } @list;
my $low = $list[0];
my $high = $list[-1];
my %isthere = map { $_ => 0 } ($low..$high);
$isthere{$_} = "yes" for @list;
my @vacancies = grep { not $isthere{$_} } sort keys %isthere;
return \@vacancies;
}
print "@{find_holes([qw(0 1 2 3 4 6 8)])}\n";
Prints:
d:\tmp\try>perl try.pl
perl try.pl
5 7
d:\tmp\try>
Christian Lemburg
Brainbench MVP for Perl
http://www.brainbench.com | [reply] [Watch: Dir/Any] [d/l] [select] |
Re: Finding missing elements in a sequence (code)
by CubicSpline (Friar) on Nov 06, 2001 at 21:07 UTC
|
The problem is with your argument to the find_holes sub. You're returning an array reference, so why not accept an array reference? Try this:
sub find_holes {
my $ref = shift;
@list = @$ref;
@list = sort { $a <=> $b } @list;
my $low = $list[0];
my $high = $list[-1];
my %isthere = map { $_ => 0 } ($low..$high);
$isthere{$_} = "yes" for @list;
my @vacancies = grep { not $isthere{$_} } sort keys %isthere;
return \@vacancies;
}
This worked as far as I could tell. Hope it helps.
~CS
Update: The others are quite right... this does work they way you had it before. *sigh* | [reply] [Watch: Dir/Any] [d/l] |
Re: Finding missing elements in a sequence (code)
by runrig (Abbot) on Nov 06, 2001 at 21:28 UTC
|
Nothing really wrong that I (or anyone else it seems) can
see. Might need to see how you're calling it and what's being passed in, & how you're using what's being passed back.
But if you don't mind some nitpicking: sub find_holes {
# No need to dereference and create
# a whole new array
my $aref = shift;
# This would probably be quicker w/a for loop and
# just saving the max & min, but what the hay...
my ($low, $high) = (sort { $a <=> $b } @$aref)[0,-1];
my %isthere;
# Hash slices are quicker
@isthere{$low..$high} = ();
@isthere{@$aref} = ("yes") x @$aref;
# Grep first, sort last (fewer things to sort that way)
# (and you sorted numerically before, why not now?)
return [
sort { $a <=> $b }
grep { !$isthere{$_} } keys %isthere
];
}
| [reply] [Watch: Dir/Any] [d/l] |
Re: Finding missing elements in a sequence (code)
by deprecated (Priest) on Nov 06, 2001 at 21:12 UTC
|
print join "\n", @{ find_holes( \@issues ) };
and for some reason, it is returning a list of every element. @issues in this case is somewhat like this:
$VAR1 = [
'00001',
'00002',
'00003',
'00004',
'00005',
'00006',
'00007',
'00008',
'00009',
#...
]
for several thousand issues.
--
Laziness, Impatience, Hubris, and Generosity. | [reply] [Watch: Dir/Any] [d/l] [select] |
|
Ah, I see.
Well, you need to make things equal before comparing them ... you are putting numbers 1..n as keys into %isthere, but try to get out $isthere{"00001"}. That will not work - the keys in %isthere are "1" etc.
This is what you need to make it work:
$isthere{$_} = "yes" for map {$_+0} @list;
Christian Lemburg
Brainbench MVP for Perl
http://www.brainbench.com | [reply] [Watch: Dir/Any] [d/l] [select] |
|
Or else you might change what you put into the hash:
my %isthere = map { sprintf ('%05d',$_) => 0 } ($low..$high);
Assuming that all numbers you want to check are in the same format...
pike
| [reply] [Watch: Dir/Any] [d/l] |
|
Wonderful! This works. Can you please go into a little more detail as to how this works? I am not sure how:
"00010" + 0
DWIM's.
brother dep
--
Laziness, Impatience, Hubris, and Generosity. | [reply] [Watch: Dir/Any] [d/l] |
|
|
Here's a slightly more in-depth analysis of the problem in the
subroutine (for those that may be interested):
We already know Perl can use a string that looks like a number as a
number. Each SV (perl's internal representation of scalar value) has
various slots to hold different kinds of data (and various flags to
indicate which slots are currently valid) --- we will just
simplify this to two slots, one for Strings and one for Numbers, for
this discussion ...
| [reply] [Watch: Dir/Any] [d/l] [select] |
|
Are all the numbers zero padded to the same length?? Then see my answer below but use the default sort instead of the
numeric sort in both places. The first numeric sort is 'numerifying' the strings and stripping the leading zeroes which messes up the rest of the routine.
Update: Ok, its not necessarily the first sort stripping the zeroes (in my test case, I was doing a numeric sort before I called the sub, so that's what I was seeing), but it does cause the subsequent statements to treat the list as numbers instead of character
strings. So you should treat them as either numbers or characters throughout the sub (like using the +0 trick to begin with), but not both.
| [reply] [Watch: Dir/Any] |
|
The first numeric sort is 'numerifying' the strings and stripping the leading zeroes which messes up the rest of the routine.
I am not if that is right, at least not in the simple sense of what you say. See:
sub find_holes {
my @list = @{ shift() };
@list = sort { $a <=> $b } @list;
my $low = $list[0];
my $high = $list[-1];
my %isthere = map { $_ => 0 } ($low..$high);
print "@{[sort keys %isthere]}\n\n";
print "@list\n\n";
$isthere{$_} = "yes" for map {$_+0} @list;
my @vacancies = grep { not $isthere{$_} } sort keys %isthere;
return \@vacancies;
}
my @issues = @{ [
'00001',
'00002',
'00003',
'00004',
# '00005',
'00006',
'00007',
'00008',
'00009',
#...
] };
print join("\n", @{ find_holes( \@issues ) });
Prints:
d:\tmp\try>perl try.pl
perl try.pl
1 2 3 4 5 6 7 8 9
00001 00002 00003 00004 00006 00007 00008 00009
5
What the sort probably does is to fill the number entry in the glob of the entries of @list. But why does the ".." operator use the number slot, while the hash access code and print use the string slot?
Christian Lemburg
Brainbench MVP for Perl
http://www.brainbench.com | [reply] [Watch: Dir/Any] [d/l] [select] |
|
My gut feeling is that depending on having the input file correctly padded is a bad call, especially if there are humans involved in the process. Better to pad the output as is appropriate.
OTOH if the data is _guaranteed_ to be padded correctly then your point is good and I would utilize it. Anything that minimizes the amount of code required is a Good Thing in my book.
Just my $0.02 :-)
Yves / DeMerphq
--
Have you registered your Name Space?
| [reply] [Watch: Dir/Any] |
Re: Finding missing elements in a sequence (code)
by gt8073a (Hermit) on Nov 06, 2001 at 21:29 UTC
|
print join "\n", @{ find_holes( \@issues ) };
how are you populating @issues? if you are reading them in from a file, are you removing new lines?
Will perl for money
JJ Knitis
(901) 756-7693
gt8073a@industrialmusic.com | [reply] [Watch: Dir/Any] |
Use array instead of hash if all 'keys' are numeric
by pike (Monk) on Nov 07, 2001 at 14:30 UTC
|
I have thought again about this problem, and it seems to me that you could have avoided this problem by using an array instead of a hash in your sub - this will get you the correct (numeric) comparison and save you one sort:
sub find_holes {
my @list = @{ shift() };
@list = sort { $a <=> $b } @list;
my $low = $list[0];
my $high = $list[-1];
my @notfound;
@notfound[$low..$high] = ($low..$high);
$notfound[$_] = 0 for @list;
my @vacancies = grep {$_} @notfound;
return \@vacancies;
}
Since the value of the array index is not accessible in a simple for loop, I set the values of the array equal to the index and inverted the sense of the final grep, but otherwise everything is as it was.
Note that this code will fail (just like the original one) if 0 is contained in the interval $low .. $high. But this can easily be fixed by setting the values we found to undef rather than 0 and checking for defined values in the final grep...
pike
| [reply] [Watch: Dir/Any] [d/l] [select] |