http://www.perlmonks.org?node_id=123624


in reply to Finding missing elements in a sequence (code)

I am calling it thusly:
print join "\n", @{ find_holes( \@issues ) };
and for some reason, it is returning a list of every element. @issues in this case is somewhat like this:
$VAR1 = [ '00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009', #... ]
for several thousand issues.

--
Laziness, Impatience, Hubris, and Generosity.

Re: Re: Finding missing elements in a sequence (code)
by clemburg (Curate) on Nov 06, 2001 at 21:32 UTC

    Ah, I see.

    Well, you need to make things equal before comparing them. You are putting the numbers 1..n into %isthere as keys, but then trying to look up $isthere{"00001"}. That will not work: the keys in %isthere are "1", "2", and so on.

    This is what you need to make it work:

    $isthere{$_} = "yes" for map {$_+0} @list;

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com

      Or else you might change what you put into the hash:

      my %isthere = map { sprintf ('%05d',$_) => 0 } ($low..$high);
      Assuming that all numbers you want to check are in the same format...
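A minimal sketch of the sprintf idea above, assuming every input string uses the same zero-padded width. The name find_holes_padded and the $width parameter are hypothetical, not part of the original sub.

```perl
use strict;
use warnings;

# Hypothetical variant of find_holes: every hash key is built with the
# same zero-padded format, so lookups with the original strings match.
sub find_holes_padded {
    my ($list, $width) = @_;

    # Work out the numeric bounds from plain numbers.
    my @nums = sort { $a <=> $b } map { $_ + 0 } @$list;
    return [] unless @nums;

    # Keys look like '00001', '00002', ... for the whole range.
    my %isthere = map { sprintf("%0${width}d", $_) => 0 } ($nums[0] .. $nums[-1]);

    # Mark the entries that are present, using the original padded strings.
    $isthere{$_} = 1 for @$list;

    # With equal-width keys, the default string sort matches numeric order.
    return [ grep { !$isthere{$_} } sort keys %isthere ];
}

my $holes = find_holes_padded( [ '00001', '00002', '00004', '00007' ], 5 );
print join("\n", @$holes), "\n";
```

If an input string is not padded to the assumed width, it simply never matches a generated key, so its number would be reported as missing.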

      pike

      Wonderful! This works. Can you please go into a little more detail as to how this works? I am not sure how:
      "00010" + 0
      DWIM's.

      brother dep


        Adding 0 to the string "00001" forces the string to be interpreted as a number.
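To illustrate the point about adding 0, here is a small sketch. The variable names are made up for the example; the hash stands in for %isthere with a single numeric key.

```perl
use strict;
use warnings;

my $padded = "00010";
my $number = $padded + 0;      # numeric context: the leading zeroes do not survive

my %isthere = ( 10 => 0 );     # the key is stored as the string "10"

print "number:        $number\n";
print "padded lookup: ", ( exists $isthere{$padded} ? "found" : "missing" ), "\n";
print "number lookup: ", ( exists $isthere{$number} ? "found" : "missing" ), "\n";
```

Hash keys are always strings, so "00010" and "10" are different keys even though they are numerically equal.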

        In your code, when you say:

        my %isthere = map { $_ => 0 } ($low..$high);

        You force "00001" to be treated as a number, too. What *is* interesting is *why* "00001" is treated as a number, since there is also the possibility of doing "aaa" ... "zzz" and having it work, too. So why does "00001" end up being treated as "1" by the ".." operator? Probably the perl interpreter looks at your $low and $high and decides they look like numbers.

        BTW, you should not do this $low .. $high stuff - it can hang you badly if $low and $high end up being interpreted as strings.

        At least use int($low) .. int($high), to force $low and $high to be interpreted as numbers.
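A short sketch of the difference, using literal strings that have never been used as numbers. It relies on the range operator falling back to the magical string increment for such operands, which is my reading of perlop rather than something stated in this thread.

```perl
use strict;
use warnings;

# Plain strings: the range is built with the magical string increment.
my @letters = ( "aa" .. "ad" );
my @padded  = ( "00001" .. "00003" );

# int() forces both ends to be numbers, so the range is numeric.
my @numbers = ( int("00001") .. int("00003") );

print "letters: @letters\n";
print "padded:  @padded\n";
print "numbers: @numbers\n";
```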

        Christian Lemburg

Re: Re: Finding missing elements in a sequence (code)
by danger (Priest) on Nov 06, 2001 at 22:49 UTC

    Here's a slightly more in-depth analysis of the problem in the subroutine (for those that may be interested):

    We already know Perl can use a string that looks like a number as a number. Each SV (perl's internal representation of scalar value) has various slots to hold different kinds of data (and various flags to indicate which slots are currently valid) --- we will just simplify this to two slots, one for Strings and one for Numbers, for this discussion ...
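The two-slot picture can be observed from Perl itself. The following is only a sketch: the helper slots() is hypothetical, and it uses the core B module to read the private flags of a scalar. It assumes that using a string in numeric context caches the numeric value in the same scalar, which is what current perls appear to do.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: report which slots of a scalar are currently valid.
sub slots {
    my $flags = B::svref_2object( \$_[0] )->FLAGS;
    my @valid;
    push @valid, 'string' if $flags & B::SVp_POK();
    push @valid, 'number' if $flags & ( B::SVp_IOK() | B::SVp_NOK() );
    return "@valid";
}

my ( $low, $high ) = ( '00001', '00003' );

my $before_slots = slots($low);
my @before       = ( $low .. $high );    # only the string slot is valid

my $unused = $low + 0;                   # numeric use fills the number slot of $low

my $after_slots = slots($low);
my @after       = ( $low .. $high );     # the number slot is now valid as well

print "before: [$before_slots] @before\n";
print "after:  [$after_slots] @after\n";
```

A numeric sort uses each element in numeric context in much the same way, which would explain why the range in the sub comes out as 1, 2, 3 and so on, while hash lookups and print still see the padded strings.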

Re: Re: Finding missing elements in a sequence (code)
by runrig (Abbot) on Nov 06, 2001 at 21:37 UTC
    Are all the numbers zero-padded to the same length? Then see my answer below but use the default sort instead of the numeric sort in both places. The first numeric sort is 'numerifying' the strings and stripping the leading zeroes which messes up the rest of the routine.

    Update: OK, it's not necessarily the first sort stripping the zeroes (in my test case, I was doing a numeric sort before I called the sub, so that's what I was seeing), but it does cause the subsequent statements to treat the list as numbers instead of character strings. So you should treat them as either numbers or characters throughout the sub (like using the +0 trick to begin with), but not both.

      The first numeric sort is 'numerifying' the strings and stripping the leading zeroes which messes up the rest of the routine.

      I am not sure that is right, at least not in the simple sense of what you say. See:

      sub find_holes {
          my @list = @{ shift() };
          @list = sort { $a <=> $b } @list;
          my $low  = $list[0];
          my $high = $list[-1];
          my %isthere = map { $_ => 0 } ($low..$high);
          print "@{[sort keys %isthere]}\n\n";
          print "@list\n\n";
          $isthere{$_} = "yes" for map {$_+0} @list;
          my @vacancies = grep { not $isthere{$_} } sort keys %isthere;
          return \@vacancies;
      }

      my @issues = @{ [
          '00001', '00002', '00003', '00004',
          # '00005',
          '00006', '00007', '00008', '00009',
          #...
      ] };

      print join("\n", @{ find_holes( \@issues ) });

      Prints:

      d:\tmp\try>perl try.pl
      1 2 3 4 5 6 7 8 9

      00001 00002 00003 00004 00006 00007 00008 00009

      5

      What the sort probably does is fill in the number slot of each scalar in @list. But why does the ".." operator use the number slot, while the hash access code and print use the string slot?

      Christian Lemburg

      My gut feeling is that depending on having the input file correctly padded is a bad call, especially if there are humans involved in the process. Better to pad the output as is appropriate.

      OTOH if the data is _guaranteed_ to be padded correctly then your point is good and I would utilize it. Anything that minimizes the amount of code required is a Good Thing in my book.

      Just my $0.02 :-)

      Yves / DeMerphq
      --
      Have you registered your Name Space?