Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

Finding missing elements in a sequence (code)

by deprecated (Priest)
on Nov 06, 2001 at 20:46 UTC ( [id://123615]=perlquestion: print w/replies, xml ) Need Help??

deprecated has asked for the wisdom of the Perl Monks concerning the following question:

sub find_holes { my @list = @{ shift() }; @list = sort { $a <=> $b } @list; my $low = $list[0]; my $high = $list[-1]; my %isthere = map { $_ => 0 } ($low..$high); $isthere{$_} = "yes" for @list; my @vacancies = grep { not $isthere{$_} } sort keys %isthere; return \@vacancies; }
I'm using the above sub to find holes in a sequence. It worked before I turned it into a sub, and now it seems to not behave as I planned. But I cant seem for the life of me to figure out what is wrong with it.

Can anyone see any glaring problems?

thanks,
brother dep

--
Laziness, Impatience, Hubris, and Generosity.

Replies are listed 'Best First'.
Re: Finding missing elements in a sequence (code)
by kwoff (Friar) on Nov 06, 2001 at 21:04 UTC
    Hmm, what are you expecting and how are you calling your function?

    It seems to "work" for me.

    #!/usr/local/bin/perl -w use strict; sub find_holes { my @list = @{ shift() }; @list = sort { $a <=> $b } @list; my $low = $list[0]; my $high = $list[-1]; my %isthere = map { $_ => 0 } ($low..$high); $isthere{$_} = "yes" for @list; my @vacancies = grep { not $isthere{$_} } sort keys %isthere; return \@vacancies; } my @foo = qw(1 2 5 6 9); my $vac = find_holes(\@foo); print "@$vac\n"; __END__ $ ./holes.pl 3 4 7 8
Re: Finding missing elements in a sequence (Rewritten code)
by demerphq (Chancellor) on Nov 06, 2001 at 22:01 UTC
    Well, I cant see anything obviously wrong with this, except that it uses unobvious methods to accomplish its task and is therefore difficult to understand. If I understand you right this is the opposite of the 'rangification' problem that pops up in various places on a regular basis. In that problem one wishes to take a set of numbers and convert them into a format of begin-end. Your problem (if I understand it correctly) is even easier. You wish to go through a list of numbers to find missing elements in the list. So as I couldnt see what was wrong I decided Id write a quick sub to do the same thing but is a bit more obvious (and less of a memory hog, and presumably faster to boot :-) Note that there is no need for the hash nor the grep.

    sub find_holes { my $list = shift; my @list = sort { $a <=> $b } @$list; my $last = $list[0]; my $l = length($last); my @vacancies; foreach my $v (@list) { #If the difference is larger than 1 then theres a hole if ( $v - $last > 1 ) { push @vacancies, sprintf( "%0${l}d", $_ ) #store whats mis +sing for ( $last + 1 .. $v - 1 ); } $last = $v; #keep track of the last number we saw } return \@vacancies; # return the omitted elements } print join "\n", @{find_holes( [ '00001', '00002', '00005', '00006', '00009', '00010', '00013', '00025' ] +)}; __END__ (edited) Output: 00003 00004 00007 00008 00011 00012 00014 00015 00016 00017 00018 00019 00020 00021 00022 00023 00024
    Apologies if Ive completely missed the point of your routine.

    HTH

    Yves / DeMerphq
    --
    Have you registered your Name Space?

Re: Finding missing elements in a sequence (code)
by clemburg (Curate) on Nov 06, 2001 at 21:04 UTC

    Huh? How do you call it? This runs fine on my machine:

    sub find_holes { my @list = @{ shift() }; @list = sort { $a <=> $b } @list; my $low = $list[0]; my $high = $list[-1]; my %isthere = map { $_ => 0 } ($low..$high); $isthere{$_} = "yes" for @list; my @vacancies = grep { not $isthere{$_} } sort keys %isthere; return \@vacancies; } print "@{find_holes([qw(0 1 2 3 4 6 8)])}\n";

    Prints:

    d:\tmp\try>perl try.pl perl try.pl 5 7 d:\tmp\try>

    Christian Lemburg
    Brainbench MVP for Perl
    http://www.brainbench.com

Re: Finding missing elements in a sequence (code)
by CubicSpline (Friar) on Nov 06, 2001 at 21:07 UTC
    The problem is with your argument to the find_holes sub. You're returning an array reference, so why not accept an array reference? Try this:

    sub find_holes { my $ref = shift; @list = @$ref; @list = sort { $a <=> $b } @list; my $low = $list[0]; my $high = $list[-1]; my %isthere = map { $_ => 0 } ($low..$high); $isthere{$_} = "yes" for @list; my @vacancies = grep { not $isthere{$_} } sort keys %isthere; return \@vacancies; }

    This worked as far as I could tell. Hope it helps.

    ~CS

    Update: The others are quite right... this does work they way you had it before. *sigh*

Re: Finding missing elements in a sequence (code)
by runrig (Abbot) on Nov 06, 2001 at 21:28 UTC
    Nothing really wrong that I (or anyone else it seems) can see. Might need to see how you're calling it and what's being passed in, & how you're using what's being passed back. But if you don't mind some nitpicking:
    sub find_holes { # No need to dereference and create # a whole new array my $aref = shift; # This would probably be quicker w/a for loop and # just saving the max & min, but what the hay... my ($low, $high) = (sort { $a <=> $b } @$aref)[0,-1]; my %isthere; # Hash slices are quicker @isthere{$low..$high} = (); @isthere{@$aref} = ("yes") x @$aref; # Grep first, sort last (fewer things to sort that way) # (and you sorted numerically before, why not now?) return [ sort { $a <=> $b } grep { !$isthere{$_} } keys %isthere ]; }
Re: Finding missing elements in a sequence (code)
by deprecated (Priest) on Nov 06, 2001 at 21:12 UTC
    I am calling it thusly:
    print join "\n", @{ find_holes( \@issues ) };
    and for some reason, it is returning a list of every element. @issues in this case is somewhat like this:
    $VAR1 = [ '00001', '00002', '00003', '00004', '00005', '00006', '00007', '00008', '00009', #... ]
    for several thousand issues.

    --
    Laziness, Impatience, Hubris, and Generosity.

      Ah, I see.

      Well, you need to make things equal before comparing them ... you are putting numbers 1..n as keys into %isthere, but try to get out $isthere{"00001"}. That will not work - the keys in %isthere are "1" etc.

      This is what you need to make it work:

      $isthere{$_} = "yes" for map {$_+0} @list;

      Christian Lemburg
      Brainbench MVP for Perl
      http://www.brainbench.com

        Or else you might change what you put into the hash:

        my %isthere = map { sprintf ('%05d',$_) => 0 } ($low..$high);
        Assuming that all numbers you want to check are in the same format...

        pike

        Wonderful! This works. Can you please go into a little more detail as to how this works? I am not sure how:
        "00010" + 0
        DWIM's.

        brother dep

        --
        Laziness, Impatience, Hubris, and Generosity.

      Here's a slightly more in-depth analysis of the problem in the subroutine (for those that may be interested):

      We already know Perl can use a string that looks like a number as a number. Each SV (perl's internal representation of scalar value) has various slots to hold different kinds of data (and various flags to indicate which slots are currently valid) --- we will just simplify this to two slots, one for Strings and one for Numbers, for this discussion ...

      Are all the numbers zero padded to the same length?? Then see my answer below but use the default sort instead of the numeric sort in both places. The first numeric sort is 'numerifying' the strings and stripping the leading zeroes which messes up the rest of the routine.

      Update: Ok, its not necessarily the first sort stripping the zeroes (in my test case, I was doing a numeric sort before I called the sub, so that's what I was seeing), but it does cause the subsequent statements to treat the list as numbers instead of character strings. So you should treat them as either numbers or characters throughout the sub (like using the +0 trick to begin with), but not both.

        The first numeric sort is 'numerifying' the strings and stripping the leading zeroes which messes up the rest of the routine.

        I am not if that is right, at least not in the simple sense of what you say. See:

        sub find_holes { my @list = @{ shift() }; @list = sort { $a <=> $b } @list; my $low = $list[0]; my $high = $list[-1]; my %isthere = map { $_ => 0 } ($low..$high); print "@{[sort keys %isthere]}\n\n"; print "@list\n\n"; $isthere{$_} = "yes" for map {$_+0} @list; my @vacancies = grep { not $isthere{$_} } sort keys %isthere; return \@vacancies; } my @issues = @{ [ '00001', '00002', '00003', '00004', # '00005', '00006', '00007', '00008', '00009', #... ] }; print join("\n", @{ find_holes( \@issues ) });

        Prints:

        d:\tmp\try>perl try.pl perl try.pl 1 2 3 4 5 6 7 8 9 00001 00002 00003 00004 00006 00007 00008 00009 5

        What the sort probably does is to fill the number entry in the glob of the entries of @list. But why does the ".." operator use the number slot, while the hash access code and print use the string slot?

        Christian Lemburg
        Brainbench MVP for Perl
        http://www.brainbench.com

        My gut feeling is that depending on having the input file correctly padded is a bad call, especially if there are humans involved in the process. Better to pad the output as is appropriate.

        OTOH if the data is _guaranteed_ to be padded correctly then your point is good and I would utilize it. Anything that minimizes the amount of code required is a Good Thing in my book.

        Just my $0.02 :-)

        Yves / DeMerphq
        --
        Have you registered your Name Space?

Re: Finding missing elements in a sequence (code)
by gt8073a (Hermit) on Nov 06, 2001 at 21:29 UTC
    print join "\n", @{ find_holes( \@issues ) };

    how are you populating @issues? if you are reading them in from a file, are you removing new lines?

    Will perl for money
    JJ Knitis
    (901) 756-7693
    gt8073a@industrialmusic.com

Use array instead of hash if all 'keys' are numeric
by pike (Monk) on Nov 07, 2001 at 14:30 UTC
    I have thought again about this problem, and it seems to me that you could have avoided this problem by using an array instead of a hash in your sub - this will get you the correct (numeric) comparison and save you one sort:

    sub find_holes { my @list = @{ shift() }; @list = sort { $a <=> $b } @list; my $low = $list[0]; my $high = $list[-1]; my @notfound; @notfound[$low..$high] = ($low..$high); $notfound[$_] = 0 for @list; my @vacancies = grep {$_} @notfound; return \@vacancies; }
    Since the value of the array index is not accessible in a simple for loop, I set the values of the array equal to the index and inverted the sense of the final grep, but otherwise everything is as it was. Note that this code will fail (just like the original one) if 0 is contained in the interval $low .. $high. But this can easily be fixed by setting the values we found to undef rather than 0 and checking for defined values in the final grep...

    pike

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://123615]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others taking refuge in the Monastery: (2)
As of 2024-03-19 06:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found