http://www.perlmonks.org?node_id=58795

the_slycer has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I have some code that compares the contents of an array against a hash. The basic idea is that if the array contains new data, it should be added to the hash, AND if the hash contains something that is not in the array, it should be removed.

The code below "works", but to me it looks terribly inefficient and kind of ugly. I'm looking for a better way of doing this:
my @newfiles = ("new_one.txt", "old_one.txt", "old_two.txt", "new_two.txt");
my %oldfiles;
$oldfiles{"old_one.txt"}="this is the old one";
$oldfiles{"old_two.txt"}="this is the old two";
$oldfiles{"old_three.txt"}="this one should be deleted";
$oldfiles{"new_two.txt"}="this one has been added already";
my %newfiles;
foreach (@newfiles){
    $newfiles{$_}++;
    next if exists $oldfiles{$_};
    print "have to add $_\n";
}
foreach (keys %oldfiles){
    next if exists $newfiles{$_};
    print "have to remove $_\n";
}
Thanks..
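The same two-way comparison can also be written with the common lookup-hash-plus-grep idiom: build a hash of the names in one list, then grep each side against the other. This is a minimal sketch that reuses the sample data from the question; `%in_new`, `@to_add` and `@to_remove` are illustrative names, not part of the original code.

```perl
use strict;
use warnings;

# Sample data from the question.
my @newfiles = ("new_one.txt", "old_one.txt", "old_two.txt", "new_two.txt");
my %oldfiles = (
    "old_one.txt"   => "this is the old one",
    "old_two.txt"   => "this is the old two",
    "old_three.txt" => "this one should be deleted",
    "new_two.txt"   => "this one has been added already",
);

# Lookup hash: one key per file name in the new list.
my %in_new = map { $_ => 1 } @newfiles;

# In the array but not yet in the hash: these must be added.
my @to_add = grep { !exists $oldfiles{$_} } @newfiles;

# In the hash but no longer in the array: these must be removed.
my @to_remove = grep { !exists $in_new{$_} } sort keys %oldfiles;

print "have to add $_\n"    for @to_add;
print "have to remove $_\n" for @to_remove;
```

Collecting the results into arrays instead of printing inside the loops makes it easy to act on them afterwards, e.g. `delete @oldfiles{@to_remove};` to drop the stale entries in one step.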

Re: Better way?
by MeowChow (Vicar) on Feb 16, 2001 at 10:46 UTC
    If you are looking to do specifically what you state in your first sentence (assuming your print statements are for debugging purposes), you can do something like:
    my @new = qw(4 5 6 7);
    my %old = qw(1 a 2 b 3 c 4 d 5 f);
    my %new;
    $new{$_} = $old{$_} for @new;
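The one-liner above leaves the keys that exist only in `@new` (6 and 7 here) with undefined values. If the rebuilt hash should replace the old one and mark its new entries, a sketch along these lines works; the `"added"` placeholder value is an assumption for illustration, not something from the thread.

```perl
use strict;
use warnings;

# MeowChow's sample data.
my @new = qw(4 5 6 7);
my %old = qw(1 a 2 b 3 c 4 d 5 f);

my %new;
for my $key (@new) {
    # Keep the old value if there is one; otherwise use a placeholder.
    $new{$key} = exists $old{$key} ? $old{$key} : "added";
}

# Keys 1, 2 and 3 were never copied, so replacing %old drops them.
%old = %new;

print "$_ => $old{$_}\n" for sort keys %old;
```

Rebuilding the hash handles additions and removals in a single pass over the new list, at the cost of briefly holding two hashes in memory.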

      MeowChow, here's another version using a hash slice instead of a for loop to assign the contents of one hash to another:

      my %old = ( 1 => 'a', 2 => 'b', 3 => 'c', 4 => 'd', 5 => 'f', );
      my @keys = qw(4 5 6 7);    # Keys to copy from the old to the new hash
      my %new;
      @new{ @keys } = @old{ @keys };

      In this case, the hash slice gives a slight speed edge over the for loop in benchmarks, and the larger the data set, the wider the gap becomes.
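To check the speed claim on your own machine, the core Benchmark module can time both approaches side by side. This is a sketch with made-up data (10,000 entries, half of them copied); the sub names `with_loop` and `with_slice` are hypothetical, and the numbers it prints will vary with the machine and Perl version, so no particular result is claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10_000 entries in %old, the odd keys get copied.
my %old  = map { $_ => "value $_" } 1 .. 10_000;
my @keys = grep { $_ % 2 } 1 .. 10_000;

sub with_loop {
    my %new;
    $new{$_} = $old{$_} for @keys;
    return \%new;
}

sub with_slice {
    my %new;
    @new{@keys} = @old{@keys};
    return \%new;
}

# Sanity check: both approaches must build the same hash.
my ($loop, $slice) = (with_loop(), with_slice());
die "key counts differ\n" unless scalar(keys %$loop) == scalar(keys %$slice);
for my $k (keys %$loop) {
    die "results differ for $k\n" unless $loop->{$k} eq $slice->{$k};
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    loop  => \&with_loop,
    slice => \&with_slice,
});
```

A negative count tells `cmpthese` to run each piece of code for at least that many CPU seconds rather than a fixed number of iterations.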

Re: Better way?
by Yohimbe (Pilgrim) on Feb 16, 2001 at 09:59 UTC
    This looks like a Perl-based rsync(1) kind of thing.
    Unless I'm horribly wrong here, this is really not that ugly. About the only ugly thing is the memory used by the various lists, which could get big if the file lists are long. Yours is easily understood code, and unless it runs many times a second, it should not be a huge problem. But if you want to trim the memory usage, processing the list sequentially and storing only the exceptions might be better/faster/more elegant.
    # assuming %existing_files is a hash of existing
    # filenames and descriptions,
    # and we receive the new list of files on STDIN
    while (<STDIN>) {
        chomp;
        if (exists $existing_files{$_}) {
            delete $existing_files{$_};
        } else {
            $existing_files{$_} = "Newly Added file $_";
        }
    }
    # at this point %existing_files contains the files that
    # should be deleted, and the newly added ones,
    # with the description used to differentiate them
    foreach (sort keys %existing_files) {
        if ($existing_files{$_} =~ /^Newly/) {
            print "Added $_\n";
        } else {
            print "Delete $_\n";
        }
    }
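A variant of the same streaming idea keeps the additions in their own array instead of tagging them through the description text, so an existing description that happens to start with "Newly" cannot be misreported. This is a sketch under a few assumptions: `split_changes` is a hypothetical name, the listing is read from any filehandle (an in-memory one here instead of STDIN), and each file name appears at most once in the listing.

```perl
use strict;
use warnings;

# Hypothetical helper. Reads one file name per line from $fh; names already
# in %$existing are deleted from it, names that are not are collected in
# @added. Assumes each name appears at most once in the listing.
sub split_changes {
    my ($existing, $fh) = @_;
    my @added;
    while (my $name = <$fh>) {
        chomp $name;
        if (exists $existing->{$name}) {
            delete $existing->{$name};   # on both sides: nothing to do
        } else {
            push @added, $name;          # only in the new listing
        }
    }
    # Whatever is still in %$existing never showed up in the listing.
    return (\@added, [ sort keys %$existing ]);
}

# Usage with the question's data, reading from an in-memory filehandle.
my %existing_files = (
    "old_one.txt"   => "this is the old one",
    "old_two.txt"   => "this is the old two",
    "old_three.txt" => "this one should be deleted",
    "new_two.txt"   => "this one has been added already",
);
my $listing = "new_one.txt\nold_one.txt\nold_two.txt\nnew_two.txt\n";
open my $fh, '<', \$listing or die "cannot open listing: $!";
my ($added, $deleted) = split_changes(\%existing_files, $fh);
close $fh;

print "Added $_\n"  for @$added;
print "Delete $_\n" for @$deleted;
```

Like the loop above, this consumes the hash: every file still present is deleted from it. Pass a reference to a copy of the hash if the original is needed afterwards.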
    --
    Jay "Yohimbe" Thorne, alpha geek for UserFriendly