http://www.perlmonks.org?node_id=339883


in reply to Re: Re: Re: List::Compare
in thread List::Compare

It's surprisingly mellow thrash-wise, actually. I just do
    # SETUP DELETED, CODE WON'T RUN
    my @file1 = <FILE1>;
    my @file2 = <FILE2>;

    my $lc = List::Compare->new(\@file1, \@file2);
    my @file1only = $lc->get_Lonly;
    my @file2only = $lc->get_Ronly;

    print OUT "Files that exist only in FILE1:\n";
    print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME!!!\n\n";
    foreach my $file1 (@file1only) {
        unless (($file1 =~ "Tmp") or ($file1 =~ "Temp")) {
            print OUT $file1;
        }
    }

    print OUT "\n\n";
    print OUT "Files that exist only in FILE2:\n";
    print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME!!!\n\n";
    foreach my $file2 (@file2only) {
        unless (($file2 =~ "Tmp") or ($file2 =~ "Temp")) {
            print OUT $file2;
        }
    }
and it takes less than a minute to run on my box here at work.

Unfortunately, we're an all-Windows shop. I've managed to infiltrate a couple of FreeBSD boxes into the test department (cool network tools and a neato disk imaging system called Frisbee), but it's hardly worth moving all of the files over there and back just to save a few seconds using "sort" instead of List::Compare.

Re: Re: Re: Re: Re: List::Compare
by tachyon (Chancellor) on Mar 25, 2004 at 23:36 UTC

    Am I missing something or is List::Compare in your context giving you nothing that this code does not? This is probably faster and will use less memory as well.

        my @ary1 = ( 1..10 );
        my @ary2 = ( 5..15 );
        my %h;
        $h{$_}++ for ( @ary1, @ary2 );

        unique( "Only in ary1", \@ary1, \%h );
        unique( "Only in ary2", \@ary2, \%h );
        common( "Common to ary1 and ary2", \%h );

        sub unique {
            my ( $text, $ary, $hash ) = @_;
            print $text, $/;
            do { print "$_\n" unless $hash->{$_} == 2 } for @$ary;
        }

        sub common {
            my ( $text, $hash ) = @_;
            print $text, $/;
            do { print "$_\n" if $hash->{$_} == 2 } for keys %$hash;
        }

    cheers

    tachyon

      I agree with your point -- this might be one of those modules that just "objectifies" something that's already pretty simple (so why bother?). Your alternative is a common, simple approach that is bound to be appropriate for the reviewer's main example -- comparing the output of "find" from two different machines -- because there won't be any duplications within a single set. But of course if one element happens to appear more than once in a single list, you get a false report about it being in both lists. This is easy to fix (in fact, fixing it makes the code simpler):
          my @ary1 = qw/1 2 3 4 5 4 6/;
          my @ary2 = qw/5 6 7 8 7 9 10/;
          my %h;
          $h{$_} .= "a" for ( @ary1 );
          $h{$_} .= "b" for ( @ary2 );

          compare( "Only in ary1", "a\$", \%h );
          compare( "Only in ary2", "^b", \%h );
          compare( "Common to both", "ab", \%h );

          sub compare {
              my ( $text, $regex, $hash ) = @_;
              print join "\n", $text, grep { $$hash{$_} =~ /$regex/ } keys %$hash;
              print "\n";
          }

        graff and tachyon:

        Thanks for the demonstrations. As I mentioned in the review, I'm not entirely comfortable manipulating hashes yet-- and as graff points out succinctly, a small slip can give odd results that are hard to find. (This, BTW, is why I'm spending time at perlmonks-- I'm at the point where I know enough Perl to *really* screw stuff up if I don't understand what I'm doing-- I need to read other people's code as much as I can.)

        And that's another reason why I thought the review was appropriate-- the module author points to his inspiration (much like your code) in the Perl Cookbook, and also to other modules that do similar things.

        This is a great way to learn, while also not making silly mistakes in production code. Does it objectify something simple? Sure-- but it gives me a lot more confidence that my script is working correctly, and also points me to a starting place for when I have to do something more complex later.

        I've always been upfront about the fact that the very first thing List::Compare did was to put an object-oriented wrapper around well-known, Cookbook-style code for list comparisons. The reason I bothered was the Perl virtue of Laziness: I was comparing lists so often, I was tired of re-typing the code. Once I perfected it for my own use, I poked around on CPAN and discovered (to my surprise) that nobody had beaten me to it.

        That being said, I've expanded List::Compare's functionality over the last two years and, in particular, have provided considerable flexibility in its interface. One of its interfaces is functional, not object-oriented. So List::Compare is well past the point of merely 'objectifying' something.

        Jim Keenan (author of List::Compare)

        <generalizing>
        Do you need "a\$" and "^b"? That seems rather specific, and doesn't generalize. Perhaps "^a+\$" and "^b+\$" would be a better choice? Then it doesn't matter which one is processed first, or how many there are (e.g., "^c+\$").
        </generalizing>
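
        A minimal sketch of the anchored patterns, assuming the same tagging scheme as graff's code (one "a" per occurrence in the first list, one "b" per occurrence in the second). The helper name matching and the "common" pattern are illustrative assumptions, not something from the posts above. The second list is tagged first on purpose, to show that processing order stops mattering.

```perl
use strict;
use warnings;

my @ary1 = qw/1 2 3 4 5 4 6/;
my @ary2 = qw/5 6 7 8 7 9 10/;

# Tag each element once per occurrence: "a" for ary1, "b" for ary2.
# ary2 is deliberately processed first.
my %h;
$h{$_} .= "b" for @ary2;
$h{$_} .= "a" for @ary1;

# Hypothetical helper: sorted keys whose tag string matches $regex.
sub matching {
    my ( $regex, $hash ) = @_;
    return sort { $a <=> $b } grep { $hash->{$_} =~ $regex } keys %$hash;
}

my @only1  = matching( qr/^a+$/, \%h );        # tags are all "a"
my @only2  = matching( qr/^b+$/, \%h );        # tags are all "b"
my @common = matching( qr/a.*b|b.*a/, \%h );   # at least one of each tag

print "Only in ary1: @only1\n";
print "Only in ary2: @only2\n";
print "Common to both: @common\n";
```

        Because the "only" patterns are anchored at both ends, duplicates within one list ("aa", "bb") still count as "only", and a third list would just need its own letter and pattern.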

        BTW, I like the $h{$_} .= "a" idea. To scale it much larger, I'd probably go with a bit vector, something like this:

            my @AoA = ( [1..10], [2..11], [3..12] );
            my %h;
            foreach my $i ( 0..$#AoA ) {
                vec($h{$_},$i,1) = 1 for @{$AoA[$i]};
            }
        Of course, it's a bit more work to get the "only"s and "common"s out %(
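
        One possible way to get them back out, as a rough sketch rather than a tuned implementation: count the set bits in each element's vector with unpack's "%32b*" checksum template. An element is common to all lists when the count equals the number of lists, and belongs to exactly one list when the count is 1. The variable names @common and @only are assumptions for illustration.

```perl
use strict;
use warnings;

my @AoA = ( [ 1 .. 10 ], [ 2 .. 11 ], [ 3 .. 12 ] );

# Bit $i of $h{$elem} is set when $elem appears in list $i.
my %h;
foreach my $i ( 0 .. $#AoA ) {
    for my $elem ( @{ $AoA[$i] } ) {
        $h{$elem} = '' unless defined $h{$elem};    # start from an empty bit string
        vec( $h{$elem}, $i, 1 ) = 1;
    }
}

my $n = @AoA;
my ( @common, @only );    # $only[$i] holds elements found only in list $i
for my $key ( sort { $a <=> $b } keys %h ) {
    my $bits = unpack( "%32b*", $h{$key} );    # number of lists containing $key
    if ( $bits == $n ) {
        push @common, $key;
    }
    elsif ( $bits == 1 ) {
        my ($i) = grep { vec( $h{$key}, $_, 1 ) } 0 .. $#AoA;
        push @{ $only[$i] }, $key;
    }
}

print "Common to all: @common\n";
for my $i ( 0 .. $#AoA ) {
    my @o = @{ $only[$i] || [] };
    print "Only in list $i: @o\n";
}
```

        Elements that appear in some but not all lists (2 and 11 here) fall into neither bucket; whether to report them separately is a design choice this sketch leaves open.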

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: Re: Re: Re: Re: List::Compare
by dragonchild (Archbishop) on Mar 26, 2004 at 13:23 UTC
    Check out both Cygwin and Microsoft's own Unix tools. I am currently developing the same Perl modules on Solaris9, Redhat9, and Cygwin (both on Win2k Pro and WinXP Pro).

    Unix tools are available in most places you'd care to look, if you know where to look. :-)

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose