Re: Re: Re: List::Compare

by dragonchild (Archbishop)
on Mar 25, 2004 at 21:01 UTC (#339877)


in reply to Re: Re: List::Compare
in thread List::Compare

What it sounds like you're doing is slurping all the files into memory, then dealing with them using List::Compare. I don't know what beast of a machine you're using, but I doubt most machines can do that without serious thrashing.

Much better would be to use the Unix sort command. This is exactly the kind of job it was designed for. It is written in highly optimized C and, as such, will generally beat an equivalent Perl script.
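
Once both listings are sorted, the comparison itself no longer needs either file in memory: the two streams can be walked in step, much as comm does. What follows is only a minimal Perl sketch of that merge step, not anything posted in this thread. The names next_line and compare_sorted are hypothetical, and it assumes both inputs are sorted in plain byte order (for example with LC_ALL=C sort) and contain no duplicate lines.

```perl
use strict;
use warnings;

# Hypothetical helper: read one line and strip its newline, or return undef at EOF.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Hypothetical helper: walk two *sorted* line streams in step, holding only
# one line of each input at a time. Assumes byte-order sorting, which is the
# order Perl's cmp uses without "use locale". The results are collected in
# arrays here for simplicity; they could just as well be printed as found.
sub compare_sorted {
    my ( $fh_left, $fh_right ) = @_;
    my ( @left_only, @right_only );

    my $l = next_line($fh_left);
    my $r = next_line($fh_right);
    while ( defined $l and defined $r ) {
        my $order = $l cmp $r;
        if ( $order < 0 ) {       # line exists only on the left
            push @left_only, $l;
            $l = next_line($fh_left);
        }
        elsif ( $order > 0 ) {    # line exists only on the right
            push @right_only, $r;
            $r = next_line($fh_right);
        }
        else {                    # line is in both inputs; skip it
            $l = next_line($fh_left);
            $r = next_line($fh_right);
        }
    }

    # Whatever remains on one side has no partner on the other.
    while ( defined $l ) { push @left_only,  $l; $l = next_line($fh_left); }
    while ( defined $r ) { push @right_only, $r; $r = next_line($fh_right); }

    return ( \@left_only, \@right_only );
}

# Usage with in-memory filehandles standing in for two sorted files.
my $left_data  = "a\nb\nd\n";
my $right_data = "b\nc\nd\n";
open my $left_fh,  '<', \$left_data  or die "open: $!";
open my $right_fh, '<', \$right_data or die "open: $!";

my ( $left_only, $right_only ) = compare_sorted( $left_fh, $right_fh );
print "left only:  @$left_only\n";     # a
print "right only: @$right_only\n";    # c
```

The trade-off is that the inputs must be sorted first, which is the part sort handles well.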

------
We are the carpenters and bricklayers of the Information Age.

Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

Re: Re: Re: Re: List::Compare
by McMahon (Chaplain) on Mar 25, 2004 at 21:29 UTC
    It's surprisingly mellow thrash-wise, actually. I just do
    #SETUP DELETED, CODE WON'T RUN
    my @file1 = <FILE1>;
    my @file2 = <FILE2>;

    my $lc = List::Compare->new(\@file1, \@file2);

    my @file1only = $lc->get_Lonly;
    my @file2only = $lc->get_Ronly;

    print OUT "Files that exist only in FILE1:\n";
    print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME!!!\n\n";
    foreach my $file1 (@file1only) {
        unless (($file1 =~ "Tmp") or ($file1 =~ "Temp")) {
            print OUT $file1;
        }
    }

    print OUT "\n\n";
    print OUT "Files that exist only in FILE2:\n";
    print OUT "IGNORING FILES WITH Tmp OR Temp IN PATHNAME!!!\n\n";
    foreach my $file2 (@file2only) {
        unless (($file2 =~ "Tmp") or ($file2 =~ "Temp")) {
            print OUT $file2;
        }
    }
    and it takes less than a minute to run on my box here at work.

    Unfortunately, we're an all-Windows shop. I've managed to infiltrate a couple of FreeBSD boxes into the test department (cool network tools and a neato disk imaging system called Frisbee), but it's hardly worth moving all of the files over there and back just to save a few seconds using "sort" instead of List::Compare.

      Am I missing something or is List::Compare in your context giving you nothing that this code does not? This is probably faster and will use less memory as well.

      my @ary1 = ( 1..10 );
      my @ary2 = ( 5..15 );
      my %h;
      $h{$_}++ for ( @ary1, @ary2 );

      unique( "Only in ary1", \@ary1, \%h );
      unique( "Only in ary2", \@ary2, \%h );
      common( "Common to ary1 and ary2", \%h );

      sub unique {
          my ( $text, $ary, $hash ) = @_;
          print $text, $/;
          do{ print "$_\n" unless $hash->{$_} == 2 } for @$ary;
      }

      sub common {
          my ( $text, $hash ) = @_;
          print $text, $/;
          do{ print "$_\n" if $hash->{$_} == 2 } for keys %$hash
      }

      cheers

      tachyon

        I agree with your point -- this might be one of those modules that just "objectifies" something that's already pretty simple (so why bother?). Your alternative is a common, simple approach that is bound to be appropriate for the reviewer's main example -- comparing the output of "find" from two different machines -- because there won't be any duplications within a single set. But of course if one element happens to appear more than once in a single list, you get a false report about it being in both lists. This is easy to fix (in fact, fixing it makes the code simpler):
        my @ary1 = qw/1 2 3 4 5 4 6/;
        my @ary2 = qw/5 6 7 8 7 9 10/;
        my %h;
        $h{$_} .= "a" for ( @ary1 );
        $h{$_} .= "b" for ( @ary2 );

        compare( "Only in ary1", "a\$", \%h );
        compare( "Only in ary2", "^b", \%h );
        compare( "Common to both", "ab", \%h );

        sub compare {
            my ( $text, $regex, $hash ) = @_;
            print join "\n", $text, grep { $$hash{$_} =~ /$regex/ } keys %$hash;
            print "\n";
        }
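
        To make the false report concrete, here is a small side-by-side sketch of the two approaches run over the same lists, where 4 and 7 each repeat within a single list. The sub names common_by_count and common_by_tag are hypothetical and appear in neither post, and the results are sorted only to make the output stable.

```perl
use strict;
use warnings;

# Counting approach: an element counted exactly twice is taken to be in both
# lists, so an element repeated within one list is misreported as common.
sub common_by_count {
    my ( $ary1, $ary2 ) = @_;
    my %count;
    $count{$_}++ for @$ary1, @$ary2;
    return sort { $a <=> $b } grep { $count{$_} == 2 } keys %count;
}

# Tagging approach: record *which* list each element came from, so repeats
# within one list only lengthen a run of "a"s or "b"s.
sub common_by_tag {
    my ( $ary1, $ary2 ) = @_;
    my %tag;
    $tag{$_} .= "a" for @$ary1;
    $tag{$_} .= "b" for @$ary2;
    return sort { $a <=> $b } grep { $tag{$_} =~ /ab/ } keys %tag;
}

my @ary1 = qw/1 2 3 4 5 4 6/;
my @ary2 = qw/5 6 7 8 7 9 10/;

print "by count: @{[ common_by_count( \@ary1, \@ary2 ) ]}\n";    # 4 5 6 7
print "by tag:   @{[ common_by_tag( \@ary1, \@ary2 ) ]}\n";      # 5 6
```

        Only 5 and 6 are actually in both lists; the counting version also reports 4 and 7.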
      Check out both Cygwin and Microsoft's own Unix tools. I am currently developing the same Perl modules on Solaris9, Redhat9, and Cygwin (both on Win2k Pro and WinXP Pro).

      Unix tools are available in most places you'd care to look, if you know where to look. :-)

