List::Compare

by McMahon (Chaplain)
on Mar 25, 2004 at 17:34 UTC (#339812=modulereview)

Item Description: Easily creates sophisticated comparison information about arrays.

Review Synopsis:

I had to create a report of the differences between two files each containing thousands of unsorted records. I *thought* that I needed some form of Diff. I tried Algorithm::Diff, but discovered that it only works line-by-line. For instance, Algorithm::Diff reports that (1,2,3) and (2,1,3) are different lists.

Furthermore, some commercial tools I tried did the same thing.

I accidentally stumbled across List::Compare, which was lucky for two reasons.

Most importantly, List::Compare solved my problem, and more: it shows intersections and unions of sets; it shows elements unique to either list (that was my particular problem); it shows all unique elements of both lists, and even all elements of both lists. The interface is elegant and intuitive. I'll be dealing with large inventory lists well into the future, and List::Compare is shaping up to be my tool of first resort for all my list comparison needs.
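For readers who want to see what those comparisons amount to, here is a minimal sketch in plain Perl using "seen" hashes. It is not the module's own code, and the variable names and sample data are made up for illustration; it only shows the kind of results described above.

```perl
use strict;
use warnings;

# Sample data (hypothetical): two unsorted lists with some overlap.
my @left  = qw(cherry apple banana cherry);
my @right = qw(date cherry banana);

# Seen-hashes: one key per distinct element, value = number of occurrences.
my (%in_left, %in_right);
$in_left{$_}++  for @left;
$in_right{$_}++ for @right;

# Every distinct element from either list.
my %in_either = (%in_left, %in_right);

my @intersection = sort { $a cmp $b } grep { exists $in_right{$_} } keys %in_left;
my @union        = sort { $a cmp $b } keys %in_either;
my @left_only    = sort { $a cmp $b } grep { !exists $in_right{$_} } keys %in_left;
my @right_only   = sort { $a cmp $b } grep { !exists $in_left{$_} } keys %in_right;
my @either_only  = sort { $a cmp $b } (@left_only, @right_only);
my @everything   = sort { $a cmp $b } (@left, @right);   # duplicates kept

print "intersection: @intersection\n";   # banana cherry
print "union:        @union\n";          # apple banana cherry date
print "left only:    @left_only\n";      # apple
print "right only:   @right_only\n";     # date
print "either only:  @either_only\n";    # apple date
print "everything:   @everything\n";
```

Because the lookups go through hashes, the original order of the two lists plays no part in the result, which is exactly what an order-sensitive diff cannot give you.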

I was also lucky because James Keenan gives a detailed history of the source (Perl Cookbook) and the circumstances (introductory Perl course) that inspired him to write the module, as well as pointing out a number of similar modules, just in case List::Compare doesn't solve your particular problem. I am relatively new to Perl, and I don't entirely grok the power of hashes yet. List::Compare not only allowed me to solve my immediate problem quickly and elegantly, but it also showed me how to understand the code that underlies the List::Compare module itself.

Elegant, intuitive, well-documented, and with great hints about the magic behind the module. I'm glad I found List::Compare, and you probably will be, too.

Re: List::Compare
by dragonchild (Archbishop) on Mar 25, 2004 at 17:49 UTC
    I'd like to point out a few short-comings with List::Compare:
    1. It stringifies almost everything. Specifically, it does not stringify get_bag(), but it does everything else. This means it will have serious problems working with references.
    2. It doesn't maintain order. This may not be important in many situations, but a "list" is inherently ordered. Set::Object has similar methods, and actually deals with the appropriate term. :-)
    3. It has a bug when dealing with lists of code-references without using the -u (unsorted) flag. Specifically, if the first element in your first list is a code-reference, sort will attempt to use it as the sorting method. Which means it's not in the list of things to sort, so it's lost from the bag.

    I don't mean to indicate that it's a bad module. Perl lists are ... difficult to deal with. (Oh - it also doesn't deal with lists ... it deals with arrays. But, that's another nit.)
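To illustrate the stringification point in general terms, here is a small sketch in plain Perl. It does not call List::Compare; it only assumes that the module keeps its elements as hash keys, and shows what happens to any reference used that way.

```perl
use strict;
use warnings;

my $ref = [ 1, 2, 3 ];

# Using a reference as a hash key stores its string form, not the reference.
my %seen;
$seen{$ref} = 1;

my ($key) = keys %seen;

print "key: $key\n";    # something like ARRAY(0x55d0c8a3f2a0)
print "still a reference? ", ( ref $key ? "yes" : "no" ), "\n";    # no
```

Whatever comes back out of such a hash is a plain string, so the data behind the original reference can no longer be reached through it.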

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

      Thanks for the warnings!

      But the problem this solves for me is to compare gigantic files full of the output of File::Find. All strings, no references, order immaterial. Under those circumstances, it rocks bells. =)
        What it sounds like you're doing is slurping all the files into memory, then dealing with them using List::Compare. I don't know what beast of a machine you're using, but I doubt most machines can do that without serious thrashing.

        Much better would be to use the unix sort command. This is one of the exact reasons it was designed. It is written in very optimized C, and as such, will always beat out Perl.


      Until I recently saw a reference to this discussion thread in a CPAN review of List::Compare, I was unaware that my module was being discussed in the Monastery. So today I'd like to share some thoughts on the issues raised. As these issues were raised by various monks, I'll reply to the individual threads today but try to integrate these comments in the talk I am giving on List::Compare at YAPC::NA in Buffalo two weeks from today.

      Re: Dragonchild on List::Compare not working with references:

      The real-world production problems for which I originally developed List::Compare did not include lists of Perl references. The module was not designed to handle them and has never been tested against them.

      If you can suggest a way for the module to detect when a list passed to the constructor (or a function in List::Compare::Functional) contains a reference, I will be happy to generate a 'die' at that point. Otherwise, I'll simply include a warning against this in the next revision of the documentation.

      Re: Dragonchild on List::Compare not preserving order

      I don't see any particular reason why a report on the intersection, union, etc., of two or more lists should come back in any particular order. Whatever order the original lists had is untouched; the order of the set comparisons is irrelevant. As the documentation says at a couple of points, List::Compare was designed to answer the question: Was this item seen in that list? Pure and simple. In what position in the list the item was seen is a question for a different module to answer.

      Jim Keenan

        The real-world production problems for which I originally developed List::Compare did not include lists of Perl references. The module was not designed to handle them and has never been tested against them.

        That's perfectly fine, but you don't mention that in your documentation. Either you handle them or you make it perfectly clear that you don't handle them. This is a very important point.

        As for figuring out if a list contains a reference ... what's wrong with grep { ref } @list?
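A minimal sketch of how that check could be wired up follows. The helper name assert_no_references is hypothetical and is not part of List::Compare; the sketch only shows grep { ref } being used to refuse lists that contain references.

```perl
use strict;
use warnings;

# Hypothetical helper: takes array references and dies if any of the
# underlying lists contains a reference of any kind.
sub assert_no_references {
    my @lists = @_;
    for my $i ( 0 .. $#lists ) {
        # In scalar context grep returns the number of matching elements.
        my $count = grep { ref } @{ $lists[$i] };
        die "list $i contains $count reference(s)\n" if $count;
    }
    return 1;
}

my @plain = qw(a b c);
my @mixed = ( 'a', ['b'], sub { 'c' } );

my $plain_ok = eval { assert_no_references( \@plain ) } ? 1 : 0;
my $mixed_ok = eval { assert_no_references( \@plain, \@mixed ) } ? 1 : 0;
my $error    = $@;

print "plain lists accepted: $plain_ok\n";    # 1
print "mixed lists accepted: $mixed_ok\n";    # 0
print "error: $error";                        # list 1 contains 2 reference(s)
```

The check costs one extra pass over each list, so whether it belongs in a constructor is a trade-off for the module author to weigh.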

        I don't see any particular reason why a report on the intersection, union, etc., of two or more lists should come back in any particular order.

        If you were dealing with sets, then you would be correct. However, if I'm working with lists, I expect that the ordering property of lists would be maintained in every single action. map and grep maintain order. Your functions, to me, are in the same vein.

        If I want to find out "Is this item in that grouping?", I would consider that a set operation, if I'm using a module. Sets are intrinsically unordered.

        The question also isn't a matter of what position in the list a given item was seen. If you are saying @list3 = intersection(\@list1, \@list2);, I would assume (because it's not otherwise stated in the docs) that @list3 has the elements in the order seen in @list1. Essentially, intersection() would be written as so:

        sub intersection {
            my ($l1, $l2) = @_;
            my %l2;
            undef @l2{@$l2};    # seen-hash of the second list
            my @l3 = grep { exists $l2{$_} } @$l1;    # keeps the order of @$l1
            return wantarray ? @l3 : \@l3;
        }

        That preserves the order. If I don't care about order, I should be using sets.

        Now, you're wondering what the big deal is - most people wouldn't care. And, you'd be right. Except, some people will and it doesn't cost a lot to make it right.


        I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re: List::Compare
by davorg (Chancellor) on Mar 25, 2004 at 19:49 UTC
    My only complaint about List::Compare is the name. It doesn't compare lists, it compares arrays. Enough people confuse arrays and lists without this module adding to the problem. Maybe I should add the same functionality to Array::Compare.
    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      Yah, that's why I said "accidentally stumbled across..." I was searching on terms like "diff", "file", etc. Even searching on "list compare" shows List::Compare 7 items down the page.

      Which is why I thought a review might help-- the module's a little misnamed, and it's hard to find, but it works really well, and has pointers to lots of other modules that do similar things and that are probably also hard to find.
      This point was raised two years ago when I first distributed the module via CPAN. My feeling then was that the fact that the lists being compared were real-world data sets (I can hear my boss saying, "Get me a list of ...") was more relevant to its naming than the fact that the lists were placed into arrays before being passed to the constructor.

      I think the point is less valid today, because now the lists do not necessarily have to be in the form of arrays before being passed to the constructor (or passed to a function in List::Compare::Functional). Now, you can pass seen-hashes to the constructor or function -- seen-hashes which imply the existence of underlying lists. See the documentation on seen-hashes.
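For anyone unfamiliar with the term, a seen-hash is usually built as in the sketch below: the keys are the distinct items of a list and the values count how often each one was seen. This is plain Perl with made-up sample data; how such a hash is handed to the module is left to its documentation.

```perl
use strict;
use warnings;

# Hypothetical sample list with a repeated item.
my @list = qw(pear apple pear fig);

# Seen-hash: key = item, value = number of times it appeared.
my %seen;
$seen{$_}++ for @list;

# The underlying list of distinct items can be recovered from the keys.
my @distinct = sort { $a cmp $b } keys %seen;

print "$_ was seen $seen{$_} time(s)\n" for @distinct;
```

The hash no longer records the order of the original list, only which items occurred and how often.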

      Dave, I looked at Array::Compare when preparing the documentation. It seems to me that your approach is more tightly focused on comparing Perl arrays and takes a different approach to determining 'sameness' than does List::Compare. The two modules are tackling different problems.

        My feeling then was that the fact that the lists being compared were real-world data sets (I can hear my boss saying, "Get me a list of ...") was more relevant to its naming than the fact that the lists were placed into arrays before being passed to the constructor.

        Guess we'll just have to agree to disagree on that then. I spend a lot of time helping out on Perl beginners' forums and one of the most common errors I see is the confusion between lists and arrays. Having that confusion backed up in the name of a CPAN module doesn't really help matters.

        I'm seriously considering deprecating Array::Compare in favour of Data::Compare. If I do, would you like to take over Array::Compare so you can use the name for your module?

