Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I've got a bit of a tricky situation here. I'm trying to write a script that will take a list of software packages and compare it to a second list of packages. The second list is a list of available updates. If there is a software package (for instance, libpng-devel) that exists in both lists, but has a newer version number in the second list, I want to write the newer package name/number to a log file somewhere. So, basically, I'm writing something that checks for software updates and leaves a list of available updates in a logfile for you to find. Here's an example of the two lists:

LIST 1 (currently installed software)
zip-2.3.1-14
xpdf-1.01-8
ggv-1.99.8-2
libpng-devel-1.2.2-8
xvattr-1.3-ogle1

LIST 2 (updates available)
WindowMaker-0.80.1-5
lynx-2.8.5-7.1
xpdf-1.01-10
ggv-1.99.9-5
xvattr-1.4-ogle1
libpng-devel-1.2.2-9

The resulting log file should therefore contain:
xpdf-1.01-10
ggv-1.99.9-5
xvattr-1.4-ogle1
libpng-devel-1.2.2-9

This is tricky since you first have to separate the version number from the software name, determine if the same software package is listed in both lists, then compare the version numbers of the two--which may contain several dot- or dash-separated numbers and/or strings. Is there a simple way to do this (e.g. maybe a version-comparing perl module) or is this going to be one *messy* regex?

Replies are listed 'Best First'.
Re: Comparing Version Numbers
by xmath (Hermit) on Mar 04, 2003 at 16:10 UTC
Re: Comparing Version Numbers
by BrowserUk (Pope) on Mar 04, 2003 at 17:08 UTC

    This might work. The idea is to expand any groups of numerals in the name with some arbitrarily large number of leading zeros--I used 5, which may be overkill. That way, the filenames will be directly comparable using standard string comparison operators.

    I also built the keys to the two hashes by removing any digits and dots from the names, to simplify the lookup of one hash against the other. I stored the expanded filename, along with the original, in an anonymous array keyed by the reduced filename. To clarify, for xpdf-1.01-10 the structure contains:

    { 'xpdf--' => [ 'xpdf-00001.00001-00010', 'xpdf-1.01-10' ] }

    That should allow quick, accurate lookup of package names by key, gives a value for each that can be compared using gt (or cmp etc) and retains the original value for display or logging.

    I'm sure that there are examples of packages that will break this, but they are probably the exception rather than the rule.

    #! perl -slw
    use strict;

    my %current = map{
        (my $key = $_) =~ s[[\d.]+][]g;
        (my $comp = $_) =~ s[(\d+)][ sprintf'%05d', $1]ge;
        $key => [ $comp, $_ ];
    } qw[
        zip-2.3.1-14
        xpdf-1.01-8
        ggv-1.99.8-2
        libpng-devel-1.2.2-8
        xvattr-1.3-ogle1
    ];

    my %updates = map{
        (my $key = $_) =~ s[[\d.]+][]g;
        (my $comp = $_) =~ s[(\d+)][ sprintf'%05d', $1]ge;
        $key => [ $comp, $_ ];
    } qw[
        WindowMaker-0.80.1-5
        lynx-2.8.5-7.1
        xpdf-1.01-10
        ggv-1.99.9-5
        xvattr-1.4-ogle1
        libpng-devel-1.2.2-9
    ];

    for my $pkg (keys %current) {
        if (exists $updates{$pkg} and $updates{$pkg}[0] gt $current{$pkg}[0] ){
            print 'Update ', $updates{$pkg}[1], $/ , ' available for: ', $current{$pkg}[1];
        }
    }

    __END__
    C:\test>240384
    Update xvattr-1.4-ogle1
     available for: xvattr-1.3-ogle1
    Update xpdf-1.01-10
     available for: xpdf-1.01-8
    Update ggv-1.99.9-5
     available for: ggv-1.99.8-2
    Update libpng-devel-1.2.2-9
     available for: libpng-devel-1.2.2-8

    Examine what is said, not who speaks.
    1) When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.
    2) The only way of discovering the limits of the possible is to venture a little way past them into the impossible
    3) Any sufficiently advanced technology is indistinguishable from magic.
    Arthur C. Clarke.
Re: Comparing Version Numbers
by jasonk (Parson) on Mar 04, 2003 at 16:30 UTC

    This is going to depend mostly on what is doing the determining of which ones are newer. Assuming these are RPM files, as they appear to be, the rpm documentation includes a description of how RPM determines if one package is "newer" than another.

    From the RPM documentation:

    The algorithm that RPM uses to determine the version ordering of packages is simple and developers are encouraged not to rely on the details of its working. Developers should keep their numbering scheme simple so any reasonable ordering algorithm would work. The version comparison algorithm is in the routine rpmvercmp() and it is just a segmented strcmp(3). First, the boundaries of the segments are found using isdigit(3)/isalpha(3). Each segment is then compared in order with the right most segment being the least significant. The alphabetical portions are compared using a lexical graphical ascii ordering, the digit segments strip leading zeroes and compare the strlen before doing a strcmp. If both numerical strings are equal, the longer string is larger. Notice that the algorithm has no knowledge of decimal fractions, and perl-5.6 is "older" than perl-5.00503 because the number 6 is less than the number 503.

    If you are on a Red Hat Linux machine, there is very good documentation with examples in /usr/share/doc/rpm-<version>/dependencies. There are also a couple of RPM modules in CPAN.
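
    A rough sketch of that segmented comparison in Perl might look like the following. The name segmented_cmp is made up for illustration, and this is not RPM's actual rpmvercmp(). In particular, the rule that a numeric segment outranks an alphabetic one is an assumption on my part, and separators such as dots and dashes are simply dropped.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two version (or release) strings segment by
# segment, in the spirit of the description quoted above.
# Returns -1, 0 or 1, like cmp and <=>.
sub segmented_cmp {
    my ($x, $y) = @_;

    # Split into runs of digits and runs of letters; separators are dropped.
    my @a = $x =~ /(\d+|[A-Za-z]+)/g;
    my @b = $y =~ /(\d+|[A-Za-z]+)/g;

    while (@a && @b) {
        my $s = shift @a;
        my $t = shift @b;
        my $s_num = $s =~ /^\d/ ? 1 : 0;
        my $t_num = $t =~ /^\d/ ? 1 : 0;

        # Assumption: a numeric segment counts as newer than an alphabetic one.
        return $s_num ? 1 : -1 if $s_num != $t_num;

        my $r;
        if ($s_num) {
            # Strip leading zeroes; the longer digit string is then larger,
            # and equal lengths fall back to a plain string comparison.
            s/^0+(?=\d)// for $s, $t;
            $r = length($s) <=> length($t) || $s cmp $t;
        }
        else {
            $r = $s cmp $t;
        }
        return $r if $r;
    }

    # All shared segments were equal: whichever has segments left is larger.
    return @a ? 1 : @b ? -1 : 0;
}

print segmented_cmp('5.6', '5.00503'), "\n";   # prints -1: 6 is less than 503
print segmented_cmp('1.99.9', '1.99.8'), "\n"; # prints 1
```

    Applying it to the version first and to the release only when the versions tie would give an ordering for the lists in the question.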

Re: Comparing Version Numbers
by xmath (Hermit) on Mar 04, 2003 at 16:06 UTC
    It seems all of these names use the same style as RPM files, in which case you can simply extract the three fields (package, version, release) with:
    /^(.+)-([^-]+)-([^-]+)$/
    or if you know they're in the right format even just /(.*)-(.*)-(.*)/
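
    As a quick sketch of the stricter pattern in use (split_nvr is a made-up helper name, and the sample names are taken from the lists in the question):

```perl
use strict;
use warnings;

# Hypothetical helper: split "name-version-release" into its three fields.
# Returns an empty list when the string does not fit the pattern.
sub split_nvr {
    my ($pkg) = @_;
    return $pkg =~ /^(.+)-([^-]+)-([^-]+)$/;
}

for my $pkg (qw(libpng-devel-1.2.2-8 xvattr-1.3-ogle1 lynx-2.8.5-7.1)) {
    my ($name, $version, $release) = split_nvr($pkg)
        or next;    # skip anything that is not name-version-release
    print "$name | $version | $release\n";
}
```

    Because the first group is greedy, a hyphen inside the package name (libpng-devel) stays with the name, and only the last two hyphen-separated fields become version and release.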

    Comparison however is a different story. Maybe there's a module for it, but with something like "ogle1", I'm not even sure there's a canonical way of comparing those

Re: Comparing Version Numbers
by robartes (Priest) on Mar 04, 2003 at 16:21 UTC
    I'd approach this by splitting the package names into a hash, and then comparing those:
    #!/usr/local/bin/perl -w
    use strict;

    my %installed;
    my %updates;

    # __DATA__ juggling below is just a kludge to simulate list files
    while (<DATA>) {
        my $target = $. > 5 ? \%updates : \%installed;
        # Regex works on your limited sample, but is quite weak.
        /(.*?)-(\d.*)/ or next;    # skip lines that do not look like name-version
        $target->{$1} = $2;
    }

    foreach my $package (keys %updates) {
        delete $updates{$package}
            unless (exists $installed{$package}
                && ($installed{$package} ne $updates{$package}));
    }

    print "Packages to update:\n";
    # Pass "\n" to print as its own argument; nesting a second print inside
    # the list appended its return value (1) to every package name.
    print join("-", $_, $updates{$_}), "\n" for keys %updates;

    __DATA__
    zip-2.3.1-14
    xpdf-1.01-8
    ggv-1.99.8-2
    libpng-devel-1.2.2-8
    xvattr-1.3-ogle1
    WindowMaker-0.80.1-5
    lynx-2.8.5-7.1
    xpdf-1.01-10
    ggv-1.99.9-5
    xvattr-1.4-ogle1
    libpng-devel-1.2.2-9

    Output:
    Packages to update:
    ggv-1.99.9-5
    xvattr-1.4-ogle1
    xpdf-1.01-10
    libpng-devel-1.2.2-9
    However, the crux of your problem, comparing version numbers, is not dealt with by this - it just spits out the package if it has a different version, not necessarily a higher one. And, as stated in the code, the regex I use is probably easily defeated by some of the more exotic version names (it just splits on the first dash which is followed by a number).

    However, this should give you some idea on how to proceed.

    CU
    Robartes-

Re: Comparing Version Numbers
by hardburn (Abbot) on Mar 04, 2003 at 16:05 UTC

    I don't think there is a general solution that will work in all cases. There are just too many ways to use version numbers and put them in a filename.

    You might be able to get the correct number much of the time with something like this:

    # $filename declared elsewhere
    my ($version) = $filename =~ /\A
        [^-]+ -     # Read up to and including the first '-'
        ([\d.]+)    # Grab a series of digits and '.' chars
    /x;
    # $version is undef when the name does not fit this pattern

    But that will fail on many version numbering systems.

    ----
    Reinvent a rounder wheel.

    Note: All code is untested, unless otherwise stated

Re: Comparing Version Numbers
by Anonymous Monk on Mar 04, 2003 at 22:22 UTC
    Thanks to everyone who has contributed thus far. These are, in fact, RPMs that I'm dealing with--sorry for neglecting to mention that earlier. I've taken a look at RPM::Update (part of RPM-Tools-0.8). Running this code should have the effect of finding and downloading updates:

    RPM::Update::execute(
        '-v',
        '-ftp', 'my.ftpserver.edu/mirror/ftp.redhat.com/pub/redhat/linux/updates/8.0/en/os',
        '-d', '/tmp/rpm-download',
        'check', '-dl'
    );

    Unfortunately, I keep getting told "No new updates are available on my.ftpserver.edu". I removed a couple of packages from my system and installed the 8.0 release versions of them (specifically, xpdf and cvs) before running the script, so it should have been able to detect the updates. I'm not sure if I'm doing something wrong here or if the module is simply failing to recognize that updates are, in fact, available. And for the record, I did double check my URL, the existence and permissions of /tmp/rpm-download, and that updated xpdf and cvs packages were available. I even used ethereal to watch the packet stream...I successfully logged in to the server and got directory listings, so it's not something in the ftp transaction.

    Any more thoughts? Can someone point out something I'm doing wrong here?
Re: Comparing Version Numbers
by jonadab (Parson) on Mar 05, 2003 at 04:06 UTC

    I'm not sure this is possible even for a human to get right on the first try every single time. The package name can end in a number, which may or may not be preceded by what might seem to be a delimiting character (e.g., a hyphen in some cases). How to tell whether that's the end of the package name or the beginning of the version number? Sometimes knowledge of the development cycle of the package in question is required.

    Of course, you can ignore those cases and assume that (as is the case for the majority of packages) any numbers separated from the package name by a delimiting character start the version number. That is the approach I would suggest. When it fails, you will just get extraneous entries in your logfile.

    Still, comparing version numbers is hard. There is no standard for how they work. I mean, the basics are pretty straightforward (a.b.c.e comes after a.b.c.d if e>d), but a lot of packages do additional esoteric things with their version numbers. Alphas, betas, release candidates, prereleases, ... it's messy. One is tempted to say that alphabetic characters sort after numeric ones which sort after truncatedness, but then what do you do with a version like 3.23.41-5mdk? Ewww, that brings up the issue of forked versions... and what do you do when the same version of something is distributed with different patch options, and the patch options are postpended to the version number? In that case you want the package with the same patch options but the highest version number before them... [suppresses urge to go wash hands]

    Still worse, most open-source packages consider the numbers between delimiters as integers, so that (e.g.) 7.11 will sort after 7.10 after 7.9 -- but there are some packages that do version numbering the other way, where 7.11 comes right after 7.1 but before 7.2 (i.e., ASCIIbetical sorting). This is especially common with packages whose lead developers started on platforms other than Unix. (Almost all version numbers in the DOS world were this way, for example, so programmers who used to work on DOS often still number that way.)
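
    To make the difference concrete, here is a small sketch that sorts the same made-up version strings both ways. The sub name by_integer_fields is hypothetical, and it assumes every dot-separated field is a plain integer.

```perl
use strict;
use warnings;

my @versions = qw(7.9 7.10 7.2 7.11 7.1);

# Sort sub: compare dot-separated fields as integers, left to right.
sub by_integer_fields {
    my @x = split /\./, $a;
    my @y = split /\./, $b;
    while (@x && @y) {
        my $r = shift(@x) <=> shift(@y);
        return $r if $r;
    }
    return @x <=> @y;    # the version with fields left over sorts later
}

my @as_integers = sort by_integer_fields @versions;
my @as_strings  = sort { $a cmp $b } @versions;

print "integer fields: @as_integers\n";   # 7.1 7.2 7.9 7.10 7.11
print "ASCIIbetical:   @as_strings\n";    # 7.1 7.10 7.11 7.2 7.9
```

    Nothing in the version string itself says which of the two orderings a given package intends, which is why a generic comparison can only guess.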


    for(unpack("C*",'GGGG?GGGG?O__\?WccW?{GCw?Wcc{?Wcc~?Wcc{?~cc' .'W?')){$j=$_-63;++$a;for$p(0..7){$h[$p][$a]=$j%2;$j/=2}}for$ p(0..7){for$a(1..45){$_=($h[$p-1][$a])?'#':' ';print}print$/}
Re: Comparing Version Numbers
by Anonymous Monk on Mar 05, 2003 at 03:41 UTC
    If you are referring to something like rpm packages, then using some of rpm's internal functionality might help. You could use 'rpm -qi $package' and extract the version listed in the rpm info itself. Barring that, I would suggest using a bubble-sort method to get the "new" entries as close to the "old" entries as possible and then comparing the "new" neighbors to the "old" ones. Not very efficient, but a thought.
Re: Comparing Version Numbers
by paulbort (Hermit) on Mar 05, 2003 at 16:11 UTC
    First, you need to extract the version number, so that you can compare it. I wrote a utility to comb a directory for duplicates of the same package, keeping only the newest. I used this to get the version number:
    # Set $file to the rpm you're querying, then:
    my @result = `rpm -qp --queryformat "%{NAME}/%{ARCH}/%{VERSION}.%{RELEASE}\n" $file`;
    my ($name, $arch, $version) = split /\//, $result[0];
    From there it's straight into a hash with "$name:$arch" as the key, and the version number as the value.

    To compare version numbers, split on /\./ and check each array entry separately from left to right.
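
    A minimal sketch of that left-to-right comparison, assuming version strings such as 1.2.2.9 built by the query format above. cmp_dotted is a hypothetical name, and falling back to a string comparison for non-numeric entries such as ogle1 is my assumption, not something rpm guarantees:

```perl
use strict;
use warnings;

# Hypothetical helper: compare two dotted version strings entry by entry.
# Returns -1, 0 or 1, like cmp and <=>.
sub cmp_dotted {
    my ($left, $right) = @_;
    my @l = split /\./, $left;
    my @r = split /\./, $right;

    while (@l && @r) {
        my ($x, $y) = (shift @l, shift @r);

        # Compare as integers when both entries are all digits; otherwise
        # fall back to a string comparison (an assumption, see above).
        my $both_numeric = $x =~ /^\d+$/ && $y =~ /^\d+$/;
        my $result = $both_numeric ? $x <=> $y : $x cmp $y;
        return $result if $result;
    }

    # Every shared entry matched: the string with entries left is newer.
    return @l <=> @r;
}

print cmp_dotted('1.01.10', '1.01.8'), "\n";     # prints 1
print cmp_dotted('1.3.ogle1', '1.4.ogle1'), "\n"; # prints -1
```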

    There are probably nicer ways to do this with a CPAN module, but in this case I was too close to a solution with executing rpm to back up and do the right thing.
    --
    Spring: Forces, Coiled Again!
Re: Comparing Version Numbers
by rje (Deacon) on Mar 05, 2003 at 18:44 UTC
    Well, if you can guarantee that LIST 2's packages will always be later than LIST 1's, you might be able to do this:
    #
    # build some lists for testing purposes...
    #
    my @current = qw(
        zip-2.3.1-14
        xpdf-1.01-8
        ggv-1.99.8-2
        libpng-devel-1.2.2-8
        xvattr-1.3-ogle1
    );

    my @updates = qw(
        WindowMaker-0.80.1-5
        lynx-2.8.5-7.1
        xpdf-1.01-10
        ggv-1.99.9-5
        xvattr-1.4-ogle1
        libpng-devel-1.2.2-9
    );

    #
    # Ok, here's the real code snippet.
    #
    my %one;
    my %two;

    foreach ( @current, @updates ) {
        if (/^(\w+)-/) {
            $two{$_}++ if $one{$1};
            $one{$1}++;
        }
    }

    print join( "\n", sort keys %two );

    Output:
    C:\Perl\bin>perl upgrade.pl
    ggv-1.99.9-5
    libpng-devel-1.2.2-9
    xpdf-1.01-10
    xvattr-1.4-ogle1
    

    Rob