
File Parsing

by kel (Sexton)
on Jun 21, 2014 at 20:45 UTC ( [id://1090794]=perlquestion )

kel has asked for the wisdom of the Perl Monks concerning the following question:

My humble apologies if this is a question beneath contempt in simplicity.

I am simply trying not to reinvent the wheel or oxcart here, and only come hither after weary hours on Google in a fruitless pursuit.

What I am trying to do is sort out my copies of thousands of debian modules, keep the LATEST ones and move the older dups to another directory.

bar_123.10.0_deb foo_123.10.0_deb foo_123.2.8_deb

I can split by underscore, grab $2, split by dots, and specially process or ignore $3 (and/or $2) if it has anything non-numeric - I'll deal with those another day.

I am looking for a 'uniq'-type command that will spot only foo (as having one or more duplicates on the first split field, $1) and then permit a real comparison of version numbers.

A simple sort may well fail here, for obvious reasons.

But... this is something that SHOULD HAVE been done many times before. I am certainly not requesting a script, but just a pointer to something I can learn from on this issue!

Perhaps it is already a module in the VCI::VCS libraries.

The problem for me is not as much in finding the answers, but learning to ask the right questions.....

Replies are listed 'Best First'.
Re: File Parsing
by Anonymous Monk on Jun 22, 2014 at 01:36 UTC

    The task of sorting just any version numbers is not well defined, because everybody does this differently. Examples: Perl v5.20 is the same as Perl 5.020. If you sort the two version numbers "2.5" and "2.5_rc1" with Perl's sort, then "2.5" will sort before "2.5_rc1". More generally: Is 5.02 the same as 5.020, or even the same as 5.2? Does 0.5 come before or after 0.1, before or after 0.10, and before or after 0.5.x? The list goes on.
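
    A small illustration of why a plain sort is not enough, using the two foo versions from the question. The helper name by_numeric_parts is made up for this sketch, and the component-wise comparison it performs is only one possible convention, not Debian's:

```perl
use strict;
use warnings;

my @versions = ('123.10.0', '123.2.8');

# Plain string sort compares character by character, so "10" sorts before "2".
my @as_strings = sort @versions;

# Hypothetical helper: compare dot-separated parts as numbers,
# treating a missing part as 0.
sub by_numeric_parts {
    my ($x, $y) = @_;
    my @xs = split /\./, $x;
    my @ys = split /\./, $y;
    while (@xs || @ys) {
        my $cmp = (shift(@xs) // 0) <=> (shift(@ys) // 0);
        return $cmp if $cmp;
    }
    return 0;
}
my @as_numbers = sort { by_numeric_parts($a, $b) } @versions;

print "string sort:  @as_strings\n";
print "numeric sort: @as_numbers\n";
```

    The string sort puts 123.10.0 first, while the numeric comparison puts 123.2.8 first, which is presumably what the OP wants for these purely numeric versions.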

    The problem for me is not as much in finding the answers, but learning to ask the right questions.....

    Then may I suggest the question to be "How do I sort Debian version numbers?"

    This question will bring you to the Debian Policy Manual, which defines quite well how such version numbers are to be compared, as well as Debian::Dpkg::Version.

Re: File Parsing
by MidLifeXis (Monsignor) on Jun 21, 2014 at 21:34 UTC

    Perhaps the version module. Also see -k and the other sorting options of the Unix sort command.
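
    A minimal sketch of the version idea, assuming every name follows the name_version_deb layout from the question and every version has three purely numeric parts. The helper file_version and the sample list are made up for illustration. Note that version follows Perl's own rules (a two-part string such as "1.10" is read as a decimal, and non-numeric suffixes are rejected), so this is not a Debian-policy comparison:

```perl
use strict;
use warnings;
use version;

# Hypothetical file names in the OP's name_version_deb layout.
my @files = ('foo_123.2.8_deb', 'foo_123.10.0_deb');

# Hypothetical helper: take the middle field and parse it as a version object.
sub file_version {
    my ($file) = @_;
    my (undef, $ver) = split /_/, $file;
    return version->parse($ver);
}

# version objects overload <=>, so they compare part by part.
my @sorted = sort { file_version($a) <=> file_version($b) } @files;
print "$_\n" for @sorted;
```

    With these two names, foo_123.2.8_deb comes out before foo_123.10.0_deb.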

    If neither of those solves your issue, then perhaps this sort routine:

        # Assume: version strings have the same number of parts. Adjust if not.
        # Note: This is not efficient, but may help explain what needs to happen.
        # @filenames is a placeholder for your list of file names.
        my @sorted = sort {
            my @a_fn_parts  = split( qr([_]), $a );
            my @a_ver_parts = split( qr([.]), $a_fn_parts[1] );
            my @b_fn_parts  = split( qr([_]), $b );
            my @b_ver_parts = split( qr([.]), $b_fn_parts[1] );
            my $i = 0;
            # Compare the version parts numerically, left to right.
            while ( $i < @a_ver_parts ) {
                return $a_ver_parts[ $i ] <=> $b_ver_parts[ $i ]
                    if ( $a_ver_parts[ $i ] != $b_ver_parts[ $i ] );
                $i++;
            }
            return 0;
        } @filenames;

    --MidLifeXis

Re: File Parsing
by Laurent_R (Canon) on Jun 22, 2014 at 08:57 UTC
    Is this a Perl question? I am asking because you seem to be talking about $1, $2, ... as being the result of a split operation, but this is not the case in Perl, where $1, $2, ... occur in a totally different context (regex matches). $1, $2, etc. appear in the split context in other languages such as awk, but I doubt awk is the right tool to sort version numbers.

    If you intend to do it in Perl, then you probably want to create a data structure where each node contains the name of the original file on one hand and the various components of the name on the other hand. Then you can sort on the various parts and store the sorted order into an array which you can then use to figure out what you want to keep live and what you want to set aside.

    Step one: splitting the names. Maybe something like this:

        my @to_be_sorted;
        foreach my $filename (@filelist) {
            my ($root, $version) = $filename =~ /([a-z]+)_(\d+\.\d+\.\d+)/;
            my ($major, $minor, $third) = split /\./, $version;
            push @to_be_sorted, [$filename, $root, $major, $minor, $third];
        }

    It could be done with shorter code, but I preferred to break the process into small parts for better comprehension. Now the records in the @to_be_sorted array look like this:

        0  ARRAY(0x600500678)
           0  'bar_123.10.0_deb'
           1  'bar'
           2  123
           3  10
           4  0

    Now you can sort on elements 1, 2, 3 and 4 of each record and store element 0 of each record into a new sorted array. Something like this (not really tested):

        my @sorted_array =
            map  { $_->[0] }
            sort {
                   $a->[1] cmp $b->[1]
                || $a->[2] <=> $b->[2]
                || $a->[3] <=> $b->[3]
                || $a->[4] <=> $b->[4]
            } @to_be_sorted;

    The whole code shown above could be reduced to a single instruction using the clever Schwartzian Transform (see also Schwartzian Transform), but I would not necessarily recommend it in this case, because the initial splitting is a bit tedious.
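
    For reference, a sketch of what the single-instruction form might look like, using the three sample names from the question as an assumed @filelist. Names that do not match the name_N.N.N pattern are silently dropped here, which is a choice made for this sketch only:

```perl
use strict;
use warnings;

# Sample names from the original question (assumed input).
my @filelist = ('foo_123.10.0_deb', 'bar_123.10.0_deb', 'foo_123.2.8_deb');

my @sorted_array =
    map  { $_->[0] }                       # 3. keep only the file name
    sort {                                 # 2. sort on name, then version parts
           $a->[1] cmp $b->[1]
        || $a->[2] <=> $b->[2]
        || $a->[3] <=> $b->[3]
        || $a->[4] <=> $b->[4]
    }
    map  {                                 # 1. build [name, root, major, minor, third]
        my @f = /^([a-z]+)_(\d+)\.(\d+)\.(\d+)/;
        @f ? [ $_, @f ] : ();
    }
    @filelist;

print "$_\n" for @sorted_array;
```

    This lists bar_123.10.0_deb, then foo_123.2.8_deb, then foo_123.10.0_deb.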

    Please note that I fully agree with the previous post by Anonymous Monk. I have just chosen one plausible way of sorting the version numbers; you may have to change it in accordance with the Debian version number conventions.
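
    To get from the sorted list to "keep the latest, set the rest aside", one possible sketch follows. It assumes the list is already sorted by package name and then by ascending version, as above; the variable names %latest and @older are made up for illustration, and the actual moving of files is left out:

```perl
use strict;
use warnings;

# Assumed input: names already sorted by package, then by ascending version.
my @sorted_array = ('bar_123.10.0_deb', 'foo_123.2.8_deb', 'foo_123.10.0_deb');

my (%latest, @older);
for my $file (@sorted_array) {
    my ($root) = split /_/, $file;
    # A later entry for the same package supersedes the earlier one.
    push @older, $latest{$root} if exists $latest{$root};
    $latest{$root} = $file;
}

print "keep: $_\n" for sort values %latest;
print "move: $_\n" for @older;
```

    Here only foo_123.2.8_deb ends up in @older; the files in that list could then be moved with something like move from File::Copy.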

      >Is this a Perl question? I am asking because you seem to be talking about $1, $2, ... as being the result of a split operation, but this is not the case in Perl, where $1, $2, ... occur in a totally different context (regex matches). $1, $2, etc. appear in the split context in other languages such as awk, but I doubt awk is the right tool to sort version numbers.

      I would hope that any Perlmonk knowing awk would have known what was meant here. I had no trouble "parsing" the intent. Just saying.

        Well, understanding the intent was not the problem, but reading the OP, I was truly wondering whether the poster really wanted to do it in Perl. Besides, assuming the OP wanted to do it in Perl, I thought it was useful to remind the OP that the split does not store its results into $1, $2, etc.

        In addition, I have taken the time to provide actual code to solve the problem, so I would think the OP has no reason to complain about my post.

Node Type: perlquestion [id://1090794]
Approved by Old_Gray_Bear