File Parsing

kel has asked for the wisdom of the Perl Monks concerning the following question:

Re: File Parsing by Anonymous Monk on Jun 22, 2014 at 01:36 UTC
The task of sorting arbitrary version numbers is not well defined, because every project numbers its versions differently. For example, Perl v5.20 is the same as Perl 5.020. If you sort the two version numbers "2.5" and "2.5_rc1" with Perl's default (string) `sort`, then "2.5" will sort before "2.5_rc1", even though a release candidate usually precedes the release. More generally: Is 5.02 the same as 5.020, or even the same as 5.2? Does 0.5 come before or after 0.1, before or after 0.10, and before or after 0.5.x? The list goes on.

> The problem for me is not as much in finding the answers, but learning to ask the right questions...

Then may I suggest the question be "How do I sort Debian version numbers?" This question will bring you to the Debian Policy Manual, which defines quite precisely how such version numbers are to be compared, as well as to Debian::Dpkg::Version.
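To show how concrete the Debian rule is once you have the right question, here is a simplified, hypothetical port of the comparison the Debian Policy Manual describes, covering only the upstream_version part (no epoch, no Debian revision). The sub names `char_weight`, `cmp_nondigits` and `cmp_upstream` are made up for this sketch; for real work, use the module mentioned above. Under this rule a pre-release is written with a tilde, so "2.5~rc1" sorts before "2.5", and "5.02" sorts before "5.020" because digit runs are compared as integers.

```perl
use strict;
use warnings;

# Weight of one character in a non-digit run (sketch of the Debian rule):
# '~' sorts before everything, even the end of the string; then the end
# of the string; then letters; then all other characters.
sub char_weight {
    my ($c) = @_;
    return -1          if $c eq '~';
    return 0           if $c eq '';              # ran out of characters
    return ord($c)     if $c =~ /[A-Za-z]/;
    return ord($c) + 256;
}

# Compare two non-digit runs character by character using the weights above.
sub cmp_nondigits {
    my ($x, $y) = @_;
    my $n = length($x) > length($y) ? length($x) : length($y);
    for my $i (0 .. $n - 1) {
        my $wx = char_weight($i < length($x) ? substr($x, $i, 1) : '');
        my $wy = char_weight($i < length($y) ? substr($y, $i, 1) : '');
        return $wx <=> $wy if $wx != $wy;
    }
    return 0;
}

# Alternate between non-digit runs and digit runs until one side differs.
sub cmp_upstream {
    my ($x, $y) = @_;
    while (length($x) || length($y)) {
        my ($nx) = $x =~ /^([^0-9]*)/;
        my ($ny) = $y =~ /^([^0-9]*)/;
        my $c = cmp_nondigits($nx, $ny);
        return $c if $c;
        substr($x, 0, length($nx), '');
        substr($y, 0, length($ny), '');

        # Digit runs compare as integers; an empty run counts as 0.
        my ($dx) = $x =~ /^([0-9]*)/;
        my ($dy) = $y =~ /^([0-9]*)/;
        substr($x, 0, length($dx), '');
        substr($y, 0, length($dy), '');
        s/^0+// for $dx, $dy;    # "020" and "20" are the same integer
        $c = length($dx) <=> length($dy) || $dx cmp $dy;
        return $c if $c;
    }
    return 0;
}

print join(' ', sort { cmp_upstream($a, $b) } qw(1.0 1.0~rc1 0.9 1.0.1)), "\n";
# prints: 0.9 1.0~rc1 1.0 1.0.1
```

Comparing digit runs by length and then as strings, instead of with `<=>`, avoids floating-point trouble with very long digit runs.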
Re: File Parsing by MidLifeXis (Monsignor) on Jun 21, 2014 at 21:34 UTC
Perhaps the version module. Also see the -k and other options of the Unix sort command. If neither of those solves your issue, then perhaps this sort routine:

```perl
# Assume: version strings have the same number of parts. Adjust if not.
# Note: This is not efficient, but may help explain what needs to happen.
# @filenames is a placeholder for your list of file names.
my @sorted = sort {
    my @a_fn_parts  = split( qr([_]), $a );
    my @a_ver_parts = split( qr([.]), $a_fn_parts[1] );
    my @b_fn_parts  = split( qr([_]), $b );
    my @b_ver_parts = split( qr([.]), $b_fn_parts[1] );
    my $i = 0;
    while ( $i < @a_ver_parts ) {
        return $a_ver_parts[ $i ] <=> $b_ver_parts[ $i ]
            if ( $a_ver_parts[ $i ] != $b_ver_parts[ $i ] );
        $i++;
    }
    return 0;
} @filenames;
```

--MidLifeXis
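The version module (a core module since Perl 5.10) can stand in for the hand-written loop. A minimal sketch, assuming file names of the form `<root>_<version>_<suffix>` like `bar_123.10.0_deb`; the sample list and the helper name `version_of` are hypothetical. It uses `version->declare` rather than `version->parse` because `declare` always reads the string as dotted-decimal, so "1.10" means (1, 10), whereas `parse` would read a two-part string as the decimal number 1.10.

```perl
use strict;
use warnings;
use version;

# Hypothetical sample input.
my @files = qw(bar_123.10.0_deb bar_123.9.2_deb bar_2.0.1_deb);

# Extract the part between the first and second underscore as a version object.
sub version_of {
    my ($name) = @_;
    my (undef, $v) = split /_/, $name;
    return version->declare($v);
}

# version objects overload <=>, comparing component by component.
my @sorted = sort { version_of($a) <=> version_of($b) } @files;
print "$_\n" for @sorted;
```

This re-parses both names on every comparison; for long lists, caching the parsed versions (for example with a Schwartzian Transform) avoids the repeated work.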
Re: File Parsing by Laurent_R (Canon) on Jun 22, 2014 at 08:57 UTC
Is this a Perl question? I am asking because you seem to be talking about $1, $2, ... as being the result of a split operation, but this is not the case in Perl, where $1, $2, ... occur in a totally different context (regex matches). $1, $2, etc. appear in the split context in other languages such as awk, but I doubt awk is the right tool to sort version numbers.

If you intend to do it in Perl, then you probably want to create a data structure where each node contains the name of the original file on one hand and the various components of the name on the other hand. Then you can sort on the various parts and store the sorted order into an array, which you can then use to figure out what you want to keep live and what you want to set aside.

Step one: splitting the names. Maybe something like this:

```perl
my @to_be_sorted;
foreach my $filename (@filelist) {
    my ($root, $version) = $filename =~ /([a-z]+)_(\d+\.\d+\.\d+)/;
    next unless defined $version;    # skip names that do not match the pattern
    my ($major, $minor, $third) = split /\./, $version;
    push @to_be_sorted, [$filename, $root, $major, $minor, $third];
}
```

It could be done with shorter code, but I preferred to break the process into small parts for better comprehension. Now the records in the @to_be_sorted array look like this:

```
0  ARRAY(0x600500678)
   0  'bar_123.10.0_deb'
   1  'bar'
   2  123
   3  10
   4  0
```

Now you can sort on elements 1, 2, 3 and 4 of each record and copy element 0 of each record, in sorted order, into a new array. Something like this (not really tested):

```perl
my @sorted_array = map { $_->[0] }
    sort {
           $a->[1] cmp $b->[1]
        || $a->[2] <=> $b->[2]
        || $a->[3] <=> $b->[3]
        || $a->[4] <=> $b->[4]
    } @to_be_sorted;
```

The whole code shown above could be reduced to a single statement using the clever Schwartzian Transform, but I would not necessarily recommend it in this case, because the initial splitting is a bit tedious.
Please note that I fully agree with the previous post by Anonymous Monk. I have just chosen one plausible way of sorting the version numbers; you may have to change it in accordance with the Debian version number conventions.
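For comparison, here is roughly what the single-statement Schwartzian Transform version of the steps above could look like. It is a sketch assuming the same `<root>_<major>.<minor>.<third>` naming; the sample `@filelist` is hypothetical, and the regex captures the four fields directly, which removes the separate `split` step. Read it bottom to top: build `[name, root, major, minor, third]` records, drop names that did not match, sort on the cached fields, then keep only the original name.

```perl
use strict;
use warnings;

# Hypothetical sample input following the bar_123.10.0_deb pattern.
my @filelist = qw(bar_123.10.0_deb foo_1.2.3_deb bar_123.9.7_deb bar_2.0.0_deb);

my @sorted_array =
    map  { $_->[0] }                        # 4. keep only the original name
    sort {                                  # 3. sort on the cached fields
           $a->[1] cmp $b->[1]
        || $a->[2] <=> $b->[2]
        || $a->[3] <=> $b->[3]
        || $a->[4] <=> $b->[4]
    }
    grep { @$_ == 5 }                       # 2. drop names the regex did not match
    map  { [ $_, /^([a-z]+)_(\d+)\.(\d+)\.(\d+)/ ] }  # 1. [name, root, major, minor, third]
    @filelist;

print "$_\n" for @sorted_array;
```

A failed match returns an empty list, so a non-matching name becomes a one-element record, which the `grep` filters out.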
Re^2: File Parsing by perlfan (Vicar) on Jun 22, 2014 at 20:23 UTC
> Is this a Perl question? I am asking because you seem to be talking about $1, $2, ... as being the result of a split operation, but this is not the case in Perl, where $1, $2, ... occur in a totally different context (regex matches). $1, $2, etc. appear in the split context in other languages such as awk, but I doubt awk is the right tool to sort version numbers.

I would hope that any Perl monk who knows awk would have known what was meant here. I had no trouble "parsing" the intent. Just saying.
Re^3: File Parsing by Laurent_R (Canon) on Jun 23, 2014 at 06:36 UTC
Well, understanding the intent was not the problem, but reading the OP, I was truly wondering whether the poster really wanted to do it in Perl. Besides, assuming the OP wanted to do it in Perl, I thought it was useful to remind the OP that split does not store its results into $1, $2, etc. In addition, I took the time to provide actual code to solve the problem, so I don't think the OP has any reason to complain about my post.
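A minimal illustration of the distinction being discussed, using the sample file name from the data dump earlier in the thread: in awk, $1, $2 name the fields of the current input line, but in Perl `split` hands its fields back as a list, while $1, $2, ... are set only by a successful match with capture groups. The variable names below are illustrative only.

```perl
use strict;
use warnings;

my $filename = 'bar_123.10.0_deb';

# split returns the fields as a list; it does not set $1, $2, ...
my ($root, $version, $suffix) = split /_/, $filename;

# $1 and $2 are filled in only by a successful match with capture groups.
my ($cap_root, $cap_version);
if ($filename =~ /^([a-z]+)_([\d.]+)_/) {
    ($cap_root, $cap_version) = ($1, $2);
}

print "split:   $root $version $suffix\n";
print "capture: $cap_root $cap_version\n";
```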
Re^4: File Parsing by kel (Sexton) on Jun 24, 2014 at 09:42 UTC


	PerlMonks