Sorting records on a single field

TStanley has asked for the wisdom of the Perl Monks concerning the following question:

I have the following data file, which has been extracted from a log file in Tuxedo:

100644:MWTP_CAT:12002: SERVER:pid=14520:Execution time:TPR015-10:(millisec):53
100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR015-10:(millisec):10
100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR015-10:(millisec):33
100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):437
100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):470
100644:MWTP_CAT:12002: SERVER:pid=16048:Execution time:TPR009-30:(millisec):24
100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR012-01E:(millisec):63
100644:MWTP_CAT:12002: SERVER:pid=10427:Execution time:ISCST044:(millisec):0
100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR012-01E:(millisec):85
100644:MWTP_CAT:12002: SERVER:pid=10428:Execution time:01201E:(millisec):3

I need to sort this data by the number of milliseconds (execution time), so the above would look like:

100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):470
100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):437
100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR012-01E:(millisec):85
100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR012-01E:(millisec):63
100644:MWTP_CAT:12002: SERVER:pid=14520:Execution time:TPR015-10:(millisec):53
100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR015-10:(millisec):33
100644:MWTP_CAT:12002: SERVER:pid=16048:Execution time:TPR009-30:(millisec):24
100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR015-10:(millisec):10
100644:MWTP_CAT:12002: SERVER:pid=10428:Execution time:01201E:(millisec):3
100644:MWTP_CAT:12002: SERVER:pid=10427:Execution time:ISCST044:(millisec):0

I know that the sort feature I need specifically would be sort { $a cmp $b } but I am unsure as to how I would extract and sort. As always, just give pointers in the right direction. Thanks.

TStanley
--------
People sleep peaceably in their beds at night only because rough men stand ready to do violence on their behalf. -- George Orwell


Replies are listed 'Best First'.
Re: Sorting records on a single field by jwkrahn (Abbot) on Jan 20, 2010 at 17:21 UTC
`print for map $_->[1], sort { $b->[0] <=> $a->[0] } map [ /\(millisec\):(\d+)/, $_ ], @data;`
Re^2: Sorting records on a single field by kikuchiyo (Hermit) on Jan 20, 2010 at 17:39 UTC
If the number of interest is always at the end of the lines then a simple `/(\d+)$/` would do. A combination of `rindex` and `substr` instead of the regex would probably be even faster. See also this reference work for more pointers.
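For illustration, here is a minimal sketch of the `rindex`/`substr` idea, assuming the key is everything after the last colon on the line. The helper name `last_field` and the three sample records are placeholders chosen for this example, and no timing claim is made here.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text after the last colon of a record.
sub last_field {
    my ($line) = @_;
    chomp $line;                      # works on the copy, not the caller's data
    my $pos = rindex $line, ':';      # position of the last colon, -1 if none
    return substr $line, $pos + 1;    # with no colon this returns the whole line
}

# A few of the sample records from the question.
my @data = (
    '100644:MWTP_CAT:12002: SERVER:pid=14520:Execution time:TPR015-10:(millisec):53',
    '100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):437',
    '100644:MWTP_CAT:12002: SERVER:pid=10427:Execution time:ISCST044:(millisec):0',
);

# Descending numeric sort on the extracted field.
my @sorted = sort { last_field($b) <=> last_field($a) } @data;
print "$_\n" for @sorted;
```

Note that this still extracts the key again on every comparison; it only swaps the regex for two string functions.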
Re: Sorting records on a single field by almut (Canon) on Jan 20, 2010 at 17:25 UTC
"As always, just give pointers in the right direction."

Ok, here are your pointers :)

- Use a regex or split to extract the column of interest.
- Use the Schwartzian Transform to do the actual sorting. The ST avoids having to do the (relatively expensive) extraction procedure anew for each pairwise comparison (`$a <=> $b`).
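The two pointers can be put together roughly as follows. This is only a sketch: it assumes the key is the last colon-separated field, and the intermediate array names (`@decorated`, `@ordered`) are introduced purely to show the three stages that the usual one-statement form chains together.

```perl
use strict;
use warnings;

# A few of the sample records from the question.
my @data = (
    '100644:MWTP_CAT:12002: SERVER:pid=14520:Execution time:TPR015-10:(millisec):53',
    '100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR015-10:(millisec):10',
    '100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):437',
    '100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):470',
    '100644:MWTP_CAT:12002: SERVER:pid=10427:Execution time:ISCST044:(millisec):0',
);

# Stage 1 (decorate): pair every line with its key, extracted exactly once.
my @decorated = map [ (split /:/)[-1], $_ ], @data;

# Stage 2 (sort): compare only the precomputed keys, largest first.
my @ordered = sort { $b->[0] <=> $a->[0] } @decorated;

# Stage 3 (undecorate): throw the keys away and keep the original lines.
my @sorted = map { $_->[1] } @ordered;

print "$_\n" for @sorted;
```

Each line is split once, however many comparisons the sort ends up making.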
Re: Sorting records on a single field by ack (Deacon) on Jan 20, 2010 at 17:44 UTC
There are several good references on sorting in the Tutorials section of the Monastery. I would look in the subsection Getting Deeper Into Perl and the sub-subsection List Processing, Filtering, and Sorting. In particular you should look at transformational sorts; the Schwartzian Transform is, I think, one of the more popular ones and should meet your needs. I would suggest, in particular, any one of three of the tutorials: A brief tutorial on Perl's native sorting facilities by BrowserUK, Resorting to Sorting by japhy, or Complex sorting by vroom. The first is a good place to start, but the other two are really good, too, IMHO.

Good luck.

UPDATE: One thing I should've said (I didn't think about this until I got home last night) is that when you write your sort subroutine (which is what gives the transformational sorts, like the Schwartzian Transform, their power), you'll need to parse each line of the input file to isolate the time data that you want to sort on. This will also mean that you'd need to read (slurp) the entire file into an array, since the types of sorting mentioned in the references I posted work in memory. There are some CPAN modules (e.g., Sort::Array) that I believe can sort files without having to slurp the entire file into memory, but I don't have any experience with them, so I'm not sure if they can actually do that. Maybe other Monks could guide you on that.

Again, good luck.

ack Albuquerque, NM
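As a rough sketch of the slurp-then-sort approach described in the update: the example below reads from an in-memory filehandle so that it runs on its own, which is an assumption made for illustration. With a real log extract you would open the file by name instead. The fallback key of 0 for lines without a trailing number is likewise just a choice made for this sketch.

```perl
use strict;
use warnings;

# Stand-in for the real log extract (assumption: one record per line).
my $log = <<'END';
100644:MWTP_CAT:12002: SERVER:pid=14520:Execution time:TPR015-10:(millisec):53
100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):470
100644:MWTP_CAT:12002: SERVER:pid=10427:Execution time:ISCST044:(millisec):0
100644:MWTP_CAT:12002: SERVER:pid=16048:Execution time:TPR009-30:(millisec):24
END

# Slurp every line into an array; the whole data set is now in memory.
open my $fh, '<', \$log or die "Cannot open in-memory file: $!";
my @lines = <$fh>;
close $fh;
chomp @lines;

# Parse the time out of each line once; lines without a trailing
# number get a key of 0 in this sketch.
my @time = map { /(\d+)$/ ? $1 : 0 } @lines;

# Sort the indices by their precomputed key, then slice the lines.
my @sorted = @lines[ sort { $time[$b] <=> $time[$a] } 0 .. $#lines ];

print "$_\n" for @sorted;
```

Sorting indices against a parallel key array is one alternative to building the array-of-pairs used by the Schwartzian Transform; both parse each line only once.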
Re^2: Sorting records on a single field by almut (Canon) on Jan 20, 2010 at 18:05 UTC
It's probably referenced somewhere within the pointers already given, but just in case... here is another classic that is well worth reading: A Fresh Look at Efficient Perl Sorting.
Re^3: Sorting records on a single field by ack (Deacon) on Jan 21, 2010 at 18:18 UTC
I just checked out that reference; it is, indeed, excellent. I did not know about it, and it is a good one to put in my tool bag of "places to look" for info on sorting. Thanks, almut. ack Albuquerque, NM
Re: Sorting records on a single field by Anonymous Monk on Jan 20, 2010 at 18:15 UTC
If you're just looking for a quick and dirty way to do this, you can do it in your shell. `sort -rnt: -k9` Sort reverse, numeric, field separator colon, field 9.
Re: Sorting records on a single field by planetscape (Chancellor) on Jan 21, 2010 at 00:11 UTC
You may find these helpful:

- Sort by specific column
- schwartzian transform and sorting on two columns
- How do I sort a file with IP Addresses and another text column?
- How do I sort on more than one column?

HTH, planetscape
Re: Sorting records on a single field by Lain78 (Initiate) on Jan 21, 2010 at 10:03 UTC
Hi! I don't have extensive experience with sorting methods, but in this case I think a simple approach would work, like this:

    use strict;
    use warnings;

    my $Line;        # one input line
    my @SortedData;  # resulting sorted data set

    # data set example
    my @Data = (
        '100644:MWTP_CAT:12002: SERVER:pid=14520:Execution time:TPR015-10:(millisec):53',
        '100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR015-10:(millisec):10',
        '100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR015-10:(millisec):33',
        '100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):437',
        '100644:MWTP_CAT:12002: SERVER:pid=16565:Execution time:TPR007-12:(millisec):470',
        '100644:MWTP_CAT:12002: SERVER:pid=16048:Execution time:TPR009-30:(millisec):24',
        '100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR012-01E:(millisec):63',
        '100644:MWTP_CAT:12002: SERVER:pid=10427:Execution time:ISCST044:(millisec):0',
        '100644:MWTP_CAT:12002: SERVER:pid=15866:Execution time:TPR012-01E:(millisec):85',
        '100644:MWTP_CAT:12002: SERVER:pid=10428:Execution time:01201E:(millisec):3',
    );

    # create sorted data set
    @SortedData = reverse sort { (split (/:/, $a))[-1] <=> (split (/:/, $b))[-1] } @Data;

    ### DEBUG: print input and output sets ###
    print "Data Set is:\n", join ("\n", @Data), "\n";
    print "Sorted Data is:\n", join ("\n", @SortedData), "\n";
Re^2: Sorting records on a single field by salva (Canon) on Jan 21, 2010 at 10:40 UTC
You can get rid of the `reverse` operation by simply inverting the order of the comparison operands. In other words, instead of `reverse sort { $a <=> $b } @data` use `sort { $b <=> $a } @data`. In the OP's case:

    @SortedData = sort { (split (/:/, $b))[-1] <=> (split (/:/, $a))[-1] } @Data;

Using `reverse` also makes the sort operation unstable (entries with equal sorting keys do not keep their relative positions after the sort operation).
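A small sketch of what the remark about equal keys means in practice. The labelled records below are invented for illustration, and the ascending result is stored in an intermediate array before being reversed so that the two steps are explicit. It relies on Perl's built-in sort keeping tied elements in input order, which the mergesort used by default since Perl 5.8 does.

```perl
use strict;
use warnings;

# [label, key] pairs; A and C tie on key 10, and A comes first in the input.
my @recs = ( [ A => 10 ], [ B => 33 ], [ C => 10 ] );

# Ascending sort, then reverse the finished list.
my @ascending = sort { $a->[1] <=> $b->[1] } @recs;    # A C B
my @reversed  = reverse @ascending;                    # B C A

# Descending sort by swapping the operands.
my @swapped = sort { $b->[1] <=> $a->[1] } @recs;      # B A C

print "reverse of ascending: ", join(' ', map { $_->[0] } @reversed), "\n";
print "swapped operands:     ", join(' ', map { $_->[0] } @swapped),  "\n";
```

Both results are in descending key order, but reversing also flips the tied records (C now precedes A), while swapping the operands leaves them in their original relative order.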
Re^3: Sorting records on a single field by Lain78 (Initiate) on Jan 21, 2010 at 11:07 UTC
I see that switching the operands is better, but I don't understand what you mean by "equal sorting keys do not keep their relative positions after the sort operation"... I am probably missing something. Could you explain that point in more depth? Thanks.
Re^4: Sorting records on a single field by salva (Canon) on Jan 21, 2010 at 11:28 UTC
