Next, your subroutine does a *lot* of manipulation, and since it will be called multiple times for every string in the collection, you could be doing a lot more work than necessary. For small lists, that's no big deal. But it can add up quickly. There's a technique called the Schwartzian transform that can greatly reduce data manipulation during sorting.
For a quicksort, each time you double the size of your list, you have to transform *each* item on your list another time. It can grow fairly quickly:
List size | Times each item is manipulated | Total manipulations |
2 | 1 | 2 |
4 | 2 | 4*2=8 |
8 | 3 | 8*3=24 |
16 | 4 | 16*4=64 |
. . . | . . . | . . . |
1024 | 10 | 1024*10=10,240 |
. . . | . . . | . . . |
1048576 | 20 | 1048576*20=20,971,520 |
N | log(N)/log(2) | N*log(N)/log(2) |
In the Schwartzian transform, you simply build a *new* list that's much easier to sort, transforming all your data items once. Then you sort the new list. After the sort, you pull out the original data items to get your sorted list.
Suppose that you had a large list of words in mixed case, and wanted to sort them alphabetically, irrespective of case. But you want your resulting list to retain the original case of the words. You could do it like:
my @list = ( qw( apple Alpha aLBAtross aLbAcOre etc... ) );
my @result = sort {uc($a) cmp uc($b)} @original;
but it would uppercase all items repeatedly. If your list was large enough that could slow your application down. But suppose your list was more like:
my @list = (
[qw( APPLE apple )],
[qw( ALPHA alpha )],
[qw( ALBATROSS aLBAtross )],
[qw( ALBACORE aLbAcOrE)],
etc...
);
That would be much easier to sort: Just sort the array in the order of the first column:
my @result = sort { $a->[0] cmp $b->[0] } @list;
And then your list would look like this:
my @list = (
[qw( ALBACORE aLbAcOrE)],
[qw( ALBATROSS aLBAtross )],
[qw( ALPHA alpha )],
[qw( APPLE apple )],
etc...
);
Retrieving the sorted data is simple enough: It's the second column of the array. Reading a discussion of the Schwartzian transform was one of the things that got me hooked on perl. (In fact, it was this article: Resorting to Sorting. It led me to realize how flexible perl can be. It can make sort work of simple list operations. For example, you can use the map function (perldoc -f map) to take your original list and reshape it. So to do our sort we could do it like this:
my @list = ( qw( apple Alpha aLBAtross aLbAcOre etc... ) );
# Convert our list into the ( [APPLE apple], ... ) form with map:
@result = map { [ uc($_), $_ ] } @list;
# Now sort the list:
@result = sort {$a->[0] cmp $b->[0]} @result;
# Now convert the resulting list back into (aLbAcOrE aLBAtross...) for
+mat:
@result = map { $_->[1] } @result;
This is kind of wordy, though. Perl lets us pipeline the operations, though, so we can do it in a single statement:
my @list = ( qw( apple Alpha aLBAtross aLbAcOre etc... ) );
my @result = map {$_->[1] } # .sgnirts desacreppu eht
+ lla tou pirts #
sort {$a->[0] cmp $b->[0]} # neht dna ,sgnirts desacrep
+pu eht no tros #
map { [ uc($_), $_ ] } # ,gnirts eht fo noisrev desa
+creppu eht dda #
@list; #
+ ,tsil a neviG #
It takes a little getting used to, but just remember that the list operations (sort, map, grep) all take a list on the right side as input, and output a new list on the left side. So when you chain them together, just remember to read them from right to left. ;^D
If you wanted to, you could even throw a grep function in there to only retain words fulfilling specific criteria. You might only want words longer than four characters for example.