Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

This may be disjointed, I am a little tired after the reading and all the other things I did today. With what I have so far, I have 47 instances of my_sort to alter.

Thank you roboticus for you really long explanation of where I went wrong. I made most of the changes you recommended with a few exceptions.

  • I did not use my_sort_index since I do not know what you had in mind for it. However, I want it in both sort functions since I have directories with files I want sorted by name with their parent index.(pl|shtml|html) file in the mix.
  • I can not figure out how to get the Schwartzian transform into the functions. Where do they go in my_article_sort and my_name_sort? Are they a whole other function?

davido, I looked at Unicode::Collate. I can not see how it would work with hash values. The writer only uses arrays in the examples. The way I use the sort functions now is as follows.

for my $movie (sort { my_article_sort($a->{title},$b->{title}) values +%movies} { ... } for my $alpha (sort { my_article_sort($a,$b) } keys %alpha_movies) { . +.. } for my $character (sort { my_name_sort($a->{name},$b->{name}) values % +player_characters} { ... } for my $color (sort { my_article_sort($a->{name},$b->{name}) values %c +olors} { ... } my @files = (map("$data_dir$_",grep(/txt$/,sort { my_article_sort($a,$ +b) } readdir($directory))));

I looked at both the transform links but got really lost. As stated above, I do not know how they can be added to the two sort functions. Also, I still have problems with map BLOCK (and grep BLOCK). After about six or eight tries to get a map to work, I usually end up doing a for loop to get the desired result whether it be a concatenation, some regex, or math. It has been a long time since I wrote a map, and at the time I had to have someone holding my hand while I was doing it. The map/grep above is about the best I can do at the moment.

7stud, I have begun work on putting the dispatch table into a function. I am not sure of my approach just yet. At least the functions are no longer longer than the screen height. Oh, I looked at Sort::Maker. I do not understand it one bit.

The following is where I am at the moment. (Please ignore short_sorts, it is still a mess. I hardly ever use it.)

package Base::Sorts; use strict; use warnings FATAL => qw( all ); use base 'Exporter'; our @EXPORT_OK = qw(my_article_sort my_name_sort short_sorts); use Carp qw(croak); use HTML::Entities qw(decode_entities); sub split_out_leading_number { my $s = shift; if ( $s =~/^((\d|,|\.)+)(.*)$/) { my ($leading_number, $rest) = ($1,$3); # Take any commas out of the number. $leading_number =~ s/,//g; return ($leading_number, $rest); } die "split_out_leading_number received bogus input '$s'!\n"; } sub my_article_sort { my ($c,$d) = @_; $c = lc($c); $d = lc($d); # When sorting lists of files, I want the index file to always come +first. if ($c =~ /^index\./) { return -1; } elsif ($d =~ /^index\./) { return 1; } else { # This is the default sorting method. # Written with the help of kent/n in #perl on freenode. for ($c, $d) { s/<.+?>//g; # Strip out any html tags. s/\s*\b(A|a|An|an|The|the)(_|\s)//xi; # Strip off leading articl +es (in English). decode_entities($_); } if ( $c =~/^((\d|,|\.)+)(.*)$/ && $d =~ /^((\d|,|\.)+)(.*)$/) { my ($num1, $text1) = split_out_leading_number($c); my ($num2, $text2) = split_out_leading_number($d); # First compare the numbers, then compare the remaining parts of + the string. $num1 <=> $num2 || $text1 cmp $text2 } else { $c cmp $d; } } } sub my_name_sort { my ($c,$d) = @_; $c = lc($c); $d = lc($d); # When sorting lists of files, I want the index file to always come +first. # There may be an index file in a folder of files I want sorted by n +ame. if ($c =~ /^index\./) { return -1; } elsif ($d =~ /^index\./) { return 1; } else { # When I sort by name I prefer lastname firstname. # I have not yet written this to account for Sr., Jr., or Roman nu +merals after the last name. for ($c,$d) { s/<.+?>//g; # Strip out any html tags. s/\|.+$//; $_ = join(' ', (reverse split(/(?:_|\s)(?=[^_\s]+$)/, $_,2))) if + $_ !~ /^_/; s/^_//; decode_entities($_); } return $c cmp $d; } } sub short_sorts { my ($a,$b,$type) = @_; # Legend: # s = case sensitive # i = case insensitive # a = ascending # d = descending # r = reverse (right to left) # n = numbers # l = length of value my %sorts; $sorts{$_} = sub { $_[0] cmp $_[1] } for qw(sa as); $sorts{$_} = sub { $_[1] cmp $_[0] } for qw(sd ds); $sorts{$_} = sub { uc $_[0] cmp uc $_[1] } for qw(ia ai); $sorts{$_} = sub { uc $_[1] cmp uc $_[0] } for qw(id di); $sorts{$_} = sub { $_[0] <=> $_[1] } for qw(na an); $sorts{$_} = sub { $_[1] <=> $_[0] } for qw(nd dn); $sorts{$_} = sub { reverse($_[0]) cmp reverse($_[1]) } for qw(sar sr +a asr ars rsa ras); $sorts{$_} = sub { reverse($_[1]) cmp reverse($_[0]) } for qw(sdr sr +d dsr drs rsd rds); $sorts{$_} = sub { uc reverse($_[0]) cmp uc reverse($_[1]) } for qw( +iar ira air ari ria rai); $sorts{$_} = sub { uc reverse($_[1]) cmp uc reverse($_[0]) } for qw( +idr ird dir dri rid rdi); $sorts{$_} = sub { reverse $_[0] <=> reverse $_[1] } for qw(nar nra +anr arn rna ran); $sorts{$_} = sub { reverse $_[1] <=> reverse $_[0] } for qw(ndr nrd +dnr drn rnd rdn); $sorts{$_} = sub { length($_[0]) <=> length($_[1]) } for qw(la al); $sorts{$_} = sub { length($_[1]) <=> length($_[0]) } for qw(ld dl); if ($type) { croak "$type is not supported" if !exists $sorts{$type}; return $sorts{$type}->($a,$b); } else { die "A sort type was not selected."; } }
Have a cookie and a very nice day!
Lady Aleena

In reply to Re^2: RFC: I rewrote a custom sort function, but not sure if it is the best way. by Lady_Aleena
in thread RFC: I rewrote a custom sort function, but not sure if it is the best way. by Lady_Aleena

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    [Corion]: choroba: Yeah, I think that would be the good solution
    [LanX]: I suspect the first string which comes from the DB ...
    [LanX]: ... but this part is already in production for a year now
    [Corion]: LanX: The "good" approach here would be to use the appropriate DBI parameters to make the driver decode strings properly. But that will have a ripple-on effect of messing up all the places where manual decoding happens ;)
    [LanX]: which means albeit being broken UTF8 it'll be handled correctly
    [LanX]: and the problem only occurs since we changed the emails to base64
    [LanX]: my main problem will be to cnvince my colleagues that our productive code is broken oO ... so in the end I will just make a workaround :-/
    LanX hates UTF8 for causing knots in his brain and stomach
    [Corion]: LanX: Yes, that's the main problem - you have lots (and lots) of workarounds in various places and stages of the processing, and to clean that mess up requires action across the complete codebase. And it's almost impossible to do it piece-by-piece

    How do I use this? | Other CB clients
    Other Users?
    Others making s'mores by the fire in the courtyard of the Monastery: (10)
    As of 2017-01-16 14:02 GMT
    Find Nodes?
      Voting Booth?
      Do you watch meteor showers?

      Results (150 votes). Check out past polls.