
sorting domains by extention

by Anonymous Monk
on Aug 27, 2002 at 10:01 UTC ( [id://193114]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks, your wisdom is required for I am severely lacking any on this Tuesday morning.
What is the simplest way to sort an array of domain names by their extensions?

Incorrect method (simplified):

    my @regexps=('\.co\.uk$','[\w\-]+\.com$','\.pl$','\.uk\.com$');
    my @domains=qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);
    while(<@regexps>) {
        while(<@domains>) {
            if (/$regexp/i) {
                ......

Ideally I end up with a list of domains in extension order, with any that don't match tagged on the end. In this case:

blah.co.uk zzzz.co.uk foo.com bar.uk.com perl.pl weirdext.za

Theory 1.003 alpha was to reverse each scalar and sort, but (for example) .com and .uk.com domains would spoil this.

Many thanks.
Paul Faulkner
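
To make the problem with "Theory 1.003 alpha" concrete, here is a minimal sketch that reverses each string, sorts, and reverses back, using the domain list from the question. The variable name @by_reversed is made up for illustration. The result is ordered by the last character of each name, not by extension, so foo.com and bar.uk.com end up last and weirdext.za first.

```perl
use strict;
use warnings;

my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);

# Reverse every string, sort the reversed strings, then reverse each one back.
my @by_reversed = map  { scalar reverse $_ }
                  sort
                  map  { scalar reverse $_ } @domains;

print "$_\n" for @by_reversed;
```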

Replies are listed 'Best First'.
Re: sorting domains by extention
by demerphq (Chancellor) on Aug 27, 2002 at 10:37 UTC
    Use a Schwartzian Transform (ST) or Guttman Rosler Transform (GRT)

    ST:

        my @domains=qw( blah.co.uk zzzz.co.uk foo.com bar.uk.com perl.pl weirdext.za
                        google.de google.ca google.ru );

        # ST :
        my @list=map { shift @$_ }
                 sort { $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0]}
                 map { [ $_, m/(\..*)$/ ] } @domains;
        print join("\n",@list),"\n\n";

        # GRT : (My preference for a variety of reasons, notably speed)
        @list=map {substr($_,index($_,"\0")+1)}
              sort
              map {join ("\0",m/(\..*)$/,$_) } @domains;
        print join("\n",@list),"\n\n";

        # ST : With extra ordering criterion
        my %legal=map{ $_ => 1} qw(.co.uk .foo .com .edu);
        @list=map { shift @$_ }
              sort { $b->[2] <=> $a->[2] || $a->[1] cmp $b->[1] || $a->[0] cmp $b->[0]}
              map { my ($ext)=m/(\..*)$/; [ $_, $ext, $legal{$ext} ] } @domains;
        print join("\n",@list),"\n\n";
        __END__
        Outputs:
        ----------
        google.ca
        blah.co.uk
        zzzz.co.uk
        foo.com
        google.de
        perl.pl
        google.ru
        bar.uk.com
        weirdext.za

        google.ca
        blah.co.uk
        zzzz.co.uk
        foo.com
        google.de
        perl.pl
        google.ru
        bar.uk.com
        weirdext.za

        blah.co.uk
        zzzz.co.uk
        foo.com
        google.ca
        google.de
        perl.pl
        google.ru
        bar.uk.com
        weirdext.za

    Note: I updated this node with the extra criterion and a minor typo fix.

    Yves / DeMerphq
    ---
    Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)
    This was my Pentium Post! (686)
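
The "extra ordering criterion" idea can be pushed one step further: instead of a legal/not-legal flag, give each domain a numeric rank taken from an explicit priority list, and feed that rank to the Schwartzian Transform. The sketch below is only an illustration, not code from the thread; the @priority list and the rank_of helper are hypothetical names. Within one rank it sorts alphabetically, so bar.uk.com lands before foo.com.

```perl
use strict;
use warnings;

# Hypothetical priority list: earlier entries sort first.
my @priority = qw(.co.uk .com .pl);
my @domains  = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);

# Rank is the index of the first suffix the domain ends with;
# domains matching nothing get a rank one past the end of the list.
sub rank_of {
    my ($domain) = @_;
    for my $i (0 .. $#priority) {
        my $suffix = $priority[$i];
        return $i if $domain =~ /\Q$suffix\E\z/i;
    }
    return scalar @priority;
}

# Schwartzian Transform: decorate with the rank, sort, undecorate.
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] || $a->[0] cmp $b->[0] }
             map  { [ $_, rank_of($_) ] } @domains;

print "$_\n" for @sorted;
```

Because the rank is computed once per domain in the inner map, the comparator stays cheap even if the priority list grows.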

Re: sorting domains by extention
by Abigail-II (Bishop) on Aug 27, 2002 at 11:13 UTC
    You don't need a full sort, all you want is to put the domains in the proper buckets. Here's my solution:
        #!/usr/bin/perl

        use strict;
        use warnings 'all';

        my @regexes = map {qr /\.$_$/}
                          qr {co\.uk},
                          qr {com},       # No need for the [\w\-] prefix.
                          qr {pl},
                      #   qr {uk\.com},   # This one already gets grabbed by \.com
                      ;

        my @domains = qw {
            foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk
        };

        my @buckets = map {[]} @regexes, 1;

        DOMAIN:
        foreach my $domain (@domains) {
            for (my $i = 0; $i < @regexes; $i ++) {
                next unless $domain =~ qr /$regexes[$i]/;
                push @{$buckets [$i]} => $domain;
                next DOMAIN;
            }
            push @{$buckets [-1]} => $domain;
        }

        print "$_\n" for map {@$_} @buckets;

        __END__
        blah.co.uk
        zzzz.co.uk
        foo.com
        bar.uk.com
        perl.pl
        weirdext.za
    Abigail
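
The comment about uk\.com being "grabbed by \.com" follows from the first matching regex winning. If a separate .uk.com bucket were wanted, the more specific suffix would have to be listed before the general one. A minimal sketch under that assumption; the @suffixes list and @grouped are illustrative names, not part of the post above:

```perl
use strict;
use warnings;

# Hypothetical suffix list; "uk.com" is deliberately placed before "com".
my @suffixes = qw(co.uk uk.com com pl);
my @regexes  = map { qr/\.\Q$_\E\z/ } @suffixes;

my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);

# One bucket per suffix, plus a final catch-all bucket.
my @buckets = map { [] } @regexes, 1;

DOMAIN:
for my $domain (@domains) {
    for my $i (0 .. $#regexes) {
        next unless $domain =~ $regexes[$i];
        push @{ $buckets[$i] }, $domain;   # first match wins
        next DOMAIN;
    }
    push @{ $buckets[-1] }, $domain;       # nothing matched
}

my @grouped = map { @$_ } @buckets;
print "$_\n" for @grouped;
```

With this ordering bar.uk.com gets its own bucket ahead of foo.com; swapping "uk.com" and "com" in the list would send it to the .com bucket instead.
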
Re: sorting domains by extention
by BrowserUk (Patriarch) on Aug 27, 2002 at 14:20 UTC

    My contribution, cos it had to be done:^). Probably not the most efficient solution, but simple.

        #! perl -w

        my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);

        my @sorted = map{ join '.', reverse split( /\|/,$_ ) }
                     sort
                     map {join '|', reverse split( /\./,$_,2) } @domains;

        {
            local $"="\n";
            print "@sorted";
        }
        __END__
        # Output
        C:\test>193114
        blah.co.uk
        zzzz.co.uk
        foo.com
        perl.pl
        bar.uk.com
        weirdext.za
        C:\test>

    I know that the result is slightly different from your 'desired output' example, but I thought about this for a long time, and whilst I'm probably wrong, as no one else has mentioned it, I can see no criterion by which bar.uk.com could be sorted into the position you have it.

    If it's grouped with foo.com because they both have a .com extension, then bar.uk sorts before foo.

    If it's after foo.com because .uk.com is lexically higher than .com, then .uk.com is also higher than .pl, which is what I think you are asking for.


    What's this about a "crooked mitre"? I'm good at woodwork!
      Just thought I'd mention that this is essentially a form of GRT.

      ++

      Yves / DeMerphq
      ---
      Software Engineering is Programming when you can't. -- E. W. Dijkstra (RIP)
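
The three string comparisons behind BrowserUk's argument can be checked directly with cmp. This is only a small sketch; the @checks and @results names are made up for illustration.

```perl
use strict;
use warnings;

# Each pair is (left, right); cmp returns -1, 0 or 1.
my @checks = (
    [ 'bar.uk',  'foo'  ],   # names in front of a shared .com extension
    [ '.uk.com', '.com' ],   # extensions compared as whole strings
    [ '.uk.com', '.pl'  ],
);

my @results = map { $_->[0] cmp $_->[1] } @checks;

for my $i (0 .. $#checks) {
    printf "%s cmp %s = %d\n", @{ $checks[$i] }, $results[$i];
}
```

The first comparison is negative and the other two are positive: bar.uk sorts before foo, and .uk.com sorts after both .com and .pl.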

Re: sorting domains by extention
by Aristotle (Chancellor) on Aug 27, 2002 at 11:04 UTC
    In this specific case, you don't need a full-blown GRT. Of course the simpler approach is less efficient, but you won't notice that before you start sorting tens of thousands of domains, and I find the simpler approach is tons more readable.

        my @sorted_domain =
            map { join ".", reverse split /\./ }
            sort
            map { join ".", reverse split /\./ }
            @domain;

    Argh. I'm not paying attention.

    Makeshifts last the longest.
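
The post does not say what the "Argh" refers to. One likely reading (an assumption) is that reversing all of the dot-separated labels sorts by the last label first, which is not the order the question asks for. A minimal sketch that runs the same map/sort/map pipeline on the domain list from the question:

```perl
use strict;
use warnings;

my @domains = qw(foo.com weirdext.za bar.uk.com blah.co.uk perl.pl zzzz.co.uk);

# Reverse the labels of each name (foo.com becomes com.foo), sort,
# then reverse the labels back.
my @sorted = map  { join '.', reverse split /\./, $_ }
             sort
             map  { join '.', reverse split /\./, $_ } @domains;

print "$_\n" for @sorted;
```

Here the .com names come first and the .co.uk names are grouped under uk, after perl.pl.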
