PerlMonks  

extract uniques and sort

by ybiC (Prior)
on Sep 20, 2000 at 23:03 UTC (node #33376)
ybiC has asked for the wisdom of the Perl Monks concerning the following question:

I have a file containing perhaps one hundred unqualified hostnames, one per line. How can I eliminate duplicates (or extract uniques) and then sort alphabetically? perlfunc:sort looks good for the sorting part, but the duplicates/uniques part has me at a loss.

A Super Search for either word turns up many hits, but nothing I caught that hits the spot. I don't have access to my O'Reilly books at the moment; otherwise I would look to the Llama or the Ram first.
    thanks in advance,
    ybiC

Re: extract uniques and sort
by japhy (Canon) on Sep 20, 2000 at 23:06 UTC
    % perldoc -tq unique
    Found in /usr/local/lib/perl5/5.00502/pod/perlfaq4.pod
      How can I extract just the unique elements of an array?
        There are several possible ways, depending on whether the array is ordered and whether you wish to preserve the ordering.

        a) If @in is sorted, and you want @out to be sorted:
           (this assumes all true values in the array)

               $prev = 'nonesuch';
               @out = grep($_ ne $prev && ($prev = $_), @in);

           This is nice in that it doesn't use much extra memory, simulating uniq(1)'s behavior of removing only adjacent duplicates. It's less nice in that it won't work with false values like undef, 0, or ""; "0 but true" is ok, though.

        b) If you don't know whether @in is sorted:

               undef %saw;
               @out = grep(!$saw{$_}++, @in);
    There's more, but I won't paste it all. Look at the Perl documentation, perlfaq4. This is available in many places, such as http://www.perldoc.com/.
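Applied to the hostname question, the two FAQ recipes differ mainly in ordering: (a) needs the list sorted first, while (b) can drop duplicates first and sort afterwards. A minimal sketch of both, assuming the hostnames are already in an array (the sample names are made up). Recipe (a) is rewritten here as an explicit loop that compares against the last kept item instead of using a truth test, so values like "0" or "" are not lost:

```perl
use strict;
use warnings;

# Recipe (a): drop adjacent duplicates from an already-sorted list.
# Comparing with the last kept item (not a truth test) keeps "0" and "".
sub uniq_adjacent {
    my @out;
    for my $item (@_) {
        push @out, $item if !@out || $out[-1] ne $item;
    }
    return @out;
}

# Recipe (b): keep the first occurrence of each item, whatever the order.
sub uniq_seen {
    my %saw;
    return grep { !$saw{$_}++ } @_;
}

# Hypothetical sample data standing in for the file's contents.
my @hosts = qw(sw-core2 sw-closet1 sw-core1 sw-closet1 sw-core2);

# (a) sort first, then strip neighbouring duplicates
my @sorted_then_uniq = uniq_adjacent(sort @hosts);

# (b) strip duplicates first, then sort the survivors
my @uniq             = uniq_seen(@hosts);
my @uniq_then_sorted = sort @uniq;

print "@sorted_then_uniq\n";   # sw-closet1 sw-core1 sw-core2
print "@uniq_then_sorted\n";   # sw-closet1 sw-core1 sw-core2
```

Both give the same answer here; recipe (b) is essentially the hash-based approach the later replies settle on.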

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
Re: extract uniques and sort
by Fastolfe (Vicar) on Sep 20, 2000 at 23:14 UTC
    If you're worried about unique *strings*, you can just try placing each item into a %hash key. When you're all finished, keys %hash will give you each distinct item exactly once. This act of finding unique elements from a list (which this can all be reduced to) is well documented on the site and in various Perl books.
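One detail matters when the keys come straight from a file: each line still carries its newline, and stray whitespace or capitalization turns otherwise identical names into separate keys. A minimal sketch, assuming names should be compared case-insensitively (DNS treats hostnames that way) and that blank lines can be skipped; the sub name is hypothetical, and an in-memory filehandle stands in for the real file:

```perl
use strict;
use warnings;

# Collect each distinct hostname once by using it as a hash key.
# Assumption: case and surrounding whitespace are noise, so both are
# normalized before the name is used as a key.
sub unique_hosts_from {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
        next unless length $line;   # ignore blank lines
        $count{ lc $line }++;       # value counts how often each name appeared
    }
    return \%count;
}

my $list = "core1\nCloset2\ncloset2 \n\ncore1\n";
open my $fh, '<', \$list or die "Can't open in-memory list: $!";
my $count = unique_hosts_from($fh);
close $fh;

for my $host (sort keys %$count) {
    print "$host\t$count->{$host}\n";
}
```

Storing a count rather than just 1 costs nothing and shows which names were duplicated.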

    If you're worried about catching duplicate *hostnames* (e.g. "host" v. "host.example.com" could be the same hostname if your domain is example.com), you'll have to try resolving each name individually, and perhaps storing the resulting IP address in the hash instead:

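    use strict;
    use Socket;   # exports AF_INET

    my %hash;
    while (my $item = <FILE>) {   # FILE: the already-open list of hostnames
        chomp $item;              # otherwise the newline becomes part of the name looked up
        if (my $ip = gethostbyname($item)) {
            # key on the packed address; store the reverse-resolved name if there is one
            $hash{$ip} = gethostbyaddr($ip, AF_INET) || $item;
        } else {
            # no such hostname
        }
    }
    print "Unique items:\n";
    print "$hash{$_}\n" for keys %hash;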
    Note that $ip is in packed form here.
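To see what "packed form" means, the Socket module's inet_aton and inet_ntoa convert between dotted-quad text and the same 4-byte string that gethostbyname returns in scalar context. A small sketch; the address is from the documentation range, and passing a numeric address means no DNS query is made:

```perl
use strict;
use warnings;
use Socket qw(inet_aton inet_ntoa);

# gethostbyname() in scalar context returns an IPv4 address packed into a
# 4-byte string, the same format inet_aton() produces.
my $packed = inet_aton('192.0.2.10');

printf "packed length: %d bytes\n", length $packed;   # 4
print "readable: ", inet_ntoa($packed), "\n";         # 192.0.2.10
```

In the loop above, the same call would print each unique address next to its name: print inet_ntoa($_), "\t$hash{$_}\n" for keys %hash;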
      If the hosts are all internal (e.g. within a company intranet), you can use Fastolfe's trick if you first add the domain name to the hostname:

       $host .= '.foo.com' unless $host =~ /\.foo\.com$/;

      Admittedly lazy, but I use the trick here at work all the time. Then the problem of host vs. host.foo.com (which are the same) is resolved.

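      The IP resolution trick might not work for hosts with more than one IP address (routers, for one), since two names for the same device can resolve to different addresses and so end up as different keys.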
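The append-the-domain trick can be wrapped so every name is normalized before it becomes a hash key, with no DNS lookup at all. A minimal sketch; the qualify helper is hypothetical, 'foo.com' is the placeholder domain from the post, and lowercasing is an added assumption based on DNS names being case-insensitive. Note the escaped dots: an unescaped /.foo.com$/ would also match a name like "xfooxcom".

```perl
use strict;
use warnings;

# Hypothetical helper: append the local domain to unqualified names so that
# "host" and "host.foo.com" end up as the same hash key.
sub qualify {
    my ($host, $domain) = @_;
    $host   = lc $host;      # assumption: compare names case-insensitively
    $domain = lc $domain;
    # \Q...\E escapes the dots, so "xfooxcom" is not taken as already qualified
    return $host =~ /\.\Q$domain\E$/ ? $host : "$host.$domain";
}

my %seen;
$seen{ qualify($_, 'foo.com') }++ for qw(sw1 sw1.foo.com SW2 sw2.FOO.com sw3);
print "$_\n" for sort keys %seen;
```

As Fastolfe points out below, a properly configured resolver makes this unnecessary for lookups; it only matters when comparing the strings themselves.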

        That isn't really necessary. So long as your domain is properly configured in your network subsystem (e.g. in /etc/resolv.conf), attempting to resolve "www" is equivalent to resolving "www.example.com".

        If you want to skip the step of actually attempting to resolve the hostnames, you can try to parse /etc/resolv.conf yourself and tack on each of the domains from its 'search' or 'domain' line ('domain' names a single local domain, while 'search' can list several), since you could technically have more than one. But if it's a custom app just for one location/purpose, I guess you could hard-code it without worry.
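A rough sketch of that parsing, taking resolv.conf-style lines as input. It follows the Linux resolv.conf(5) description, where 'domain' and 'search' are mutually exclusive and the last such line wins; the sub name and sample domains are hypothetical, and real files may carry options this ignores:

```perl
use strict;
use warnings;

# Collect the candidate domains from resolv.conf-style lines.
sub search_domains {
    my @domains;
    for my $line (@_) {
        next if $line =~ /^[#;]/;                            # comment lines
        if ($line =~ /^\s*(?:domain|search)\s+(\S.*?)\s*$/) {
            @domains = split ' ', $1;                        # a later line replaces earlier ones
        }
    }
    return @domains;
}

# Sample text standing in for /etc/resolv.conf.
my @conf = (
    "# generated file\n",
    "domain corp.example.com\n",
    "search corp.example.com lab.example.com\n",
    "nameserver 192.0.2.53\n",
);

my @domains    = search_domains(@conf);
my @candidates = map { "sw1.$_" } @domains;   # names to try for host "sw1"
print "$_\n" for @candidates;
```

On a real system the lines would come from the file: open my $rc, '<', '/etc/resolv.conf' or die $!; my @domains = search_domains(<$rc>);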

      Thanks, Fastolfe, for an informative reply, and also to japhy, geektron, and (I think) tilly in the Chatterbox.

      Actually, I'm looking for unique strings that happen to be hostnames. The list is built using a variation of (code) Cisco Pass Mass - IOS (deprecated by node 123464) to query a core switch for CDP neighbors (wiring closet "top o' stack" switches). Next I feed the output back into the same script to find the remaining switches in each wiring closet stack. Then I'll feed the complete list of core and closet switches back into the original Cisco Pass Mass - IOS script to implement my config updates.

      I've now got my hands on the Perl Cookbook and am poring over example 4.6. Looks like I'm getting closer to a solution. Keys of Hash, eh?
          cheers,
          ybiC

(code) RE: extract uniques and sort (thanks. 1WTDI)
by ybiC (Prior) on Sep 22, 2000 at 22:53 UTC
    Here's what I ended up with for extracting uniques and sorting.   I know it can be further simplified, but my non-programmer brain can somewhat grok it as is.   Thanks again, all!
        cheers,
        ybiC
    #!/usr/bin/perl -w
    use strict;

    my $infile  = shift or die "usage: $0 infile outfile\n";
    my $outfile = shift or die "usage: $0 infile outfile\n";
    my %seen;

    open (IN,  '<', $infile)  or die "Can't open $infile for reading: $!";
    open (OUT, '>', $outfile) or die "Can't open $outfile for writing: $!";

    while (my $item = <IN>) {   # parse $infile into a hash; duplicate keys collapse
        chomp $item;            # without this, a last line lacking "\n" becomes a separate key
        ++$seen{$item};
    }
    print OUT "$_\n" for sort keys %seen;   # sort keys and save to $outfile

    close IN  or die "Cannot close $infile: $!";
    close OUT or die "Cannot close $outfile: $!";
      You know, if this is all that your script does, you may want to consider using standard Unix tools sort and uniq:
      $ cat unsorted | sort | uniq >sorted.unique
      It will probably be a bit faster.
        "consider std Unix sort and uniq..."

        Thanks for the suggestion, Fastolfe. I considered that, but the snippet is the tail end of a (still modest) 95-line script using Net::Telnet::Cisco to query LAN switches for CDP neighbors. The number of switches is under 500 at my current company, so performance isn't really an issue.

        It could still be a viable option, but I want to do as much as I can within the script itself, without using system calls.
            cheers,
            ybiC
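Since the switch names are already in memory at the tail of the larger script, the intermediate file can be skipped entirely: dedupe and sort the list where it is built and write the result once, with no external sort or uniq. A minimal sketch; write_unique_sorted and its arguments are hypothetical, not part of ybiC's script:

```perl
use strict;
use warnings;

# Dedupe and sort an in-memory list of hostnames, then write it to $outfile,
# one name per line. Returns the number of unique names written.
sub write_unique_sorted {
    my ($outfile, @hosts) = @_;
    my %seen;
    for my $host (@hosts) {
        (my $name = $host) =~ s/\s+\z//;   # copy, then strip trailing newline/whitespace
        $seen{$name}++ if length $name;    # skip empty entries
    }
    open my $out, '>', $outfile or die "Can't open $outfile for writing: $!";
    print {$out} "$_\n" for sort keys %seen;
    close $out or die "Cannot close $outfile: $!";
    return scalar keys %seen;
}
```

A call would look like my $n = write_unique_sorted('switches.txt', @neighbors); where @neighbors is whatever list the CDP queries produced.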

Re: extract uniques and sort
by scottstef (Curate) on Jun 15, 2001 at 04:12 UTC

    ybiC,

    I think you want to look at an if exists statement for your hash. exists checks whether a given key is present in the hash and returns true if it is. As for the sort, you are on your own for that; I haven't needed to learn that function yet. %^) Update: Posted this code per ybiC's request. Here is some code I pieced together that compares two files using a hash.


    #!/usr/bin/perl -w
    use strict;
    use diagnostics;

    my $file1 = "/home/scott/PerlHacks/post.office/file1";
    my $file2 = "/home/scott/PerlHacks/post.office/file2";
    my %hashOfLists;

    open (FILE1, '<', $file1) or die "Could not open $file1: $!";
    open (FILE2, '<', $file2) or die "Could not open $file2: $!";

    ######################################################
    # Read every line of both files into a hash, with the
    # line as the key. Since I was comparing 2 files, each
    # key ends up with a value of 1 if it was only in the
    # first file, 2 if it was in both, and 3 if it was
    # only in file 2.
    ######################################################
    foreach (<FILE1>) {
        chomp $_;
        $hashOfLists{$_} = 1;
    }
    foreach (<FILE2>) {
        chomp $_;
        ##################################################
        # exists checks the hash to see if $_ is a key.
        # If true, a file-1 item (value 1) becomes 2;
        # if false, $_ gets a value of 3.
        ##################################################
        if (exists $hashOfLists{$_}) {
            $hashOfLists{$_} = 2 if $hashOfLists{$_} == 1;   # repeats within file 2 stay 3
        } else {
            $hashOfLists{$_} = 3;
        }
    }
    close FILE1 or die "Could not close $file1: $!";
    close FILE2 or die "Could not close $file2: $!";
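The same 1/2/3 bookkeeping can be turned into sorted reports of what appears only in the first list, in both, and only in the second. A minimal sketch on two in-memory lists; compare_lists and the sample items are hypothetical. The upgrade happens only from 1 to 2, so an item repeated inside the second list but absent from the first stays at 3 instead of being counted as "both":

```perl
use strict;
use warnings;

# Classify items: 1 = only in first list, 2 = in both, 3 = only in second.
# Returns a hash ref mapping each code to a sorted list of items.
sub compare_lists {
    my ($first, $second) = @_;   # two array refs
    my %where;
    $where{$_} = 1 for @$first;
    for my $item (@$second) {
        if (exists $where{$item}) {
            $where{$item} = 2 if $where{$item} == 1;   # also seen in the first list
        } else {
            $where{$item} = 3;                         # only in the second list so far
        }
    }
    my %group;
    push @{ $group{ $where{$_} } }, $_ for sort keys %where;
    return \%group;
}

my $groups = compare_lists([qw(core1 closet1 closet2)], [qw(closet2 closet3 closet3)]);
my %label  = (1 => 'only in file 1', 2 => 'in both', 3 => 'only in file 2');
for my $code (1 .. 3) {
    my @items = @{ $groups->{$code} || [] };
    print "$label{$code}: @items\n";
}
```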

    "The social dynamics of the net are a direct consequence of the fact that nobody has yet developed a Remote Strangulation Protocol." -- Larry Wall



    Edited by planetscape - added code tags

Node Type: perlquestion [id://33376]
Approved by root