Re: Checking for Duplicates

by rchiav (Deacon)
on Feb 25, 2004 at 17:59 UTC (#331749)

in reply to Checking for Duplicates

You can use a hash to keep track of how many times you've seen each token and base the name on that. I'm assuming that you want to create unique file names based on the first token on each line? If so, here's a snippet that will do that. It's not your implementation, but it should be self-explanatory enough for you to adapt.
#!/usr/bin/perl
use strict;
use warnings;

my %index;
my @data = qw/1 2 3 4 3 4 5 1 11 11 11 11 11 3/;
my $filename;

for (@data) {
    if ($index{$_}) {
        $filename = $_ . $index{$_};
        $index{$_}++;
    }
    else {
        $filename = $_;
        $index{$_} = 'a';
    }
    print "Your filename is $filename.\n";
}
The output was:
Your filename is 1.
Your filename is 2.
Your filename is 3.
Your filename is 4.
Your filename is 3a.
Your filename is 4a.
Your filename is 5.
Your filename is 1a.
Your filename is 11.
Your filename is 11a.
Your filename is 11b.
Your filename is 11c.
Your filename is 11d.
Your filename is 3b.
You'll run into issues if you have more occurrences of one token than there are letters in the alphabet, though.

Re: Re: Checking for Duplicates
by CountZero (Bishop) on Feb 25, 2004 at 20:56 UTC
    You'll run into issues if you have more occurrences of one token than there are letters in the alphabet, though.

    Not at all! It will nicely continue with 'aa' after 'z', and 'ba' after 'az', etc, etc ....
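
The rollover is easy to check by running Perl's string increment over a few boundary values. This is only a minimal sketch: the helper name next_suffix is made up for illustration and is not part of the snippet above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the suffix that follows $suffix.
# Perl's ++ is "magic" on strings matching /^[a-zA-Z]*[0-9]*$/,
# so it carries over like an odometer instead of turning numeric.
sub next_suffix {
    my ($suffix) = @_;
    $suffix++;
    return $suffix;
}

# Show what comes after a few interesting suffixes.
for my $s (qw/a y z az zz/) {
    printf "%-2s -> %s\n", $s, next_suffix($s);
}
```

Here 'z' becomes 'aa', 'az' becomes 'ba', and 'zz' becomes 'aaa', so the suffix never runs out, it only gets longer.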


    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

      Sorry, I should have been clearer about that. I didn't mean "issues" in the sense that it would break, but that it wouldn't be as straightforward as having the files sort alphabetically in the order they were found in the file. For instance, 123ab would be ordered before 123d but would have occurred 24 occurrences after 123d.

      So if you wanted the files to sort alphabetically in order of occurrence, and you were going to have a significant number of duplicates (or rather, a chance of having more than 26 duplicates), then you'd probably want to start with 'aa' as your base.
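
The ordering point can be checked with a small sketch. It assumes two hypothetical helpers, make_names and sorts_in_order, which are invented here for illustration; the token '123' and the count of 30 are arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build $count file names for one token.
# The first occurrence gets the bare token; later ones get a suffix
# that starts at $base and advances with Perl's string increment.
sub make_names {
    my ($token, $count, $base) = @_;
    my @names  = ($token);
    my $suffix = $base;
    for (2 .. $count) {
        push @names, $token . $suffix;
        $suffix++;
    }
    return @names;
}

# Return true if a plain string sort keeps the names in creation order.
sub sorts_in_order {
    my @names  = @_;
    my @sorted = sort @names;
    return "@sorted" eq "@names";
}

# Compare a one-letter base against a two-letter base.
for my $base (qw/a aa/) {
    my @names = make_names('123', 30, $base);
    printf "base '%s': last name %s, sorted order %s creation order\n",
        $base, $names[-1],
        sorts_in_order(@names) ? 'matches' : 'differs from';
}
```

With 'a' as the base, the names past 123z sort out of order; with 'aa' they stay in order. A two-letter base only helps up to 676 suffixed names per token (aa through zz), after which the suffix rolls over to 'aaa' and the same problem returns.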

      Thanks for pointing that out CountZero.

Re: Re: Checking for Duplicates
by Anonymous Monk on Feb 25, 2004 at 19:00 UTC
    Thanks for your help. It works like a charm!
