Checking for Duplicates

skyler has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I've written a script that reads a pipe (|) delimited file, parses the fields, and writes them back out to another file. I'm doing this because the last field on each line is a path (C:\(directory1)\(directory2)\(directory3)) that gets transformed as the file is read line by line. My question: I would like to take the first field, put it into an array or hash, and check whether it repeats while reading the file. If it does, I want to rename it to itself plus an appended letter (a, b, c, and so forth) for each repeated instance, for example 56789a, 56789b, 56789c, and so on. Do you have any hints? Here is the code so far.

#! perl -w

use strict;
use File::Copy;

my $cnt = 0;    # running count of lines processed

my $infile = "C:\\(directory1)\\file1.chr";

# Name the output file after today's date, e.g. 20040225.txt
my ( $yr, $mo, $dy ) = (localtime)[5,4,3];
my $outfile = sprintf( "C:\\(Directory1)\\%04d%02d%02d.txt", $yr + 1900, $mo + 1, $dy );

my $staticdir = "C:\\(directory1)\\(directory2)\\";

open IN,  '<', $infile  or die "Couldn't open $infile, $!";
open OUT, '>', $outfile or die "Couldn't open $outfile, $!";

while (<IN>) {
    chomp;
    $cnt++;
    my @fields = split /\|/;

    my $newfile = $fields[0];

    # The path is in the 21st field (index 20)
    my $path_str = $fields[20];

    unless ($path_str) {
        warn "Empty path field (index 20) at line $.\n";
        next;
    }
    my @path = split /\\/, $path_str;
    my $dir  = join "\\", @path[ 0 .. 6 ];

    $newfile .= '.rtf';

    # First 20 fields, then '@@', then the new path to the .rtf file
    my $out = join( '|', @fields[0..19] ) . '@@' . $staticdir . $newfile;

    print OUT "$out\n";

    print "$cnt\n";
}

close IN;
close OUT or die "Couldn't close $outfile, $!";

Re: Checking for Duplicates by rchiav (Deacon) on Feb 25, 2004 at 17:59 UTC
You can use a hash to keep track of how many times you've seen each token and base the name on that. I'm assuming you want to create unique file names based on the first token on each line? If so, here's a snippet that will do that. It's not your implementation, but it should be self-explanatory enough for you to adapt.

#!/usr/bin/perl
use strict;
use warnings;

my %index;
my @data = qw/1 2 3 4 3 4 5 1 11 11 11 11 11 3/;
my $filename;

for (@data) {
    if ($index{$_}) {
        # Seen before: append the current suffix, then bump it (a -> b -> c ...)
        $filename = $_ . $index{$_};
        $index{$_}++;
    }
    else {
        # First occurrence: keep the name as-is; the next repeat gets 'a'
        $filename = $_;
        $index{$_} = 'a';
    }
    print "Your filename is $filename.\n";
}

The output was:

Your filename is 1.
Your filename is 2.
Your filename is 3.
Your filename is 4.
Your filename is 3a.
Your filename is 4a.
Your filename is 5.
Your filename is 1a.
Your filename is 11.
Your filename is 11a.
Your filename is 11b.
Your filename is 11c.
Your filename is 11d.
Your filename is 3b.

You'll run into issues if you have more occurrences of one token than there are letters in the alphabet, though.
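A minimal sketch of how the %index idea above might be applied to the pipe-delimited lines from the original question, where the first field is the token to de-duplicate. The helper name dedupe_first_field and the sample lines are hypothetical; like the snippet above, it leaves the first occurrence unchanged and suffixes later repeats with a, b, c, and so on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: rewrite the first field of each pipe-delimited line
# so that repeats get a letter suffix. The first occurrence is left as-is.
sub dedupe_first_field {
    my @lines = @_;
    my %index;    # token => next suffix to hand out
    my @out;
    for my $line (@lines) {
        my @fields = split /\|/, $line, -1;    # -1 keeps trailing empty fields
        my $key    = $fields[0];
        if ( exists $index{$key} ) {
            $fields[0] = $key . $index{$key};
            $index{$key}++;                    # string increment: a -> b -> c
        }
        else {
            $index{$key} = 'a';
        }
        push @out, join '|', @fields;
    }
    return @out;
}

# Hypothetical sample data
my @sample = ( '56789|x', '12345|y', '56789|z', '56789|w' );
print "$_\n" for dedupe_first_field(@sample);
# prints 56789|x, 12345|y, 56789a|z, 56789b|w (one per line)
```

In the question's loop, the same if/else could be applied to $newfile before '.rtf' is appended. One caveat: if the data can already contain a token such as 56789a, a generated name could collide with it.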
Re: Re: Checking for Duplicates by CountZero (Bishop) on Feb 25, 2004 at 20:56 UTC
"You'll run into issues if you have more occurrences of one token than there are letters in the alphabet, though."

Not at all! It will nicely continue with 'aa' after 'z', and 'ba' after 'az', and so on.

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
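This behavior comes from Perl's "magic" string increment: ++ on a variable that holds a string of letters (optionally followed by digits) and has only been used in string context carries over like an odometer instead of doing numeric addition. A short sketch that checks the carry past 'z':

```perl
use strict;
use warnings;

# Collect the first 28 suffixes produced by repeatedly applying ++ to 'a'.
my @seen;
my $suffix = 'a';
for ( 1 .. 28 ) {
    push @seen, $suffix;
    $suffix++;    # string increment: 'z' becomes 'aa'
}
print "@seen[23 .. 27]\n";    # prints: x y z aa ab

# The carry also works inside longer suffixes.
my $two = 'az';
$two++;
print "$two\n";               # prints: ba
```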
Re: Re: Re: Checking for Duplicates by rchiav (Deacon) on Feb 25, 2004 at 21:05 UTC
Sorry, I should have been clearer about that. I didn't mean "issues" in the sense that it would break, but that it wouldn't be as straightforward as having the files alpha-ordered according to where they were found in the file. For instance, 123ab would sort before 123d, but would have occurred 24 duplicates after 123d. So if you wanted the files to be alpha-ordered by order of occurrence, and you were going to have a significant number of duplicates (or rather, a chance of having more than 26 duplicates), then you'd probably want to start with 'aa' as your base. Thanks for pointing that out, CountZero.
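A small sketch of this ordering concern, assuming suffixes are generated with the same magic ++ and compared with Perl's default string sort. The helper names suffixes and sorts_in_order are hypothetical. Starting from 'a', the 27th suffix ('aa') sorts before 'b'; starting from 'aa', the order holds until 'zz' (the 676th suffix) rolls over to 'aaa'.

```perl
use strict;
use warnings;

# Hypothetical helper: the first $n suffixes, starting at $start, via magic ++.
sub suffixes {
    my ( $start, $n ) = @_;
    my @list;
    my $s = $start;
    for ( 1 .. $n ) {
        push @list, $s;
        $s++;
    }
    return @list;
}

# Hypothetical helper: does a plain string sort reproduce the generation order?
sub sorts_in_order {
    my @list   = @_;
    my @sorted = sort @list;
    return join( ',', @sorted ) eq join( ',', @list );
}

printf "base 'a',  30 names: %s\n",
    sorts_in_order( suffixes( 'a', 30 ) ) ? 'in order' : 'out of order';
printf "base 'aa', 30 names: %s\n",
    sorts_in_order( suffixes( 'aa', 30 ) ) ? 'in order' : 'out of order';
# prints "out of order" for base 'a' and "in order" for base 'aa'
```

The trade-off is that every duplicate gets a two-letter suffix, in exchange for directory listings that follow the order in which duplicates appeared in the file.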
Re: Re: Checking for Duplicates by Anonymous Monk on Feb 25, 2004 at 19:00 UTC
Thanks for your help. It works like a charm!
