PyrexKidd has asked for the wisdom of the Perl Monks concerning the following question:
I have a massive CSV file that I am trying to process to remove duplicate entries.
Ideally, I would like to end up with two files: the new duplicate-free list and a list of the removed entries.
The CSV file is in this format:
Group One,Captain,Phone Number,League Pos,etc.
Group-One,Captain,Phone Number,League Pos,etc.
GroupOne,Captain,Phone Number,League Pos,etc.
Group Two,Captain,Phone Number,League Pos,etc.
Group Three,Captain,Phone Number,League Pos,etc.
Etc. Etc. Etc.
My thinking is to pull the company name (the first field) out of each line and then compare it against the rest of the file. I'm still new to regex, and I'm running into multiple problems:
    #!/usr/bin/perl
    use strict;
    use warnings;

    open my $FHIN,    '<', $ARGV[0];
    open my $FHOUT,   '>', "$ARGV[0].new";
    open my $DELLIST, '>', "ARGV[0].deleted";

    foreach (<$FHIN>){
        $_ =~ s/\"//g;
        $_ =~ m/(.*?),/i;
        my $tmp = $1;
        open my $SCRATCH, '<', "./scratch.pad";
        open my $TMPOUT,  '>', "./tmp.out";
        foreach (<$SCRATCH>){
            $_ = m/$tmp/ ? print $DELLIST $_ : print $TMPOUT $_;
        }
        close $SCRATCH, $TMPOUT;
        cp ($TMPOUT, $SCRATCH);
    }
First, there has to be a better way than iterating through the file for each line of the file.
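A common Perl answer to this first point is a single pass with a `%seen` hash: each line is read once, and a hash lookup tells whether its first field has appeared before, so the work grows linearly with the file instead of re-reading a scratch file for every line. A minimal sketch, assuming the group name is the first comma-separated field and never contains an embedded comma (for quoted fields that can contain commas, the CPAN module Text::CSV would be the safer parser). `dedup` is a hypothetical name, and the `.new`/`.deleted` output names are borrowed from the original script:

```perl
use strict;
use warnings;

# Single pass: remember each group name in %seen. The first line for a name
# goes to $keep_fh; every later line with the same name goes to $drop_fh.
# Assumption: the name is the first field and contains no comma.
sub dedup {
    my ($in_fh, $keep_fh, $drop_fh) = @_;
    my %seen;
    while (my $line = <$in_fh>) {
        my ($name) = split /,/, $line, 2;   # limit 2: stop at the first comma
        $name =~ s/"//g;                    # ignore quoting, as the original does
        if ($seen{$name}++) {
            print {$drop_fh} $line;         # seen before: record as removed
        } else {
            print {$keep_fh} $line;         # first occurrence: keep it
        }
    }
}

# Command-line use: perl dedup.pl FILE  ->  FILE.new and FILE.deleted
if (@ARGV) {
    my $file = $ARGV[0];
    open my $in,   '<', $file           or die "Can't read $file: $!";
    open my $keep, '>', "$file.new"     or die "Can't write $file.new: $!";
    open my $drop, '>', "$file.deleted" or die "Can't write $file.deleted: $!";
    dedup($in, $keep, $drop);
    close $_ or die "close failed: $!" for $keep, $drop;
}
```

Run as `perl dedup.pl teams.csv` (hypothetical script and file names). As written, the key is the exact name, so `Group-One` and `GroupOne` still count as different groups until the key is normalized. Note also that the original `open my $DELLIST, '>', "ARGV[0].deleted";` is missing the `$`, so it writes to a file literally named `ARGV[0].deleted`.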
Second, I'm not sure what I changed, but now I am receiving these two errors:
    Filehandle $TMPOUT opened only for output at /usr/share/perl/5.10/File/Copy.pm line 200.
    stat() on closed filehandle $SCRATCH at /usr/share/perl/5.10/File/Copy.pm line 117.
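A likely explanation for both warnings: `close` takes a single filehandle, so `close $SCRATCH, $TMPOUT;` closes only `$SCRATCH` and leaves `$TMPOUT` open for writing. `cp` is then asked to read from the write-only `$TMPOUT` (the first warning) and to write to the already closed `$SCRATCH` (the second). A sketch of the fix, closing each handle on its own and passing file names rather than handles to File::Copy's `copy`; the temporary directory exists only to keep the example self-contained:

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

my $dir     = tempdir(CLEANUP => 1);    # throwaway directory for the demo
my $scratch = File::Spec->catfile($dir, 'scratch.pad');
my $tmpout  = File::Spec->catfile($dir, 'tmp.out');

open my $TMPOUT, '>', $tmpout or die "Can't write $tmpout: $!";
print {$TMPOUT} "Group Two,Captain\n";
close $TMPOUT or die "Can't close $tmpout: $!";   # one handle per close

# Pass file names, so File::Copy opens the source for reading and the
# destination for writing itself.
copy($tmpout, $scratch) or die "Copy failed: $!";
```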
Finally, duplicate entries that contain special characters are not counted as duplicates. Would doing a search-and-replace on every special character, i.e.:

    $_ =~ s/[-|\&|_|+|']/ /g;

work to make all the entries similar enough?
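One detail in that substitution: inside square brackets `|` is not alternation, just another character to match, so `[-|\&|_|+|']` also turns literal pipes into spaces. The class that seems to be intended is `[-&_+']`. A small check of that class against the sample names (a sketch; `squash` is a hypothetical helper):

```perl
use strict;
use warnings;

# Replace each listed special character with a space, as the question does,
# but with the class written as [-&_+'] (no stray '|' literals).
sub squash {
    my ($s) = @_;          # copy, so the caller's string is untouched
    $s =~ s/[-&_+']/ /g;
    return $s;
}

for my $name ('Group One', 'Group-One', 'GroupOne') {
    printf "%-10s -> '%s'\n", $name, squash($name);
}
```

`Group-One` becomes `Group One` and matches the first entry, but `GroupOne` has no character to replace and stays distinct, so replacing with a space is not enough on its own for this data.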
Thanks in advance for the assist.
EDIT
In lieu of:

    $_ =~ s/[-|\&|_|+|']/ /g;

how about:

    $_ =~ s/\W/ /g;
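`\W` is broader, but two details matter for this data: a space is itself `\W`, so replacing with a space still leaves `Group One` and `GroupOne` apart, and `_` counts as a word character, so `\W` never touches it. A sketch of a comparison key that deletes instead of replacing and also folds case (`norm_key` is a hypothetical name):

```perl
use strict;
use warnings;

# Build a comparison key for a group name: lc folds case, and [\W_]+
# deletes every run of non-alphanumerics (\W alone would keep '_').
sub norm_key {
    my ($name) = @_;
    my $key = lc $name;
    $key =~ s/[\W_]+//g;
    return $key;
}

# Group the sample spellings by their key.
my %groups;
push @{ $groups{ norm_key($_) } }, $_
    for 'Group One', 'Group-One', 'GroupOne', 'Group_One', 'Group Two';

print "$_: @{ $groups{$_} }\n" for sort keys %groups;
```

All four spellings of the first group share the key `groupone`. The trade-off is that two genuinely different names differing only in punctuation would also be merged, so the list of removed entries is worth a manual look.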
Replies are listed 'Best First'.
Re: Remove duplicate entries
by ikegami (Patriarch) on Nov 16, 2010 at 23:37 UTC
Re: Remove duplicate entries
by kcott (Archbishop) on Nov 17, 2010 at 01:08 UTC
Re: Remove duplicate entries
by aquarium (Curate) on Nov 16, 2010 at 23:47 UTC
by aquarium (Curate) on Nov 17, 2010 at 02:52 UTC
Re: Remove duplicate entries
by 7stud (Deacon) on Nov 17, 2010 at 02:06 UTC
Re: Remove duplicate entries
by PyrexKidd (Monk) on Nov 17, 2010 at 06:55 UTC
by kcott (Archbishop) on Nov 17, 2010 at 07:52 UTC
by PyrexKidd (Monk) on Nov 17, 2010 at 16:06 UTC
by kcott (Archbishop) on Nov 17, 2010 at 17:11 UTC
Re: Remove duplicate entries
by chrestomanci (Priest) on Nov 17, 2010 at 09:50 UTC