How to remove duplicate records

SatisfyMyStruggles has asked for the wisdom of the Perl Monks concerning the following question:

Can someone please show how to remove duplicate records? I read in a file of records with the input record separator set to:

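$/ = "\n\n";    # records are separated by a blank line
open my $fh, '<', "LogMessages.txt" or die "Cannot open LogMessages.txt: $!";

while (my $record = <$fh>)
{
    # $record holds one blank-line-terminated record
}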

Replies are listed 'Best First'.
Re: How to remove duplicate records by Corion (Patriarch) on Jun 09, 2013 at 08:28 UTC
This is a FAQ. See perlfaq4 on "duplicate", or alternatively run `perldoc -q duplicate`.
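For readers who do not have the FAQ to hand, the usual hash-based idiom for dropping duplicates from a list looks roughly like the sketch below. The helper name `uniq_in_order` and the sample list are made up for illustration; they are not taken from perlfaq4.

```perl
use strict;
use warnings;

# Return the distinct elements of a list, keeping first-seen order.
# (uniq_in_order is a hypothetical name used only for this sketch.)
sub uniq_in_order {
    my %seen;
    # $seen{$_}++ is false the first time a value appears, true afterwards.
    return grep { !$seen{$_}++ } @_;
}

my @records = ("a", "b", "b", "a");
my @unique  = uniq_in_order(@records);
print "@unique\n";    # prints "a b"
```

The same `%seen` test works whether the elements are single lines or whole multi-line records, as long as identical records are byte-for-byte equal.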
Re: How to remove duplicate records by hdb (Monsignor) on Jun 09, 2013 at 09:06 UTC
Use a hash to store the information whether or not a record has been seen already. Use the record as key.

use strict;
use warnings;

$/ = "\n\n";
my %seen;
while (<DATA>) {
    print unless $seen{$_}++;
}
__DATA__
a

b

b

a
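One thing to watch with this approach: the record is used as the key together with its trailing newlines, so a final record that ends in a single newline will not match an earlier copy that ends in a blank line. A minimal sketch that trims trailing whitespace before the lookup follows. The sub name `unique_records` and the in-memory sample are assumptions for illustration; with a real file you would pass a handle opened on `LogMessages.txt` instead.

```perl
use strict;
use warnings;

# Read blank-line-separated records from a filehandle and return the
# unique ones in first-seen order. (Hypothetical helper for this sketch.)
sub unique_records {
    my ($fh) = @_;
    local $/ = "\n\n";                 # same separator as in the question
    my (%seen, @unique);
    while (my $record = <$fh>) {
        $record =~ s/\s+\z//;          # drop trailing newlines and whitespace
        next if $record eq '';         # skip records that were only whitespace
        push @unique, $record unless $seen{$record}++;
    }
    return @unique;
}

# In-memory sample; note the last record ends in a single newline.
my $sample = "a\n\nb\n\nb\n\na\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print join("\n\n", unique_records($fh)), "\n";   # prints records a and b once each
close $fh;
```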
Re: How to remove duplicate records by Old_Gray_Bear (Bishop) on Jun 09, 2013 at 22:04 UTC
A non-Perl solution for use on Linux and other Unix-like systems: `$ sort -u my.input.file`
---- I Go Back to Sleep, Now. OGB
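Note that `sort -u` compares individual lines, so it fits records that occupy one line each. For multi-line records separated by blank lines, a rough Perl counterpart that treats each block as a unit might look like the following. The sub name `sorted_unique_records` and the sample text are invented for this sketch, and like `sort -u` it returns records in sorted order rather than input order.

```perl
use strict;
use warnings;

# Split text into blank-line-separated records and return the distinct
# records in sorted order. (Hypothetical helper for this sketch.)
sub sorted_unique_records {
    my ($text) = @_;
    my %seen;
    for my $record (split /\n{2,}/, $text) {
        $record =~ s/\A\s+|\s+\z//g;          # trim surrounding whitespace
        $seen{$record} = 1 if length $record;
    }
    return sort keys %seen;
}

my $log = "beta line 1\nbeta line 2\n\nalpha\n\nbeta line 1\nbeta line 2\n";
print join("\n\n", sorted_unique_records($log)), "\n";
```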
Re: How to remove duplicate records by rpnoble419 (Pilgrim) on Jun 10, 2013 at 04:14 UTC
Before a proper solution can be identified, what does your data look like and what are the criteria for it to be a dupe? You may have hundreds of calls in your log file to a specific graphic, but if they all come from different IP addresses at different times, they are not dupes. Also, how big is your log file? A hash-based dupe check may not scale to millions of records, since every distinct record is kept in memory.
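To make both points concrete, here is a sketch under stated assumptions. The record layout (a leading `[timestamp]` followed by the request) and the rule "same text after the timestamp means duplicate" are purely hypothetical; the real criteria depend on the actual log format. Storing a 16-byte digest from the core `Digest::MD5` module instead of the full record text is one way to shrink the hash, at the cost of a very small chance of treating two different records as equal.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical rule: two records are duplicates when everything after
# the leading "[timestamp]" is identical.
sub dedupe_key {
    my ($record) = @_;
    (my $key = $record) =~ s/\A\[[^\]]*\]\s*//;   # drop the leading [timestamp]
    $key =~ s/\s+\z//;                            # ignore trailing whitespace
    return md5($key);                             # 16-byte binary digest as hash key
}

# Keep the first record seen for each key, preserving input order.
sub unique_by_key {
    my %seen;
    return grep { !$seen{ dedupe_key($_) }++ } @_;
}

# Made-up sample records for illustration only.
my @records = (
    "[2013-06-09 08:00] GET /logo.png from 10.0.0.1",
    "[2013-06-09 08:05] GET /logo.png from 10.0.0.1",
    "[2013-06-09 08:06] GET /logo.png from 10.0.0.2",
);
print "$_\n" for unique_by_key(@records);   # prints the first and third records
```

If even the digests do not fit in memory, sorting the file externally and comparing neighbouring records is the usual fallback.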