http://www.perlmonks.org?node_id=1037915

SatisfyMyStruggles has asked for the wisdom of the Perl Monks concerning the following question:

Can someone please show me how to remove duplicate records? I read in a file of records with the input record separator set to:

$/ = "\n\n";
open FILE, "LogMessages.txt" or die $!;
while (<FILE>) {
}

Re: How to remove duplicate records
by Corion (Patriarch) on Jun 09, 2013 at 08:28 UTC

    This is a FAQ. See perlfaq4 on "duplicate", or alternatively run

    perldoc -q duplicate
Re: How to remove duplicate records
by hdb (Monsignor) on Jun 09, 2013 at 09:06 UTC

    Use a hash to store the information whether or not a record has been seen already. Use the record as key.

    use strict;
    use warnings;

    $/ = "\n\n";
    my %seen;
    while (<DATA>) {
        print unless $seen{$_}++;
    }
    __DATA__
    a

    b

    b

    a
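
One detail worth sketching for the hash approach: with `$/ = "\n\n"`, a final record that is not followed by a blank line still carries a different number of trailing newlines, so it compares as different from an earlier copy. The sketch below is one possible way around that, not part of the original reply: it reads in paragraph mode (`$/ = ''`) and chomps each record before using it as the hash key. The sub name `unique_records` and the in-memory sample input are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the unique records from a filehandle,
# in first-seen order.
sub unique_records {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    my (%seen, @unique);
    while (my $record = <$fh>) {
        chomp $record;    # in paragraph mode this strips all trailing newlines
        push @unique, $record unless $seen{$record}++;
    }
    return @unique;
}

# Usage sketch with an in-memory file (assumed sample data);
# the last record has no trailing blank line.
my $input = "a\n\nb\n\nb\n\na\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
print "$_\n\n" for unique_records($fh);
close $fh;
```

Because the trailing newlines are stripped before the lookup, the output re-adds a blank line after each record when printing.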
Re: How to remove duplicate records
by Old_Gray_Bear (Bishop) on Jun 09, 2013 at 22:04 UTC
    A non-Perl solution for use on Linux and other Unix-like systems:
    $ sort -u my.input.file

    ----
    I Go Back to Sleep, Now.

    OGB

Re: How to remove duplicate records
by rpnoble419 (Pilgrim) on Jun 10, 2013 at 04:14 UTC

    Before a proper solution can be identified, what does your data look like, and what are the criteria for a record to be a dupe? You may have hundreds of calls in your log file to a specific graphic, but if they all come from different IP addresses at different times, they are not dupes. Also, how big is your log file? A hash-based dupe system may not work over millions of records.
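
Both concerns can be sketched in code, under some loud assumptions. The hypothetical sub `dedupe_stream` below takes a caller-supplied key function, so the caller decides what makes two records "the same", and it stores a 16-byte MD5 digest of that key rather than the full record, which may reduce memory use when records are long. The record layout (first line is a timestamp, the rest is the message) is invented for the example and is not taken from the original question.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);    # core module

# Hypothetical helper: calls $emit for the first record seen with each key
# and returns the number of distinct keys.
sub dedupe_stream {
    my ($fh, $key_of, $emit) = @_;
    local $/ = '';    # paragraph mode: blank lines separate records
    my %seen;
    while (my $record = <$fh>) {
        chomp $record;
        # Keep a 16-byte digest of the key instead of the whole record.
        my $digest = md5($key_of->($record));
        $emit->($record) unless $seen{$digest}++;
    }
    return scalar keys %seen;
}

# Assumed layout: the first line is a timestamp, the rest is the message.
my $log = "10:00\nGET /logo.png\n\n10:05\nGET /logo.png\n\n10:07\nGET /index.html\n";
open my $fh, '<', \$log or die "Cannot open in-memory file: $!";
my $distinct = dedupe_stream(
    $fh,
    sub { my ($r) = @_; $r =~ s/\A[^\n]*\n//; return $r },    # ignore the timestamp line
    sub { print "$_[0]\n\n" },
);
close $fh;
print "distinct keys: $distinct\n";
```

This is only a sketch: the hash still grows with the number of distinct keys, and a digest collision would wrongly drop a record, however unlikely that is. For very large files, a disk-backed hash or an external sort may be a better fit.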