
How to remove duplicate records

by SatisfyMyStruggles (Initiate)
on Jun 09, 2013 at 08:26 UTC (#1037915=perlquestion)
SatisfyMyStruggles has asked for the wisdom of the Perl Monks concerning the following question:

Can someone please show how to remove duplicate records? I read in a file of records with the input record separator set to:

$/ = "\n\n";
open FILE, "LogMessages.txt" or die $!;
while(<FILE>) {
}

Replies are listed 'Best First'.
Re: How to remove duplicate records
by Corion (Pope) on Jun 09, 2013 at 08:28 UTC

    This is a FAQ. See perlfaq4 on "duplicate", or alternatively run

    perldoc -q duplicate
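
For readers without perldoc at hand, here is a minimal sketch of the hash-plus-grep idiom that the FAQ entry describes. The sub name `uniq_in_order` and the sample records are made up for illustration; newer versions of List::Util (1.45 and later) also ship a `uniq` function that does the same job.

```perl
use strict;
use warnings;

# Return the elements of a list with duplicates removed, keeping the
# first occurrence of each and preserving the original order.
sub uniq_in_order {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

my @records = ("a\n", "b\n", "b\n", "a\n");   # made-up sample records
print uniq_in_order(@records);
```

The post-increment is what makes this work: `$seen{$_}++` returns the old count, so the test is true only the first time a value is encountered.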
Re: How to remove duplicate records
by hdb (Prior) on Jun 09, 2013 at 09:06 UTC

    Use a hash to store the information whether or not a record has been seen already. Use the record as key.

    use strict;
    use warnings;

    $/="\n\n";
    my %seen;
    while(<DATA>){
        print unless $seen{$_}++;
    }
    __DATA__
    a

    b

    b

    a
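
One caveat when carrying this over to a real file: the record is used as the hash key with its separator still attached, so a final record that ends in a single newline will not compare equal to an earlier copy that ends in two. The sketch below is one possible way around that, not the only one. It assumes records are separated by one or more blank lines (hence `$/ = ""`, Perl's paragraph mode, rather than `"\n\n"`), and the sub name `unique_records` and the in-memory sample are invented for the demo.

```perl
use strict;
use warnings;

# Read paragraph-style records from a filehandle and return the unique
# ones in first-seen order. Trailing newlines are stripped before the
# comparison so that the last record in the file (which may end in a
# single newline) still matches an earlier copy.
sub unique_records {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my (%seen, @unique);
    while (my $record = <$fh>) {
        $record =~ s/\n+\z//;    # normalize the record's ending
        push @unique, $record unless $seen{$record}++;
    }
    return @unique;
}

# Demo on an in-memory file; the real input would be LogMessages.txt,
# opened with: open my $fh, '<', 'LogMessages.txt' or die $!;
my $sample = "a\n\nb\n\nb\n\na\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n\n" for unique_records($fh);
close $fh;
```

Using `local $/` inside the sub keeps the separator change from leaking into the rest of the program.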
Re: How to remove duplicate records
by Old_Gray_Bear (Bishop) on Jun 09, 2013 at 22:04 UTC
    A non-Perl solution for use on Linux and other Unix-like systems:
    $ sort -u my.input.file

    I Go Back to Sleep, Now.


Re: How to remove duplicate records
by rpnoble419 (Pilgrim) on Jun 10, 2013 at 04:14 UTC

    Before a proper solution can be identified, what does your data look like and what are the criteria for it to be a dupe? You may have hundreds of calls in your log file to a specific graphic, but if they all come from different IP addresses at different times, they are not dupes. Also, how big is your log file? A hash-based dupe system may not work over millions of records.
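
Both concerns can be illustrated with a small sketch. It assumes, purely for illustration, that the first line of each record is a timestamp and that two records are dupes when the remaining text matches; `dedupe_key` and `filter_unique` are hypothetical names, and the sample records are made up. To keep memory down, the hash stores a 16-byte MD5 digest of the key rather than the key itself (Digest::MD5 is a core module). This only shrinks each entry; the hash still grows with the number of distinct records.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Hypothetical key function: two records count as duplicates when their
# text matches after the first line (assumed to be a timestamp) is
# dropped. Replace this with whatever defines a dupe in the real log.
sub dedupe_key {
    my ($record) = @_;
    (my $body = $record) =~ s/\A[^\n]*\n//;
    return $body;
}

# Keep the first record for each distinct key. Only the binary MD5
# digest of the key is stored, not the full text. md5() expects byte
# strings, which is what an unencoded read from a log file yields.
sub filter_unique {
    my ($key_for, @records) = @_;
    my %seen;
    return grep { !$seen{ md5($key_for->($_)) }++ } @records;
}

# Made-up sample records: the first two differ only in the timestamp.
my @records = (
    "08:26 host1\nGET /logo.png\n",
    "08:27 host1\nGET /logo.png\n",
    "08:28 host2\nGET /index.html\n",
);
print filter_unique(\&dedupe_key, @records);
```

If even the digests do not fit in memory, sorting the data on disk first and then dropping adjacent duplicates is the usual fallback.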
