How to remove duplicate records

by SatisfyMyStruggles (Initiate)
on Jun 09, 2013 at 08:26 UTC ( #1037915=perlquestion )
SatisfyMyStruggles has asked for the wisdom of the Perl Monks concerning the following question:

Can someone please show me how to remove duplicate records? I read in a file of records with the input record separator set to:

$/ = "\n\n";
open FILE, "LogMessages.txt" or die $!;
while(<FILE>) {
}

Replies are listed 'Best First'.
Re: How to remove duplicate records
by Corion (Pope) on Jun 09, 2013 at 08:28 UTC

    This is a FAQ. See perlfaq4 on "duplicate", or alternatively run

    perldoc -q duplicate
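
    For readers without perldoc at hand, the FAQ answer is built around a %seen hash combined with grep. A minimal sketch of that idiom follows; the sub name uniq_in_order and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Keep the first occurrence of each element, preserving input order.
# uniq_in_order is a hypothetical helper name, not a core function.
sub uniq_in_order {
    my %seen;
    # $seen{$_}++ is false the first time a value appears, true afterwards
    return grep { !$seen{$_}++ } @_;
}

my @records = ("a", "b", "b", "a");   # made-up sample data
print join(",", uniq_in_order(@records)), "\n";
```

    The same test works on records read from a file, since any string can serve as a hash key.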
Re: How to remove duplicate records
by hdb (Monsignor) on Jun 09, 2013 at 09:06 UTC

    Use a hash to store the information whether or not a record has been seen already. Use the record as key.

    use strict;
    use warnings;

    $/ = "\n\n";
    my %seen;
    while (<DATA>) {
        print unless $seen{$_}++;
    }

    __DATA__
    a

    b

    b

    a
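
    Applied to a file rather than DATA, the same idea might look like the sketch below. It assumes records are paragraphs separated by blank lines, and it switches to paragraph mode ($/ = "") plus chomp so that a final record ending in a single newline still matches an earlier copy that was followed by a blank line. The sub name unique_records and the sample log text are hypothetical.

```perl
use strict;
use warnings;

# Return the unique records from a filehandle, in first-seen order.
# Assumes records are paragraphs separated by one or more blank lines.
sub unique_records {
    my ($fh) = @_;
    local $/ = "";          # paragraph mode: a run of blank lines ends a record
    my (%seen, @unique);
    while (my $record = <$fh>) {
        chomp $record;      # in paragraph mode, chomp strips all trailing newlines
        push @unique, $record unless $seen{$record}++;
    }
    return @unique;
}

# Demo on an in-memory file; the log text is invented for illustration.
my $sample = "error: disk full\n\nwarn: retry\n\nerror: disk full\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "$_\n\n" for unique_records($fh);
close $fh;
```

    To run it against the file from the question, open "LogMessages.txt" for reading and pass that handle instead of the in-memory one.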
Re: How to remove duplicate records
by Old_Gray_Bear (Bishop) on Jun 09, 2013 at 22:04 UTC
    A non-Perl solution for use on Linux and other Unix-like systems:
    $ sort -u my.input.file

    I Go Back to Sleep, Now.


Re: How to remove duplicate records
by rpnoble419 (Pilgrim) on Jun 10, 2013 at 04:14 UTC

    Before a proper solution can be identified, what does your data look like and what are the criteria for it to be a dupe? You may have hundreds of calls in your log file to a specific graphic, but if they all come from different IP addresses at different times, they are not dupes. Also, how big is your log file? A hash-based dupe system may not work over millions of records.
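
    Both concerns can be sketched in one small filter: the caller decides which fields define a dupe, and the hash stores a 16-byte MD5 digest of that key rather than the whole record. This is only an illustration under assumptions. The "IP TIMESTAMP PATH" log layout and the name make_dedupe_filter are invented here, and the approach still keeps one hash entry per distinct key, so it reduces memory per entry rather than removing the limit.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Build a filter that remembers a 16-byte digest per key, not the full record.
# $key_of is a caller-supplied code ref returning the fields that define a dupe.
# make_dedupe_filter is a hypothetical helper name.
sub make_dedupe_filter {
    my ($key_of) = @_;
    my %seen;
    return sub {
        my ($record) = @_;
        # true the first time a key is seen, false for every later occurrence
        return !$seen{ md5( $key_of->($record) ) }++;
    };
}

# Assumed log layout: "IP TIMESTAMP PATH"; only IP and PATH decide a dupe here.
my $is_new = make_dedupe_filter(sub {
    my ($ip, $time, $path) = split ' ', $_[0];
    return "$ip $path";
});

my @records = (
    "10.0.0.1 08:26 /logo.png",
    "10.0.0.2 08:27 /logo.png",   # different IP: kept
    "10.0.0.1 09:00 /logo.png",   # same IP and path as the first: dropped
);
print "$_\n" for grep { $is_new->($_) } @records;
```

    Changing the key sub is enough to change the definition of a dupe, for example returning the whole record when only exact copies should be dropped.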
