
Remove duplicate data from text file

by peppiv (Curate)
on Nov 28, 2001 at 00:17 UTC ( [id://127894] )

peppiv has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file with many items:

Green Tea

How can I remove duplicate data from this file?

Thanks in advance

Uncle Peppi

Replies are listed 'Best First'.
Re: Remove duplicate data from text file
by mirod (Canon) on Nov 28, 2001 at 00:24 UTC

    If you are on Unix and don't care about the order of the output (or want it sorted), sort -u <file> is your friend.

    Otherwise you can try:

    perl -n -e 'print unless $seen{$_}++;' <file>
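    For readers new to the -n switch: it wraps the code in an implicit while (<>) loop, so each input line lands in $_. A minimal script-form sketch of the same %seen idiom (the subroutine name and sample data are mine, not part of the one-liner):

    ```perl
    use strict;
    use warnings;

    # Same idea as the one-liner: %seen counts how many times each
    # line has appeared; grep keeps only the first occurrence.
    sub uniq_lines {
        my %seen;
        return grep { !$seen{$_}++ } @_;
    }

    my @lines = ("Green Tea\n", "Black Tea\n", "Green Tea\n");
    print uniq_lines(@lines);    # prints Green Tea, then Black Tea
    ```

    Because $seen{$_}++ is a post-increment, it is false (0) the first time a line is seen and true afterwards, which is exactly why duplicates are skipped.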
Re: Remove duplicate data from text file
by dragonchild (Archbishop) on Nov 28, 2001 at 00:35 UTC
    1. Read the file into an array
    2. Create a hash whose keys are the values in the array
    3. Create an array from the keys of the hash
    4. Print the array out to a file
    The details are left as an exercise for the reader.
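    For the impatient, here is one minimal sketch of those four steps (the file names and subroutine name are placeholders, not from the thread; note that going through a hash loses the original line order):

    ```perl
    use strict;
    use warnings;

    sub dedupe_file {
        my ($in_file, $out_file) = @_;

        # 1. Read the file into an array
        open my $in, '<', $in_file or die "$in_file: $!";
        my @lines = <$in>;
        close $in;

        # 2. Create a hash whose keys are the values in the array
        my %uniq;
        @uniq{@lines} = ();

        # 3. Create an array from the keys of the hash
        #    (hash keys come back in no particular order)
        my @unique = keys %uniq;

        # 4. Print the array out to a file
        open my $out, '>', $out_file or die "$out_file: $!";
        print $out @unique;
        close $out;
    }
    ```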

    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.

    by ggoebel (Sexton) on Nov 28, 2001 at 01:31 UTC
      my $filename = ...;
      local $/;
      open IN, $filename;
      $text = <IN>;
      @words{ split /\s+/, $text } = undef;
      @words = keys %words;
Re: Remove duplicate data from text file
by CharlesClarkson (Curate) on Nov 28, 2001 at 09:02 UTC

    If you need to maintain the original order and delete subsequent duplicates, you could try this.

    {
        my %seen;
        my ($in_file, $out_file) = qw| in.txt out.txt |;

        open my $in_fh,  '<', $in_file  or die "$in_file: $!";
        open my $out_fh, '>', $out_file or die "$out_file: $!";

        while ( <$in_fh> ) {
            print $out_fh $_ unless $seen{$_}++;
        }
    }

    Charles K. Clarkson

    Why isn't phonetic spelled the way it sounds?
Re: Remove duplicate data from text file
by jlongino (Parson) on Nov 28, 2001 at 08:26 UTC
    You could use this as well:
    use strict;

    my %hash;
    my $file = 'infile.txt';

    open INFILE, '<', $file or die "Can't open '$file' $!\n";
    $hash{$_}++ while <INFILE>;
    close INFILE;

    open OUTFILE, '>', $file or die "Can't open '$file' $!\n";
    print OUTFILE $_ foreach keys %hash;


Re: Remove duplicate data from text file
by Dogma (Pilgrim) on Nov 28, 2001 at 00:28 UTC
    Have you tried using vi? {ED: Hey this was a joke, enough with the - votes... I admit I was bad}
      Please excuse my ignorance.

      Is vi a command line option?
        (I suspect that you're just joking with me now but...) Vi is by far the best (and most cryptic) text editor. It is also one half of the eternal Vi vs. Emacs debate, although Emacs is really more of a religion than a text editor. :)
