
Remove duplicate data from text file

by peppiv (Curate)
on Nov 28, 2001 at 00:17 UTC ( [id://127894] )

peppiv has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file with many items:

Green Tea

How can I remove duplicate data from this file?

Thanks in advance

Uncle Peppi

Replies are listed 'Best First'.
Re: Remove duplicate data from text file
by mirod (Canon) on Nov 28, 2001 at 00:24 UTC

    If you are on Unix and don't care about the order of the output (or want it sorted), sort -u <file> is your friend.

    Otherwise you can try:

    perl -n -e 'print unless $seen{$_}++;' <file>
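    For readers new to the -n switch: it wraps the code in an implicit while (<>) loop, so each input line lands in $_. A minimal script-form sketch of the same %seen idiom (the subroutine name and sample data are mine, not part of the one-liner):

    ```perl
    use strict;
    use warnings;

    # Same idea as the one-liner: %seen counts how many times each
    # line has appeared; grep keeps only the first occurrence.
    sub uniq_lines {
        my %seen;
        return grep { !$seen{$_}++ } @_;
    }

    my @lines = ("Green Tea\n", "Black Tea\n", "Green Tea\n");
    print uniq_lines(@lines);    # prints Green Tea, then Black Tea
    ```

    Because $seen{$_}++ is a post-increment, it is false (0) the first time a line is seen and true afterwards, which is exactly why duplicates are skipped.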
Re: Remove duplicate data from text file
by dragonchild (Archbishop) on Nov 28, 2001 at 00:35 UTC
    1. Read the file into an array
    2. Create a hash whose keys are the values in the array
    3. Create an array from the keys of the hash
    4. Print the array out to a file
    The details are left as an exercise for the reader.
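    For the impatient, here is one minimal sketch of those four steps (the file names and subroutine name are placeholders, not from the thread; note that going through a hash loses the original line order):

    ```perl
    use strict;
    use warnings;

    sub dedupe_file {
        my ($in_file, $out_file) = @_;

        # 1. Read the file into an array
        open my $in, '<', $in_file or die "$in_file: $!";
        my @lines = <$in>;
        close $in;

        # 2. Create a hash whose keys are the values in the array
        my %uniq;
        @uniq{@lines} = ();

        # 3. Create an array from the keys of the hash
        #    (hash keys come back in no particular order)
        my @unique = keys %uniq;

        # 4. Print the array out to a file
        open my $out, '>', $out_file or die "$out_file: $!";
        print $out @unique;
        close $out;
    }
    ```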

    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means: worry only about what you need to implement.

    by ggoebel (Sexton) on Nov 28, 2001 at 01:31 UTC
      my $filename = ...;
      local $/;
      open IN, $filename;
      $text = <IN>;
      @words{ split /\s+/, $text } = undef;
      @words = keys %words;
Re: Remove duplicate data from text file
by CharlesClarkson (Curate) on Nov 28, 2001 at 09:02 UTC

    If you need to maintain the original order and delete subsequent duplicates, you could try this.

    {
        my %seen;
        my ($in_file, $out_file) = qw| in.txt out.txt |;

        open my $in_fh,  '<', $in_file  or die "$in_file: $!";
        open my $out_fh, '>', $out_file or die "$out_file: $!";

        while ( <$in_fh> ) {
            print $out_fh $_ unless $seen{$_}++;
        }
    }

    Charles K. Clarkson

    Why isn't phonetic spelled the way it sounds?
Re: Remove duplicate data from text file
by jlongino (Parson) on Nov 28, 2001 at 08:26 UTC
    You could use this as well:
    use strict;

    my %hash;
    my $file = 'infile.txt';

    open INFILE, '<', $file or die "Can't open '$file' $!\n";
    $hash{$_}++ while <INFILE>;
    close INFILE;

    open OUTFILE, '>', $file or die "Can't open '$file' $!\n";
    print OUTFILE $_ foreach keys %hash;


Re: Remove duplicate data from text file
by Dogma (Pilgrim) on Nov 28, 2001 at 00:28 UTC
    Have you tried using vi? {ED: Hey this was a joke, enough with the - votes... I admit I was bad}
      Please excuse my ignorance.

      Is vi a command line option?
        (I suspect that you're just joking with me now but...) Vi is by far the best (and most cryptic) text editor. It is also one half of the eternal Vi vs. Emacs debate, although Emacs is really more of a religion than a text editor. :)
