Delete Duplicate Entry in a text file

by astronogun (Sexton)
on Jun 20, 2012 at 02:16 UTC

astronogun has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I would like to seek help with deleting duplicate entries in a text file.

I have a fail.txt file with the following contents:

hostname1.com

Gateway FAIL

Gateway FAIL

Gateway FAIL

hostname2.com

Gateway FAIL

Gateway FAIL

Gateway FAIL

I want to delete the other two Gateway FAIL lines for every host, so the result is something like this:

hostname1.com

Gateway FAIL

hostname2.com

Gateway FAIL

Note: the fail.txt file is output that came from a URL; after parsing, the output is printed to fail.txt.

I hope you can help me with this. Thank you.

Replies are listed 'Best First'.
Re: Delete Duplicate Entry in a text file
by stevieb (Canon) on Jun 20, 2012 at 02:32 UTC

    Have you tried anything yet? If so, will you show us your code?

      Hi stevieb,

      I haven't tried anything yet. I'm thinking of using a regex to find the entries I need, using grep to locate them, and printing the result. But I think the other two Gateway FAIL lines would still match the regular expression, so that's useless. I'm clueless about how to do this.

Re: Delete Duplicate Entry in a text file
by sauoq (Abbot) on Jun 20, 2012 at 02:41 UTC

    Assuming you've opened $fh for reading on your fail.txt file or whatever, it should be about as easy as....

    my $last;
    while (my $line = <$fh>) {
        next if defined $last and $line eq $last;
        print $line;
        $last = $line;
    }
    You'll have to modify it a bit for those blank lines, if they really appear in your log files.
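
    For instance, a minimal tweak (a sketch, assuming the blank lines are empty or whitespace-only; note it drops the blank lines from the output rather than preserving them):

    my $last;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;                   # skip blank/whitespace-only lines
        next if defined $last and $line eq $last;    # drop consecutive duplicates
        print $line;
        $last = $line;
    }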

    Edit: added my $line = to while loop.

    -sauoq
    "My two cents aren't worth a dime.";
Re: Delete Duplicate Entry in a text file
by aaron_baugher (Curate) on Jun 20, 2012 at 03:22 UTC

    Here it is with some common tools. The concepts could be translated into Perl code. The grep takes out the blank lines, the uniq removes consecutive duplicates, and then sed adds an extra newline to the end of each line to put the blank lines back in again. Although in Perl it might be easier to make a blank line your input record separator, and then you'd just have to drop consecutive duplicates.

    grep -v ^$ fail.txt | uniq | sed 's/$/\n/'
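
    A minimal Perl sketch of that paragraph-mode idea (assumptions: the blank lines contain no stray whitespace, and the file name is passed as an argument, e.g. perl dedup.pl fail.txt with a hypothetical script name):

    #!/usr/bin/env perl
    use strict;
    use warnings;

    local $/ = "";             # paragraph mode: one or more blank lines end a record
    my $last = '';
    while (my $rec = <>) {
        chomp $rec;            # in paragraph mode, chomp strips all trailing newlines
        next if $rec eq $last; # drop consecutive duplicate records
        print "$rec\n\n";      # re-insert the blank-line separator
        $last = $rec;
    }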

    Aaron B.
    Available for small or large Perl jobs; see my home node.

Re: Delete Duplicate Entry in a text file
by kcott (Archbishop) on Jun 20, 2012 at 08:19 UTC

    I started looking at this based on the data as presented; however, there appears to be a question mark over whether all that whitespace actually exists. Here are solutions for both scenarios.

    Without extra whitespace:

    #!/usr/bin/env perl

    use 5.010;
    use strict;
    use warnings;

    my ($host, %seen);

    while (<DATA>) {
        chomp;
        if (m{^hostname\d\.com}) {
            say $host = $_;
        }
        else {
            say unless $seen{$host}{$_}++;
        }
    }

    __DATA__
    hostname1.com
    Gateway FAIL
    Gateway FAIL
    Gateway FAIL
    hostname2.com
    Gateway FAIL
    Gateway FAIL
    Gateway FAIL

    Output:

    $ pm_del_group_dup_nospace.pl
    hostname1.com
    Gateway FAIL
    hostname2.com
    Gateway FAIL

    With whitespace (as written):

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my ($host, %seen);
    my $host_re = qr{ \A ( hostname \d+ [.] com ) \s* \z }msx;
    my $fail_re = qr{ \A ( .*? ) \s*? \z }msx;

    {
        local $/ = "";
        while (<DATA>) {
            if (m{$host_re}) {
                $host = $1;
                print;
            }
            else {
                print unless $seen{$host}{(m{$fail_re})[0]}++;
            }
        }
    }

    __DATA__
    hostname1.com

    Gateway FAIL

    Gateway FAIL

    Gateway FAIL

    hostname2.com

    Gateway FAIL

    Gateway FAIL

    Gateway FAIL

    Output:

    $ pm_del_group_dup.pl
    hostname1.com

    Gateway FAIL

    hostname2.com

    Gateway FAIL

    -- Ken

Re: Delete Duplicate Entry in a text file
by johngg (Canon) on Jun 20, 2012 at 10:39 UTC

    This code uses the %seen hash idiom but resets the hash whenever a new host is reached; this might be tricky if host names are difficult to detect. It restores the empty lines by joining the lines to be printed with newlines. Enclosing the selection code in a do block means the %seen hash is not left hanging around.

    knoppix@Microknoppix:~$ perl -Mstrict -wE '
    > open my $inFH, q{<}, \ <<EOD or die $!;
    > hostname1.com
    >
    > Gateway FAIL
    >
    > Gateway FAIL
    >
    > Gateway FAIL
    >
    > hostname2.com
    >
    > Gateway FAIL
    >
    > Gateway FAIL
    >
    > Gateway FAIL
    >
    > EOD
    >
    > print join qq{\n}, do {
    >     my %seen;
    >     grep {
    >         %seen = () if m{^hostname};
    >         ! m{^\s*$} && ! $seen{ $_ } ++;
    >     }
    >     <$inFH>;
    > }, q{};'
    hostname1.com

    Gateway FAIL

    hostname2.com

    Gateway FAIL

    knoppix@Microknoppix:~$

    I hope this is of interest.

    Cheers,

    JohnGG

Re: Delete Duplicate Entry in a text file
by Anonymous Monk on Jun 20, 2012 at 02:41 UTC

    I hope you can help me with this. Thank you.

    Please explain how you would accomplish this using paper and pencil; write out the steps you would take in simple English.

Re: Delete Duplicate Entry in a text file
by astronogun (Sexton) on Jun 20, 2012 at 03:40 UTC

    update:

    I tried this code:

    my $lastrow = ""; while (my $line == "fail.txt") { $line =~ /(.*?)\n/; $line = $1; if ($line ne $lastrow) { print $line, "\n"; $lastrow = $line; } }

    but it didn't process anything; it just showed a blinking cursor for a long time. I think it hangs.

      Nice work, astronogun! However, there were just a couple of issues...

      Did you really mean my $line == "fail.txt"? You used the comparison == instead of the assignment =, and the read needs to be done on a file handle: my $line = <$fh>

      Without having changed your code too much, try the following:

      use Modern::Perl;

      my $lastrow = "";

      open my $fh, '<', 'fail.txt' or die $!;

      while (my $line = <$fh>) {
          $line =~ s/\n//g;
          next if !$line;
          if ($line ne $lastrow) {
              print $line, "\n\n";
              $lastrow = $line;
          }
      }

      close $fh;

      Output:

      hostname1.com

      Gateway FAIL

      hostname2.com

      Gateway FAIL
        Without having changed your code too much

        It could stand a bit more changing though.

        $line =~ s/\n//g;
        next if !$line;
        Why use s/// and why the /g if you do? If you are going to do it, chomp $line; is preferable. Now what if the line is "0\n"? Zero is false. Contrived? Okay... then what if the line contains some whitespace before the newline? Whitespace is true. Checking for the truth of !$line isn't really what you mean. You want to skip $line unless it contains a non-whitespace character so just say what you mean:
        next unless $line =~ /\S/;
        No need to change the line and no need to re-append the newline later.
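
        Applied to the loop above, that suggestion might look like this (a sketch; $line keeps its own trailing newline, and the extra "\n" restores the blank-line separator):

        my $lastrow = "";
        open my $fh, '<', 'fail.txt' or die $!;
        while (my $line = <$fh>) {
            next unless $line =~ /\S/;   # skip whitespace-only lines, leave $line untouched
            if ($line ne $lastrow) {
                print $line, "\n";       # $line still ends in "\n"; add one blank line
                $lastrow = $line;
            }
        }
        close $fh;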

        -sauoq
        "My two cents aren't worth a dime.";

        Hi Kenosis,

        Great, it worked. Thank you very much for modifying the script :)

        Regarding the "=" sign: I used "==" because when I hovered my mouse over that line in my Perl editor, it said it should be "==" not "=" (I'm using Komodo Edit 7.0).

Re: Delete Duplicate Entry in a text file
by Anonymous Monk on Jun 20, 2012 at 09:37 UTC
    #!/usr/bin/perl
    use strict;

    open (FH, "hostR.txt")    or die "cannot open a file \n";
    open (FH1, ">hostRR.txt") or die "cannot open a file \n";

    local $/;                 # slurp mode: read the whole file at once
    my $fileData = <FH>;
    close FH;

    # collapse three consecutive "word word" groups into one
    if ($fileData =~ s/(\w+\s\w+)\s*\1\s*\1/$1/gi) {
        print FH1 $fileData;
    }
    close FH1;

    You can use the above regular expression to remove consecutive duplicates from your file.
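
    For a quick test, the same substitution can be run as a one-liner (a sketch; -0777 slurps the whole file, equivalent to the local $/ above, and -pe prints the modified text to stdout):

    perl -0777 -pe 's/(\w+\s\w+)\s*\1\s*\1/$1/gi' fail.txt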
