Re: Delete Duplicate Entry in a text file
by stevieb (Canon) on Jun 20, 2012 at 02:32 UTC
Have you tried anything yet? If so, will you show us your code?
Hi stevieb,
I haven't tried anything yet. I'm thinking of using a regex to find the entries I need, then using grep to locate them and print the result. But I think even then the other two "Gateway FAIL" lines would still match the regular expression, so it's useless. I'm clueless about how to do this.
Re: Delete Duplicate Entry in a text file
by sauoq (Abbot) on Jun 20, 2012 at 02:41 UTC
Assuming you've opened $fh for reading on your fail.txt file or whatever, it should be about as easy as....
my $last;
while (my $line = <$fh>) {
    next if defined $last and $line eq $last;
    print $line;
    $last = $line;
}
You'll have to modify it a bit for those blank lines, if they really appear in your log files.
Edit: added my $line = to while loop.
-sauoq
"My two cents aren't worth a dime.";
Re: Delete Duplicate Entry in a text file
by aaron_baugher (Curate) on Jun 20, 2012 at 03:22 UTC
Here it is with some common tools. The concepts could be translated into Perl code. The grep takes out the blank lines, the uniq removes consecutive duplicates, and then sed adds an extra newline to the end of each line to put the blank lines back in again. Although in Perl it might be easier to make a blank line your input record separator, and then you'd just have to drop consecutive duplicates.
grep -v '^$' fail.txt | uniq | sed 's/$/\n/'
Aaron B.
Available for small or large Perl jobs; see my home node.
Re: Delete Duplicate Entry in a text file
by kcott (Archbishop) on Jun 20, 2012 at 08:19 UTC
I started looking at this based on the data as presented; however, there appears to be some doubt about whether all that whitespace actually exists. Here are solutions for both scenarios.
Without extra whitespace:
#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

my ($host, %seen);

while (<DATA>) {
    chomp;
    if (m{^hostname\d\.com}) {
        say $host = $_;
    }
    else {
        say unless $seen{$host}{$_}++;
    }
}
__DATA__
hostname1.com
Gateway FAIL
Gateway FAIL
Gateway FAIL
hostname2.com
Gateway FAIL
Gateway FAIL
Gateway FAIL
Output:
$ pm_del_group_dup_nospace.pl
hostname1.com
Gateway FAIL
hostname2.com
Gateway FAIL
With whitespace (as written):
#!/usr/bin/env perl

use strict;
use warnings;

my ($host, %seen);
my $host_re = qr{ \A ( hostname \d+ [.] com ) \s* \z }msx;
my $fail_re = qr{ \A ( .*? ) \s*? \z }msx;

{
    local $/ = "";
    while (<DATA>) {
        if (m{$host_re}) {
            $host = $1;
            print;
        }
        else {
            print unless $seen{$host}{(m{$fail_re})[0]}++;
        }
    }
}
__DATA__
hostname1.com
Gateway FAIL
Gateway FAIL
Gateway FAIL
hostname2.com
Gateway FAIL
Gateway FAIL
Gateway FAIL
Output:
$ pm_del_group_dup.pl
hostname1.com
Gateway FAIL
hostname2.com
Gateway FAIL
Re: Delete Duplicate Entry in a text file
by johngg (Canon) on Jun 20, 2012 at 10:39 UTC
This code uses the %seen hash idiom but resets the hash whenever a new host is reached; this might be tricky if host names are difficult to detect. It restores the blank lines by joining the lines to be printed with an extra newline. Enclosing the selection code in a do block means the %seen hash is not left hanging around afterwards.
knoppix@Microknoppix:~$ perl -Mstrict -wE '
> open my $inFH, q{<}, \ <<EOD or die $!;
> hostname1.com
>
> Gateway FAIL
>
> Gateway FAIL
>
> Gateway FAIL
>
> hostname2.com
>
> Gateway FAIL
>
> Gateway FAIL
>
> Gateway FAIL
>
> EOD
>
> print join qq{\n}, do {
> my %seen;
> grep {
> %seen = () if m{^hostname};
> ! m{^\s*$} && ! $seen{ $_ } ++;
> }
> <$inFH>;
> }, q{};'
hostname1.com
Gateway FAIL
hostname2.com
Gateway FAIL
knoppix@Microknoppix:~$
I hope this is of interest.
Re: Delete Duplicate Entry in a text file
by Anonymous Monk on Jun 20, 2012 at 02:41 UTC
"Hope you could help me with this thank you"
Please explain how you would accomplish this using paper and pencil: write out the steps you would take in simple English.
Re: Delete Duplicate Entry in a text file
by astronogun (Sexton) on Jun 20, 2012 at 03:40 UTC
update:
I tried this code:
my $lastrow = "";
while (my $line == "fail.txt")
{
    $line =~ /(.*?)\n/;
    $line = $1;
    if ($line ne $lastrow)
    {
        print $line, "\n";
        $lastrow = $line;
    }
}
but it didn't process anything; it just showed a blinking cursor for a long time, so I think it hangs..
Nice work, astronogun! However, there were just a couple of issues...
Did you really mean my $line == "fail.txt"? You used the numeric comparison == instead of the assignment =, and the read needs to be done from a file handle: my $line = <$fh>
Without having changed your code too much, try the following:
use Modern::Perl;

my $lastrow = "";

open my $fh, '<', 'fail.txt' or die $!;

while (my $line = <$fh>)
{
    $line =~ s/\n//g;
    next if !$line;
    if ($line ne $lastrow)
    {
        print $line, "\n\n";
        $lastrow = $line;
    }
}

close $fh;
Output:
hostname1.com
Gateway FAIL
hostname2.com
Gateway FAIL
$line =~ s/\n//g;
next if !$line;
Why use s/// and why the /g if you do? If you are going to do it, chomp $line; is preferable. Now what if the line is "0\n"? Zero is false. Contrived? Okay... then what if the line contains some whitespace before the newline? Whitespace is true. Checking for the truth of !$line isn't really what you mean. You want to skip $line unless it contains a non-whitespace character so just say what you mean:
next unless $line =~ /\S/;
No need to change the line and no need to re-append the newline later.
-sauoq
"My two cents aren't worth a dime.";
Hi Kenosis
Great it worked thank you very much for modifying the script :)
Regarding the "=" sign: I used "==" because in my Perl editor, when I hovered my mouse over that line, it said it should be "==" not "=" (I'm using Komodo Edit 7.0)
Re: Delete Duplicate Entry in a text file
by Anonymous Monk on Jun 20, 2012 at 09:37 UTC
#!/usr/bin/perl
use strict;
use warnings;

open (FH, '<', 'hostR.txt') or die "cannot open a file \n";
open (FH1, '>', 'hostRR.txt') or die "cannot open a file \n";

local $/;    # slurp mode: read the whole file in one go
my $fileData = <FH>;
close FH;

# collapse any run of consecutive repeats of a two-word line into one copy
$fileData =~ s/(\w+\s\w+)(?:\s*\1)+/$1/gi;
print FH1 $fileData;
close FH1;
################
You can use the above regular expression to remove consecutive duplicates from your file.