PerlMonks  

Re^3: Delete Duplicate Entry in a text file

by sauoq (Abbot)
on Jun 20, 2012 at 11:30 UTC (#977309)


in reply to Re^2: Delete Duplicate Entry in a text file
in thread Delete Duplicate Entry in a text file

"Without having changed your code too much"

It could stand a bit more changing though.

$line =~ s/\n//g;
next if !$line;
Why use s/// at all, and why the /g if you do? If you are going to remove the newline, chomp $line; is preferable. Now what if the line is "0\n"? Once the newline is gone the string is "0", and "0" is false, so that line gets skipped. Contrived? Okay... then what if the line contains some whitespace before the newline? A string of whitespace is true, so that line is not skipped. Checking the truth of $line isn't really what you mean. You want to skip $line unless it contains a non-whitespace character, so just say what you mean:
next unless $line =~ /\S/;
No need to change the line and no need to re-append the newline later.
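
To see the difference between the two tests, here is a small sketch. The sample lines and the helper names skips_by_truth and skips_by_regex are made up for illustration; they are not part of the code under discussion.

```perl
use strict;
use warnings;

# Hypothetical helper: the original test (strip newlines, then check truth).
sub skips_by_truth {
    my ($line) = @_;
    $line =~ s/\n//g;
    return !$line ? 1 : 0;
}

# Hypothetical helper: the suggested test (skip unless a non-whitespace char).
sub skips_by_regex {
    my ($line) = @_;
    return $line =~ /\S/ ? 0 : 1;
}

# Made-up sample lines: empty, zero, whitespace only, and ordinary text.
for my $line ("\n", "0\n", "   \n", "Gateway FAIL\n") {
    (my $shown = $line) =~ s/\n/\\n/;    # make the newline visible
    printf "%-18s truth test skips: %d  regex test skips: %d\n",
        qq("$shown"), skips_by_truth($line), skips_by_regex($line);
}
```

The two tests agree on the empty line and on ordinary text. They disagree on "0\n", which the truth test skips, and on the whitespace-only line, which the truth test keeps.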

-sauoq
"My two cents aren't worth a dime.";


Re^4: Delete Duplicate Entry in a text file
by Kenosis (Priest) on Jun 20, 2012 at 18:44 UTC

    Thank you for your thoughts. My intent was to produce the OP's requested output w/o doing too much to the original code for the sake of a differentiated learning experience.

    I tried your suggestions as follows on the OP's data set:

    use Modern::Perl;

    my $lastrow = "";
    open my $fh, '<', 'fail.txt' or die $!;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;
        if ($line ne $lastrow) {
            print $line;
            $lastrow = $line;
        }
    }
    close $fh;

    Output:

    hostname1.com Gateway FAIL hostname2.com Gateway FAIL Gateway FAIL

    Unless I've misunderstood your re-coding suggestions, they do not produce the OP's desired outcome. (Using say instead of print produces the desired line spacing, but the last string is repeated.)

      "Unless I've misunderstood your re-coding suggestions, they do not produce the OP's desired outcome."

      Firstly, I ran that code and got

      hostname1.com Gateway FAIL hostname2.com Gateway FAIL
      which is exactly what I'd expect. The last line was not duplicated. Perhaps you managed to get some whitespace at the end of that line in your input file?
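
A minimal sketch of how trailing whitespace could cause that, using two made-up lines rather than the real input file. A missing newline on the final line of the file would defeat the comparison in the same way.

```perl
use strict;
use warnings;

# Made-up input: the second line carries a trailing space before its newline.
my @lines = ("Gateway FAIL\n", "Gateway FAIL \n");

# Plain string comparison sees two different lines.
my $same_raw = $lines[0] eq $lines[1] ? 1 : 0;

# Comparing copies with trailing whitespace removed treats them as duplicates.
my @trimmed = map { my $copy = $_; $copy =~ s/\s+\z//; $copy } @lines;
my $same_trimmed = $trimmed[0] eq $trimmed[1] ? 1 : 0;

print "raw equal: $same_raw, trimmed equal: $same_trimmed\n";
```

Whether to trim before comparing depends on whether lines that differ only in trailing whitespace should count as duplicates for the data at hand.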

      Secondly, you took out both of the newlines you were printing. Since you are now ignoring the blank lines that are in the input (as I said), you still have to create them on the output. You just don't have to remove the newline from the lines that have non-whitespace characters.

      So, to be clear, you should have changed the line

      print $line, "\n\n";
      to
      print $line, "\n";

      By the way, these modifications for the blanks are exactly what I was talking about in Re: Delete Duplicate Entry in a text file. To put it together with my code there:

      my $last;
      while (my $line = <$fh>) {
          next unless $line =~ /\S/;
          next if defined $last and $line eq $last;
          print $line, "\n";
          $last = $line;
      }
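
For anyone who wants to try the loop without the original file, here is a self-contained sketch that reads from an in-memory filehandle. The sample data and the @kept array are assumptions added for illustration; they are not the OP's data.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the OP's file (not the real fail.txt).
my $data = "alpha\n\nalpha\n\nbeta\n\nbeta\n";
open my $fh, '<', \$data or die $!;

my @kept;    # records which lines were printed, for inspection
my $last;
while (my $line = <$fh>) {
    next unless $line =~ /\S/;                   # skip blank lines
    next if defined $last and $line eq $last;    # skip repeats of the last kept line
    push @kept, $line;
    print $line, "\n";    # $line keeps its own newline; "\n" adds the blank line
    $last = $line;
}
close $fh;
```

Because blank lines are skipped before $last is updated, two identical lines separated only by blank lines still count as duplicates.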

      -sauoq
      "My two cents aren't worth a dime.";

        I appreciate your detailed explanation, and hope the OP gets a chance to read your comments. As mentioned, my pedagogical intent was to (slowly) build upon the OP's code. Had I the need, I'd likely do the following:

        my $last = '';
        do { say if $_ ne $last; $last = $_ } for grep /\S/, <$fh>;

        Thank you for the dialog on this issue...
