Re: Removing duplicates lines

Your first attempt tries to read the entire file into memory, which stops working once the file is larger than the memory you have available. Fortunately, you don’t need to do that.

The filter that you seek to build only ever needs to consider two lines: “this one,” and “the previous one if any.” Thus, you can read and process the file (or files ...) line-by-line with an algorithm that looks something like this:

my $previous_line = undef;  # initially, there is no "previous line"
while ( my $current_line = <FILE> ) {
   # Skip this line when its key field matches the previous line's.
   next if
      defined($previous_line) &&
      ( substr($current_line, 10, 2) eq substr($previous_line, 10, 2) )
      # ... or whatever the actual comparison is
      ;

   # The line survived the filter, so process it here; printing is only a placeholder.
   print $current_line;

   $previous_line = $current_line;
}

Perl’s short-circuit boolean evaluation comes in handy here. The special case of “this is the first line in the file” is marked by $previous_line being undef, and the condition as written expressly omits that case from consideration: because && stops at the first false operand, the substr() calls are evaluated only when both lines are known to exist.
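
For anyone who wants to try the idea without a real input file, here is a minimal, self-contained sketch of the same filter. It is not the original poster's code: the helper name filter_adjacent_duplicates, the sample records, and the choice of offset 10, length 2 as the key field are all assumptions made for illustration. It reads from an in-memory filehandle and keeps only the previous key rather than the whole previous line.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): copies lines from $in to
# $out, skipping any line whose key equals the key of the line just before it.
sub filter_adjacent_duplicates {
    my ($in, $out, $key_of) = @_;
    my $previous_key;                       # undef until the first line is seen
    while ( defined( my $line = <$in> ) ) {
        my $key = $key_of->($line);
        # Short-circuit: the eq comparison runs only when a previous key exists.
        next if defined($previous_key) && $key eq $previous_key;
        print {$out} $line;
        $previous_key = $key;
    }
    return;
}

# Made-up sample records; the key is assumed to sit at offset 10, length 2.
my $data = join '',
    "2013-09-05AA first\n",
    "2013-09-05AA second\n",
    "2013-09-05BB third\n",
    "2013-09-05AA fourth\n";

open my $in,  '<', \$data       or die "cannot open in-memory input: $!";
open my $out, '>', \my $result  or die "cannot open in-memory output: $!";

filter_adjacent_duplicates( $in, $out, sub { substr( $_[0], 10, 2 ) } );

close $in;
close $out;

# Prints the "first", "third" and "fourth" records; "second" is dropped.
print $result;
```

Note that the "fourth" record survives even though its key was seen earlier: the filter only compares neighbouring lines, so it removes every duplicate only if the input is already sorted on the key.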


Re^2: Removing duplicates lines
by vihar (Acolyte) on Sep 05, 2013 at 12:40 UTC

Thanks everyone for your help!
