
Re: Removing duplicates lines

by sundialsvc4 (Abbot)
on Sep 04, 2013 at 22:34 UTC (node #1052451)

in reply to Removing duplicates lines

Your first-attempt code tries to read the entire file into memory, which for a file of this size may simply be impossible. Fortunately, you don’t need to do that.

The filter that you seek to build only ever needs to consider two lines: “this one,” and “the previous one, if any.” (That holds as long as duplicate lines are adjacent, for example because the input is sorted on the key.) Thus, you can read and process the file (or files ...) line by line with an algorithm that looks something like this:

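    use strict;
    use warnings;

    my $previous_line = undef;    # INITIALLY, THERE IS NO 'PREVIOUS LINE'
    # <> reads each file named on the command line in turn (or STDIN).
    while ( my $current_line = <> ) {
        # Skip this line if its key matches the previous line's key.
        # The key here is substr(..., 10, 2), OR WHATEVER-IT-IS ...
        next if defined($previous_line)
             && substr($current_line, 10, 2) eq substr($previous_line, 10, 2);
        print $current_line;      # "ooh, we survived!!" so do something magical (here: print it)
        $previous_line = $current_line;
    }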

Perl’s short-circuit boolean evaluation comes in handy here. The special case of “this is the first line in the file” is marked by $previous_line being undef. For that line defined($previous_line) is false, so && stops right there and the line is not skipped; the substr() calls are evaluated only when both lines are known to exist.
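To see the filter and the short-circuit in action without a real file, here is a minimal sketch that wraps the same loop in a hypothetical helper, dedupe_adjacent(), and feeds it an in-memory filehandle. The helper name, the sample records, and the key position (offset 10, length 2) are all illustrative assumptions; substitute whatever key your data actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines from $fh that survive the
# adjacent-duplicate filter. The key (offset 10, length 2) is a placeholder.
sub dedupe_adjacent {
    my ($fh) = @_;
    my @kept;
    my $previous_line;    # undef: there is no previous line yet
    while ( my $current_line = <$fh> ) {
        # For the first line, defined() is false, so && short-circuits
        # and neither substr() call is evaluated.
        next if defined($previous_line)
             && substr($current_line, 10, 2) eq substr($previous_line, 10, 2);
        push @kept, $current_line;
        $previous_line = $current_line;
    }
    return @kept;
}

# Illustrative sample data: the key sits at columns 11-12 (offset 10).
my $data = join '', map { "$_\n" } (
    'record001 AA first',
    'record002 AA dup of AA',
    'record003 BB first',
    'record004 BB dup of BB',
    'record005 AA AA again',
);

# Open a read handle on the string itself instead of a file on disk.
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
my @kept = dedupe_adjacent($in);
close $in;
print @kept;
```

Only record001, record003 and record005 survive. record005 is kept even though an AA line appeared earlier, because that earlier line is not adjacent to it: this filter removes only consecutive duplicates.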

Re^2: Removing duplicates lines
by vihar (Acolyte) on Sep 05, 2013 at 12:40 UTC
    Thanks everyone for your help!
