PerlMonks  

Re: Removing duplicates lines

by sundialsvc4 (Abbot)
on Sep 04, 2013 at 22:34 UTC


in reply to Removing duplicates lines

Your first-attempt code tries to read the entire file into memory, which stops being workable once the file is large. Fortunately, you don’t need to do that.

The filter that you seek to build only ever needs to consider two lines: “this one,” and “the previous one, if any.” Thus, assuming duplicates always sit on adjacent lines (as in a sorted file), you can read and process the file (or files) line by line with an algorithm that looks something like this:

    my $previous_line;    # initially, there is no 'previous line'
    while ( my $current_line = <FILE> ) {
        # Skip this line when its key matches the previous line's key.
        next if defined($previous_line)
            && ( substr($current_line, 10, 2) eq substr($previous_line, 10, 2) );    # or whatever-it-is ...

        # "ooh, we survived!!" so, do something magical here, for example:
        print $current_line;

        $previous_line = $current_line;
    }

Perl’s short-circuit boolean evaluation comes in handy here. The special case of “this is the first line in the file” is marked by $previous_line being undef, and the condition as written expressly excludes that case, evaluating the substr() calls only when both lines are known to exist.
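To see the short-circuit guard at work, here is a minimal, self-contained sketch of the same loop wrapped in a helper and run against made-up data. The helper name drop_adjacent_duplicates, the sample lines, and the in-memory filehandle are all assumptions for illustration; the two-character key at offset 10 simply mirrors the substr() call above, so substitute whatever comparison your data really needs.

```perl
use strict;
use warnings;

# Hypothetical helper: keep a line only when its key differs from the key
# of the previously kept line. The key is the two characters at offset 10.
sub drop_adjacent_duplicates {
    my ($fh) = @_;
    my @kept;
    my $previous_line;    # undef until the first line has been seen

    while ( my $current_line = <$fh> ) {
        # defined() is false on the first pass, so && never evaluates
        # the substr() comparison against an undefined value.
        next if defined($previous_line)
            && substr( $current_line, 10, 2 ) eq substr( $previous_line, 10, 2 );

        push @kept, $current_line;
        $previous_line = $current_line;
    }
    return @kept;
}

# Made-up sample data; the key ("AA" or "BB") starts at offset 10.
my $sample = join '',
    "record-01 AA first\n",
    "record-02 AA second\n",
    "record-03 BB third\n",
    "record-04 BB fourth\n",
    "record-05 AA fifth\n";

# Read the sample through an in-memory filehandle instead of a real file.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @kept = drop_adjacent_duplicates($fh);
close $fh;

print @kept;
```

With this sample, records 01, 03 and 05 survive. Record 05 carries the same "AA" key as record 01, yet it is kept because the two are not adjacent, which is why the approach relies on the input being sorted or grouped by key. Note also that under warnings, a line shorter than the offset makes substr() complain about reaching outside the string, so real data may need a length check first.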

Re^2: Removing duplicates lines
by vihar (Acolyte) on Sep 05, 2013 at 12:40 UTC
    Thanks everyone for your help!
