PerlMonks  

Re: Removing duplicates lines

by sundialsvc4 (Abbot)
on Sep 04, 2013 at 22:34 UTC (#1052451)


in reply to Removing duplicates lines

Your first-attempt code tries to do the impossible:   to read the entire file into memory.   Fortunately, you don’t need to do that.

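The filter that you seek to build only ever needs to consider two lines:   “this one,” and “the previous one, if any.”   That holds as long as duplicate lines are adjacent to one another (for example, because the file is sorted on the key being compared); a filter that looks only at the previous line cannot catch a duplicate that reappears later in the file.   Thus, you can read and process the file (or files ...) line-by-line with an algorithm that looks something like this: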

    my $previous_line = undef;    # INITIALLY, THERE IS NO 'PREVIOUS LINE'
    while ( my $current_line = <FILE> ) {
        next
            if defined($previous_line)
            && ( substr($current_line, 10, 2) eq substr($previous_line, 10, 2) );    # OR WHATEVER-IT-IS ...

        # "ooh, we survived!!" so, do something magical with $current_line here

        $previous_line = $current_line;
    }

Perl’s short-circuit boolean evaluation comes in handy here: the special case of “this is the first line in the file” is marked by $previous_line being undef, and the condition as written expressly omits that case from consideration, evaluating the substr() calls only when both lines are known to exist.
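
For anyone who wants to try the idea end to end, here is a minimal, self-contained sketch. The sub name filter_adjacent_duplicates, its parameters, and the sample data are made up for illustration, and the key is assumed to be the two characters at offset 10, as in the snippet above. It remembers only the previous key rather than the whole previous line, and it treats lines too short to contain the key as having an empty key; both are choices of this sketch, not requirements.

```perl
use strict;
use warnings;

# Hypothetical helper: copies lines from $in to $out, skipping any line whose
# key (substr at $offset, $length) equals the key of the line just before it.
# Only ADJACENT duplicates are removed. Returns the number of lines written.
sub filter_adjacent_duplicates {
    my ($in, $out, $offset, $length) = @_;

    my $previous_key;    # undef until the first line has been read
    my $kept = 0;

    while ( my $line = <$in> ) {
        # Lines too short to hold a key get an empty key (an assumption
        # of this sketch) so that substr() never reads outside the string.
        my $key = length($line) > $offset ? substr($line, $offset, $length) : '';

        # Short-circuit: the eq test runs only once a previous key exists.
        next if defined($previous_key) && $key eq $previous_key;

        print {$out} $line;
        $kept++;
        $previous_key = $key;
    }
    return $kept;
}

# Usage with an in-memory filehandle standing in for a real file.
my $data = join '',
    "0123456789AA first\n",
    "0123456789AA second\n",    # same key as the line before: skipped
    "0123456789BB third\n",
    "0123456789AA fourth\n";    # key AA again, but not adjacent: kept

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
filter_adjacent_duplicates($fh, \*STDOUT, 10, 2);
close $fh;
```

Run as written, this prints the "first", "third", and "fourth" lines. The "fourth" line survives even though its key has been seen before, which is why the input needs to be sorted (or otherwise grouped) on the key for this approach to remove every duplicate.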


Re^2: Removing duplicates lines
by vihar (Acolyte) on Sep 05, 2013 at 12:40 UTC
    Thanks everyone for your help!
