PerlMonks  

Re^2: Remove Duplicate Lines

by afoken (Chancellor)
on Aug 01, 2019 at 19:47 UTC


in reply to Re: Remove Duplicate Lines
in thread Remove Duplicate Lines

Let's see:

  • use strict missing
  • use warnings missing
  • Missing my for $ifile, $ofile, $header, $data.
  • no check that the program is called with the correct number of arguments
  • Forking a shell (1) via qx (``) begs for trouble - see Improve pipe open?
  • ... to run sed, just to read the first line of a file
  • ... while making sed read the entire file
  • ... and ignoring all quoting issues by simply not quoting at all - see The problem of "the" default shell
  • ... and ignoring the fact that sed is not available by default on Windows and other operating systems
  • Forking another shell via qx to pipe sed output to sort -u input
  • ... again without any quoting
  • ... again assuming sed is available everywhere
  • ... assuming a POSIX sort is available everywhere. DOS/Windows sort does not understand -u and can't sort and filter out dupes
  • ... reading the entire output of sort -u into memory
  • ... just to write it out again three lines later
  • And finally, exit 0 is redundant

This is highly inefficient and has several issues with "interesting" filenames.
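
To make the "interesting" filename problem concrete, here is a minimal sketch in plain sh. It is not the original script. The filename is made up, and interpolating it into a command string is only meant to approximate what an unquoted qx does:

```shell
#!/bin/sh
name='two words.txt'   # hypothetical "interesting" filename

# Interpolated into the command string, roughly what an unquoted qx does:
# the inner shell parses the name again and sees two separate words.
interpolated=$(sh -c "set -- $name; echo \$#")

# Passed as a separate argument, never re-parsed by a shell:
# the inner shell sees exactly one word.
as_argument=$(sh -c 'echo $#' sh "$name")

echo "interpolated: $interpolated word(s), as argument: $as_argument word(s)"
```

A space is the harmless case. A name containing ; or ` or $( ) would be run as part of the command instead of merely being split.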

In Re: Remove Duplicate Lines, BrowserUk explains how to use perl properly.

Another option - if running on a POSIX compatible system - is to use sort properly. Without headers, it is trivial:

sort -u < inputfile > outputfile
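
A quick way to try this, as a sketch with made-up sample data in a temporary directory:

```shell
#!/bin/sh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Sample input (assumed data): five lines, two of them duplicates.
printf '%s\n' banana apple banana cherry apple > "$tmp/inputfile"

# Same command as above, with the paths quoted.
sort -u < "$tmp/inputfile" > "$tmp/outputfile"

cat "$tmp/outputfile"
```

Note that the output comes back sorted, so the original line order is lost. If the order matters, this is not the right tool.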

With headers, this will do:

head -n 1 inputfile > outputfile
sed '1d' inputfile | sort -u >> outputfile

This way, head can stop processing the input file after the first line, unlike sed -n '1p'. Directly writing to the outputfile avoids all further overhead of your script.
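
Wrapped into a small function with proper quoting, the header variant could look like the sketch below. The function name dedupe_keep_header and the sample filenames are my own invention for illustration, and it assumes a POSIX sh with head, sed and sort:

```shell
#!/bin/sh
# Hypothetical helper, not from the original script: keep line 1 as the
# header, then append the sorted, de-duplicated remainder.
dedupe_keep_header() {
    # Redirections instead of filename arguments, so a name starting
    # with "-" is never mistaken for an option.
    head -n 1 < "$1" > "$2" &&
    sed '1d' < "$1" | sort -u >> "$2"
}

# Usage with a deliberately awkward (made-up) filename.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' name bob alice bob > "$tmp/in put.txt"
dedupe_keep_header "$tmp/in put.txt" "$tmp/out put.txt"
cat "$tmp/out put.txt"
```

The header stays on the first line because it never goes through sort. Only the remaining lines are sorted and de-duplicated.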

Alexander


(1) yes, given a sane filename, perl may start the first sed without help of the default shell. Change the filename to something interesting and perl will start sed via the default shell.

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
