PerlMonks  


If perchance you need to keep the unique lines of your file in their original order, then this will remove all but the first occurrence of each line and leave the remaining ones in the order they appeared.

To run it against a real file, uncomment the open line, pass the filename as the first argument, and redirect the output to a new file on the command line.

    #! perl -sw
    use strict;

    my %lines;
    #open DATA, $ARGV[0] or die "Couldn't open $ARGV[0]: $!\n";
    while (<DATA>) {
        print if not $lines{$_}++;
    }
    __DATA__
    this is a line
    this is another line
    yet another
    and yet another still
    this is a line
    more
    and more
    and even more
    this is a line
    and this
    and that
    but not the other
    cos its a family website:)

Gives

    C:\test>uniq
    this is a line
    this is another line
    yet another
    and yet another still
    more
    and more
    and even more
    and this
    and that
    but not the other
    cos its a family website:)

    C:\test>
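For anyone who would rather not reuse the DATA handle, here is a minimal sketch of the same idea with a lexical filehandle and three-argument open. The sub name print_first_occurrences and the script name dedupe.pl are my own inventions for illustration, not anything from the code above.

```perl
use strict;
use warnings;

# Copy lines from $in to $out, skipping any line that has been seen before.
# (print_first_occurrences is a hypothetical helper name.)
sub print_first_occurrences {
    my ($in, $out) = @_;
    my %seen;
    while ( my $line = <$in> ) {
        # $seen{$line}++ is false only the first time a line turns up
        print {$out} $line unless $seen{$line}++;
    }
    return;
}

# Assumed usage: perl dedupe.pl input.txt > output.txt
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Couldn't open $ARGV[0]: $!\n";
    print_first_occurrences($in, \*STDOUT);
    close $in;
}
```

Passing the handles in keeps the loop identical to the one-liner in the original while letting you point it at any input and output you like.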

The caveat, of course, is that with a large file that hash could get kind of big, since it holds one key for every distinct line, but maybe that's OK if this is what you need to do.
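If the hash size does become a problem and your lines are long, one possible workaround (my assumption, not something tested on a big file) is to key the hash on a digest of each line instead of the line itself. This sketch uses md5 from the core Digest::MD5 module, which returns a 16-byte binary string; the sub name is made up for illustration. Note the trade-off: it only saves memory when lines average well over 16 bytes, and in the unlikely event of a digest collision a distinct line would be dropped.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Same first-occurrence filter, but the hash stores a 16-byte MD5 digest
# per distinct line rather than the full text of the line.
# (print_first_occurrences_by_digest is a hypothetical helper name.)
sub print_first_occurrences_by_digest {
    my ($in, $out) = @_;
    my %seen;
    while ( my $line = <$in> ) {
        # Assumes the handle yields bytes; md5() croaks on wide characters
        print {$out} $line unless $seen{ md5($line) }++;
    }
    return;
}
```

There is still one hash entry per distinct line, so this shrinks the keys, not the number of them.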


Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!

In reply to Re: Remove Duplicate Lines by BrowserUk
in thread Remove Duplicate Lines by dcb0127
