Hi,
I have a tab-separated file that may run up to 5000 lines.
The file format is something like this:
XXXXXS331632 XXXXXS331632 female 40087 a5
XXXXXS331632 XXXXXS331632 female 47735 a5
XXXXXS331681 XXXXXS331681 male 40087 e6
XXXXXS331681 XXXXXS331681 male 47735 e6
XXXXXS331856 XXXXXS331856 male 40177 d1
XXXXXS331856 XXXXXS331856 male 47737 d1
What I really want to do is delete the rows that appear twice, ignoring the difference in the 4th column (e.g. 40087 vs. 47735). I could remove either the first or the second entry. In the end, I would like to have a file with the duplicate(?) entries removed.
Something like this:
XXXXXS331632 XXXXXS331632 female 40087 a5
XXXXXS331681 XXXXXS331681 male 40087 e6
XXXXXS331856 XXXXXS331856 male 40177 d1
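To make the intended rule concrete, here is a minimal sketch in Python (not a definitive solution). It assumes the fields are tab-separated, that a row counts as a duplicate when every column except the 4th matches an earlier row, and that the first occurrence is the one to keep. The function name `drop_duplicate_rows` and the `ignore_col` parameter are made up for illustration.

```python
import io


def drop_duplicate_rows(lines, ignore_col=3):
    """Return rows whose key has not been seen before.

    The key is every tab-separated field except the one at index
    ignore_col (0-based, so 3 means the 4th column). Assumption: the
    first occurrence of each key is the row to keep.
    """
    seen = set()
    kept = []
    for line in lines:
        row = line.rstrip("\n")
        if not row.strip():
            continue  # skip blank lines
        fields = row.split("\t")
        key = tuple(f for i, f in enumerate(fields) if i != ignore_col)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Sample data mirroring the rows shown above, joined with real tabs.
sample_rows = [
    ["XXXXXS331632", "XXXXXS331632", "female", "40087", "a5"],
    ["XXXXXS331632", "XXXXXS331632", "female", "47735", "a5"],
    ["XXXXXS331681", "XXXXXS331681", "male", "40087", "e6"],
    ["XXXXXS331681", "XXXXXS331681", "male", "47735", "e6"],
    ["XXXXXS331856", "XXXXXS331856", "male", "40177", "d1"],
    ["XXXXXS331856", "XXXXXS331856", "male", "47737", "d1"],
]
sample = "\n".join("\t".join(r) for r in sample_rows) + "\n"

result = drop_duplicate_rows(io.StringIO(sample))
for row in result:
    print(row)
```

For a real file, the same function could be fed an open file handle instead of the `io.StringIO` wrapper. Since at most a few thousand keys are held in memory, the size of the file should not be a concern.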
Any suggestions, please?
Thanks for your time.