 
PerlMonks  

removing redundant entries

by anasuya (Novice)
on Dec 01, 2011 at 10:25 UTC
anasuya has asked for the wisdom of the Perl Monks concerning the following question:

I have multiple entries in a file which look like this.
1HDZ-B 3NT5-A 0.489917 0.400067
1I30-B 2B35-E 0.879363 0.400068
1K0U-E 1UWK-A 0.969722 0.400072
1BPW-A 1EFL-C 0.556795 0.400077
1EFL-C 1BPW-A 0.556795 0.400077
1GIQ-B 3QJ5-B 0.997880 0.400077
1GEE-A 2B4R-O 0.829231 0.400080

For the 4th and 5th lines, the first two columns are interchanged, i.e. the 1st column of line 4 equals the 2nd column of line 5 and vice versa. I want to keep only one of those two lines, say line 4, so that my file looks like this:

1HDZ-B 3NT5-A 0.489917 0.400067
1I30-B 2B35-E 0.879363 0.400068
1K0U-E 1UWK-A 0.969722 0.400072
1BPW-A 1EFL-C 0.556795 0.400077
1GIQ-B 3QJ5-B 0.997880 0.400077
1GEE-A 2B4R-O 0.829231 0.400080

Also, there are multiple instances of this pattern in my file. How do I remove such redundant entries?

Re: removing redundant entries
by ikegami (Pope) on Dec 01, 2011 at 10:32 UTC
    perl -nae'print if !$seen{ join " ", sort @F[0,1] }++' infile >outfile
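    For anyone unfamiliar with the switches: -n wraps the code in a while (<>) read loop and -a autosplits each line into @F on whitespace. The one-liner is roughly equivalent to this long-hand sketch (not an exact deparse):

        my %seen;
        while (<>) {
            my @F = split ' ', $_;    # -a: autosplit the line into @F
            # The key is the first two columns, sorted, so "A B" and "B A" collide.
            print if !$seen{ join " ", sort @F[0, 1] }++;
        }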
Re: removing redundant entries
by MidLifeXis (Prior) on Dec 01, 2011 at 14:01 UTC

    ikegami has given you a solution that works in your case, but in general, the steps to take to solve the "how do I remove redundant data" question are along these lines:

    • Identify the information that determines if a set of data is unique. This is the key. It may need to be normalized (the sort @F[0,1] call in ikegami's response, for example).
    • See if the key has already been used. Hashes are useful for this (the !$seen{...}++ construct in ikegami's response). See How can I remove duplicate elements from a list or array? (or perldoc -q duplicate) for more information.
    • Process the duplicate data for that key. In some cases you are only interested in the last data associated with the key; in others you may want to group all of the data together; in still others, you may want to run some function on all of the data associated with that key.
    • Return the processed results.

    If you step through your question using this framework, it can be easier to come up with a solution on your own.
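
    As a concrete illustration of these steps, here is a minimal standalone sketch (reading from STDIN or files named on the command line) that builds a normalized key, groups every line under that key, and keeps only the first line of each group; any other policy from the third step could be applied at the final print instead:

        use strict;
        use warnings;

        my %groups;   # key => array of lines sharing that key
        my @order;    # keys in order of first appearance
        while (my $line = <>) {
            my @fields = split ' ', $line;
            my $key = join ' ', sort @fields[0, 1];    # step 1: normalized key
            push @order, $key unless exists $groups{$key};
            push @{ $groups{$key} }, $line;            # steps 2-3: collect duplicates
        }
        print $groups{$_}[0] for @order;               # step 4: emit one line per key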

    Update: Added references to ikegami's response

    --MidLifeXis
