PerlMonks  

Re: xtracting unique lines

by Cristoforo (Deacon)
on Mar 28, 2012 at 02:03 UTC


in reply to xtracting unique lines

Using grep, you can filter out lines with duplicate fields by testing whether any field on the line has been seen before.

#!/usr/bin/perl
use strict;
use warnings;

my %seen;
{
    local $\ = "\n"; # call to print() ends in newline
    while (<DATA>) {
        chomp;
        print unless grep $seen{$_}++, split /\s+\+\s+/;
    }
}

Chris

Update: I misread the question and missed that the two fields can also occur in reversed order.

This should produce the desired results.

#!/usr/bin/perl
use strict;
use warnings;

my %seen;
{
    local $\ = "\n"; # call to print() ends in newline
    while (<DATA>) {
        chomp;
        my $sorted = join "", sort split /\s\+\s/;
        print unless $seen{$sorted}++;
    }
}
__DATA__
d_145_1_2- + c_3_1_8-e_74_1_1-
a_100_1_6-c_2_1_6- + b_50_1_2-
c_69_1_17- + b_61_6_1-
c_2_1_2- + a_123_1_1-
d_83_1_1- + c_2_1_5-d_162_1_1-
c_2_1_2- + a_123_1_1-
a_123_1_1- + c_2_1_2-


Re^2: xtracting unique lines
by anasuya (Novice) on Mar 28, 2012 at 11:07 UTC

    Hi. I tried out what you said above and it worked, thanks. What I need to do next is count the occurrences of each of these lines. As you can see in <DATA>, the string "c_2_1_2- + a_123_1_1-" occurs 2 times and its reverse, "a_123_1_1- + c_2_1_2-", occurs once. I need a cumulative count for this pair irrespective of the order in which it occurs (i.e. as "a_123_1_1- + c_2_1_2-" or as "c_2_1_2- + a_123_1_1-"), so that the total count for this entry is 3 for the sample in <DATA>. The actual file I am working on is similar but larger, with around 8000 lines. What is the solution to this problem? awk hasn't helped me so far.
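One possible way to get that cumulative count is to reuse the sorting idea from the update above, but keep the hash value as a counter instead of using it only as a "seen" flag. The following is a minimal sketch, not a reply from the thread: the helper name count_pairs and the @sample array are made up for illustration, and the sample lines are inlined rather than read from <DATA>. It assumes every line has the form "field + field".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a list of
# [normalized pair, count] entries in order of first appearance.
sub count_pairs {
    my @lines = @_;                      # work on a copy of the input
    my (%count, @order);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;         # trim; this also drops the newline
        next unless length $line;        # skip blank lines
        my @fields = split /\s+\+\s+/, $line;
        # Sorting the fields makes "x + y" and "y + x" share one key.
        my $key = join ' + ', sort @fields;
        push @order, $key unless $count{$key}++;
    }
    return map { [ $_, $count{$_} ] } @order;
}

# Sample lines copied from the __DATA__ section of the script above.
my @sample = (
    'd_145_1_2- + c_3_1_8-e_74_1_1-',
    'a_100_1_6-c_2_1_6- + b_50_1_2-',
    'c_69_1_17- + b_61_6_1-',
    'c_2_1_2- + a_123_1_1-',
    'd_83_1_1- + c_2_1_5-d_162_1_1-',
    'c_2_1_2- + a_123_1_1-',
    'a_123_1_1- + c_2_1_2-',
);

# Print each normalized pair followed by its cumulative count.
printf "%-35s %d\n", @$_ for count_pairs(@sample);
```

For the sample, the pair "a_123_1_1- + c_2_1_2-" is reported with a count of 3 and every other pair with a count of 1. For a real file, passing the lines read from a file handle to count_pairs in place of @sample should work the same way, and a single hash holding a few thousand keys is well within what Perl handles comfortably.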
