http://www.perlmonks.org?node_id=962047


in reply to xtracting unique lines

Using grep, you can filter out duplicate fields by testing whether they have already been seen.
#!/usr/bin/perl

use strict;
use warnings;

my %seen;

{
    local $\ = "\n";    # call to print() ends in newline
    while ( <DATA> ) {
        chomp;
        print unless grep $seen{$_}++, split /\s+\+\s+/;
    }
}

Chris

Update: I misread the question and missed that the fields can occur in reversed order.

This should produce the desired results.

#!/usr/bin/perl

use strict;
use warnings;

my %seen;

{
    local $\ = "\n";    # call to print() ends in newline
    while ( <DATA> ) {
        chomp;
        # Sort the fields so reversed pairs produce the same key.
        my $sorted = join "", sort split /\s\+\s/;
        print unless $seen{$sorted}++;
    }
}

__DATA__
d_145_1_2- + c_3_1_8-
e_74_1_1-
a_100_1_6-
c_2_1_6- + b_50_1_2-
c_69_1_17- + b_61_6_1-
c_2_1_2- + a_123_1_1-
d_83_1_1- + c_2_1_5-
d_162_1_1-
c_2_1_2- + a_123_1_1-
a_123_1_1- + c_2_1_2-

Re^2: xtracting unique lines
by anasuya (Novice) on Mar 28, 2012 at 11:07 UTC

    Hi. I tried out what you said above, and it worked. Thanks. What I need to do next is count the occurrences of each of these lines. As you can see in <DATA>, the string "c_2_1_2- + a_123_1_1-" occurs 2 times and its reverse, "a_123_1_1- + c_2_1_2-", occurs once. I need a cumulative count for this pair regardless of the order in which it occurs (i.e. as "a_123_1_1- + c_2_1_2-" or as "c_2_1_2- + a_123_1_1-"), so that the total count for this entry is 3 as in <DATA>. The actual file I am working on is similar but larger, around 8000 lines. What is the solution to this problem? awk hasn't helped me so far.
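One possible sketch for this follow-up (not from the thread itself): keep the same order-insensitive key from the script above, but tally a count per key instead of filtering, and remember the first spelling of each line for display. The `__DATA__` block here is an illustrative subset, not the full file.

```perl
#!/usr/bin/perl

use strict;
use warnings;

my %count;      # canonical key => cumulative count
my %display;    # canonical key => the line as it first appeared

while ( my $line = <DATA> ) {
    chomp $line;
    next unless $line =~ /\S/;    # skip blank lines

    # Canonical form: sort the fields so "a + b" and "b + a" share one key.
    my $key = join ' + ', sort split /\s+\+\s+/, $line;

    $display{$key} = $line unless exists $display{$key};
    $count{$key}++;
}

# Report each unique pair with its cumulative, order-insensitive count.
for my $key ( sort keys %count ) {
    print "$display{$key}\t$count{$key}\n";
}

__DATA__
c_2_1_2- + a_123_1_1-
a_123_1_1- + c_2_1_2-
c_2_1_2- + a_123_1_1-
d_83_1_1- + c_2_1_5-
```

On this sample input the first three lines collapse to one key, so the report shows "c_2_1_2- + a_123_1_1-" with a count of 3 and "d_83_1_1- + c_2_1_5-" with a count of 1. A hash keyed on the sorted fields stays O(n) in the number of lines, so 8000 lines is no problem.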