
Re: xtracting unique lines

by Cristoforo (Curate)
on Mar 28, 2012 at 02:03 UTC (#962047)

in reply to xtracting unique lines

Using grep, you can skip any line that contains a field which has already been seen.
#!/usr/bin/perl
use strict;
use warnings;

my %seen;
{
    local $\ = "\n";    # call to print() ends in newline
    while (<DATA>) {
        chomp;
        print unless grep $seen{$_}++, split /\s+\+\s+/;
    }
}


Update: I misread the question and missed that the two fields can occur in reverse order.

This should produce the desired results. Sorting the two fields before joining them gives a line and its reverse the same key.

#!/usr/bin/perl
use strict;
use warnings;

my %seen;
{
    local $\ = "\n";    # call to print() ends in newline
    while (<DATA>) {
        chomp;
        my $sorted = join "", sort split /\s\+\s/;
        print unless $seen{$sorted}++;
    }
}
__DATA__
d_145_1_2- + c_3_1_8-e_74_1_1-
a_100_1_6-c_2_1_6- + b_50_1_2-
c_69_1_17- + b_61_6_1-
c_2_1_2- + a_123_1_1-
d_83_1_1- + c_2_1_5-d_162_1_1-
c_2_1_2- + a_123_1_1-
a_123_1_1- + c_2_1_2-

Replies are listed 'Best First'.
Re^2: xtracting unique lines
by anasuya (Novice) on Mar 28, 2012 at 11:07 UTC

    Hi. I tried out what you said above and it worked, thanks. What I need to do next is count the occurrences of each of these lines. As you can see in <DATA>, the string "c_2_1_2- + a_123_1_1-" occurs 2 times and its reverse, "a_123_1_1- + c_2_1_2-", occurs once. I need a cumulative count for this pair irrespective of the order in which it occurs (i.e. as "a_123_1_1- + c_2_1_2-" or as "c_2_1_2- + a_123_1_1-"), so that the total count for this entry is 3 for the sample in <DATA>. The actual file I am working on is similar but larger, around 8000 lines. What is the solution to this problem? awk hasn't helped me so far.
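One possible way to get that cumulative count, offered only as a sketch: reuse the sorted-fields key from the script above, but keep the hash value as a running count instead of using it only as a "seen" flag. The helper name count_pairs and the @sample array are hypothetical (they are not from the thread), and the sketch assumes every line has the form "A + B" with whitespace around the plus sign.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count each pair of fields
# regardless of the order in which the two fields appear on a line.
# Returns a hashref of counts and an arrayref of keys in first-seen order.
sub count_pairs {
    my @lines = @_;
    my ( %count, @order );
    for my $line (@lines) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
        next unless length $line;         # skip blank lines
        my @fields = split /\s+\+\s+/, $line;
        my $key    = join ' + ', sort @fields;    # order-independent key
        push @order, $key unless $count{$key}++;
    }
    return ( \%count, \@order );
}

# Sample lines copied from the __DATA__ section above.
my @sample = (
    'd_145_1_2- + c_3_1_8-e_74_1_1-',
    'a_100_1_6-c_2_1_6- + b_50_1_2-',
    'c_69_1_17- + b_61_6_1-',
    'c_2_1_2- + a_123_1_1-',
    'd_83_1_1- + c_2_1_5-d_162_1_1-',
    'c_2_1_2- + a_123_1_1-',
    'a_123_1_1- + c_2_1_2-',
);

my ( $count, $order ) = count_pairs(@sample);
print "$_\t$count->{$_}\n" for @$order;
```

On the sample data the c_2_1_2- / a_123_1_1- pair is counted 3 times, whichever order it appears in. For the real file, the lines could be read from a filehandle and passed to count_pairs in place of @sample; a single hash over roughly 8000 lines should not be a memory concern.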
