Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re: xtracting unique lines

by Cristoforo (Deacon)
on Mar 28, 2012 at 02:03 UTC ( #962047=note: print w/ replies, xml ) Need Help??


in reply to xtracting unique lines

Using grep, you can filter out duplicate fields by testing to see if they have been seen yet.

#!/usr/bin/perl use strict; use warnings; my %seen; { local $\ = "\n"; # call to print() ends in newline while (<DATA>) { chomp; print unless grep $seen{$_}++, split /\s+\+\s+/; } }

Chris

Update: Misread the question, missed that they can occur reversed.

This should produce the results.

#!/usr/bin/perl use strict; use warnings; my %seen; { local $\ = "\n"; # call to print() ends in newline while (<DATA>) { chomp; my $sorted = join "", sort split /\s\+\s/; print unless $seen{$sorted}++; } } __DATA__ d_145_1_2- + c_3_1_8-e_74_1_1- a_100_1_6-c_2_1_6- + b_50_1_2- c_69_1_17- + b_61_6_1- c_2_1_2- + a_123_1_1- d_83_1_1- + c_2_1_5-d_162_1_1- c_2_1_2- + a_123_1_1- a_123_1_1- + c_2_1_2-


Comment on Re: xtracting unique lines
Select or Download Code
Re^2: xtracting unique lines
by anasuya (Novice) on Mar 28, 2012 at 11:07 UTC

    Hi. I tried out what you sed above. It worked. thanks.. Now what i need to do further is count the occurrences of each of these lines. As you can see in <DATA>, the string "c_2_1_2- + a_123_1_1-" has occurred 2 times and the reverse of it "a_123_1_1- + c_2_1_2-" has occurred once. Now i need to get a cumulative count for this pair (irrespective of the order in which it occurs i.e. as "a_123_1_1- + c_2_1_2-" or as "c_2_1_2- + a_123_1_1-", so that the total count of this entry is =3 as in <DATA>) The actual file which i am working on is similar but is larger in size, and has around 8000 lines. What is the solution to this problem? awk hasn't helped me so far.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://962047]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (10)
As of 2014-08-23 03:03 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (171 votes), past polls