PerlMonks  

Re: Selective printing of the Duplicates

by Kenosis (Priest)
on Jan 30, 2013 at 17:48 UTC ( [id://1016127] )


in reply to Selective printing of the Duplicates

Perhaps the following will help:

use strict;
use warnings;

my %seen;

while (<DATA>) {
    chomp;
    # Print a record only the first time it is seen
    print "$_\n" unless $seen{$_}++;
}

__DATA__
30380868 N Sep 29 356200 AGEC682569 ATI + S
30380868 N Sep 29 356200 AGEC682569 ATI + S
71130740 N Sep 7 SM9481 AGEC683966 ATI + S
71130740 N Sep 7 SM9481 AGEC683966 ATI + S
32450045 N Jul 14 SN9672 AGEC685203 ATI + S
32450045 N Jul 14 SN9672 AGEC685203 ATI + S
36450223 N Aug 30 SU8329 AGEC685348 ATI + S
34680135 N Sep 30 349450 AGEC685442 ATI + S

Output:

30380868 N Sep 29 356200 AGEC682569 ATI + S
71130740 N Sep 7 SM9481 AGEC683966 ATI + S
32450045 N Jul 14 SN9672 AGEC685203 ATI + S
36450223 N Aug 30 SU8329 AGEC685348 ATI + S
34680135 N Sep 30 349450 AGEC685442 ATI + S

Quoting the original post: "I want to print the one with "**" as 2nd column and ignore the rest of duplicates."

Doesn't this raise the issue of the indistinguishability of identicals?

The script prints only unique records. chomp is used in case the last line of data above doesn't end with a newline.
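As a minimal sketch of why the `$seen{$_}++` test filters duplicates, the same idiom can be wrapped in a small sub. The name `unique_lines` is hypothetical and used here only for illustration; it is not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the input lines with later duplicates removed,
# keeping the first occurrence of each line in its original position.
sub unique_lines {
    my @lines = @_;
    my %seen;
    # $seen{$_}++ returns the old count (0, which is false, the first time)
    # and then increments it, so grep keeps only first occurrences.
    return grep { !$seen{$_}++ } @lines;
}

my @sample_records = (
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
);

print "$_\n" for unique_lines(@sample_records);
```

Because the whole line is the hash key, two records count as duplicates only when they are identical character for character.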

Replies are listed 'Best First'.
Re^2: Selective printing of the Duplicates
by Thomas Kennll (Acolyte) on Jan 30, 2013 at 18:03 UTC
    Thank you! Note that my data file is space-delimited and the records are not exact duplicates:
    30380868 N Sep 29 356200 AGEC682569 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S
    I'm going to split each record and look at column 1 (here, 30380868). The 2nd column is either ** or empty; all the other columns are the same. I want to print the record that has ** in the 2nd column. The code you provided only gives me the first of the duplicates, which is
    30380868 N Sep 29 356200 AGEC682569 ATI + S
      my %seen;
      for ( reverse <DATA> ) {
          # Comparison key: the record's fields without the '**' marker,
          # joined by single spaces so both versions of a record compare equal
          my $unstarred = join ' ', grep { $_ ne '**' } split;
          print unless $seen{ $unstarred }++;
      }
      __DATA__
      30380868 N Sep 29 356200 AGEC682569 ATI + S
      30380868 ** N Sep 29 356200 AGEC682569 ATI + S
      71130740 N Sep 7 SM9481 AGEC683966 ATI + S
      71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
      32450045 N Jul 14 SN9672 AGEC685203 ATI + S
      32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
      36450223 N Aug 30 SU8329 AGEC685348 ATI + S
      34680135 N Sep 30 349450 AGEC685442 ATI + S

      The only difference from your 'desired output' is that this code prints it out in reverse order. But I don't see this as a problem, as there seems to be no inherent order in the input/output anyway...
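If the input order does matter, one possible variation is to collect the surviving records while walking the input backwards and then reverse them once more. This is only a sketch: `keep_last_in_order` is a hypothetical name, and the comparison key (the whitespace-separated fields minus the `**` marker) is an assumption about the data layout. Like the reverse-order approach above, it relies on the starred record appearing after its unstarred counterpart.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps the LAST record of each group (the starred one
# when it follows its unstarred twin) and returns survivors in input order.
# Call it in list context.
sub keep_last_in_order {
    my @lines = @_;
    my ( %seen, @kept );
    for my $line ( reverse @lines ) {
        # Comparison key: the record's fields without the '**' marker.
        my $key = join ' ', grep { $_ ne '**' } split ' ', $line;
        push @kept, $line unless $seen{$key}++;
    }
    return reverse @kept;    # undo the first reversal
}

my @sample_input = (
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
);

print "$_\n" for keep_last_in_order(@sample_input);
```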

      My apologies, as I misunderstood. Try the following:

      use strict;
      use warnings;

      my %seen;

      while (<DATA>) {
          chomp;
          my ($col1) = /(\d+)/;    # first run of digits is the record key
          # A starred record always wins
          $seen{$col1} = $_ if /\*\*/;
          # An unstarred record is stored only if no starred one is held
          $seen{$col1} = $_ if !exists $seen{$col1} or $seen{$col1} !~ /\*\*/;
      }

      print "$seen{$_}\n" for keys %seen;

      __DATA__
      11111111 ** N Sep 29 356200 AGEC682569 ATI + S
      11111111 N Sep 29 356200 AGEC682569 ATI + S
      30380868 N Sep 29 356200 AGEC682569 ATI + S
      30380868 ** N Sep 29 356200 AGEC682569 ATI + S
      71130740 N Sep 7 SM9481 AGEC683966 ATI + S
      71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
      32450045 N Jul 14 SN9672 AGEC685203 ATI + S
      32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
      36450223 N Aug 30 SU8329 AGEC685348 ATI + S
      34680135 N Sep 30 349450 AGEC685442 ATI + S

      Output:

      71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
      34680135 N Sep 30 349450 AGEC685442 ATI + S
      32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
      11111111 ** N Sep 29 356200 AGEC682569 ATI + S
      36450223 N Aug 30 SU8329 AGEC685348 ATI + S
      30380868 ** N Sep 29 356200 AGEC682569 ATI + S

      The above will preferentially keep starred records, regardless of the order in which records are processed. Because the results are printed with keys %seen, the output order is arbitrary.
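Should the output need to follow the order in which each key first appears, a separate array can record that order alongside the hash. The following is a sketch under that assumption; `prefer_starred`, `%best` and `@order` are hypothetical names introduced for illustration and are not from the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: one record per column-1 key, preferring a starred
# record, returned in the order each key first appeared in the input.
sub prefer_starred {
    my @lines = @_;
    my ( %best, @order );
    for my $line (@lines) {
        my ($col1) = $line =~ /^\s*(\d+)/;
        next unless defined $col1;    # skip lines without a numeric key
        push @order, $col1 unless exists $best{$col1};
        # Store the first record for a key; a starred record replaces
        # whatever is already held for that key.
        $best{$col1} = $line
            if !exists $best{$col1} || $line =~ /\*\*/;
    }
    return map { $best{$_} } @order;
}

my @sample_data = (
    '11111111 ** N Sep 29 356200 AGEC682569 ATI + S',
    '11111111 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
);

print "$_\n" for prefer_starred(@sample_data);
```

The extra array costs one entry per distinct key and avoids relying on hash ordering, which Perl does not guarantee.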

        Thanks a Ton!!!!! This was something I was looking for ..
