PerlMonks  

Re^2: Selective printing of the Duplicates

by Thomas Kennll (Acolyte)
on Jan 30, 2013 at 18:03 UTC (#1016130)


in reply to Re: Selective printing of the Duplicates
in thread Selective printing of the Duplicates

Thank you! Note that my data file is space-delimited and the records are not exact duplicates:

30380868 N Sep 29 356200 AGEC682569 ATI + S
30380868 ** N Sep 29 356200 AGEC682569 ATI + S

I'm going to split each record and look at column 1 (here, 30380868). The second column is either ** or empty; all the other columns are the same. I want to print the record that has ** in the second column. The code you provided above gives me only the first of the duplicates, which is
30380868 N Sep 29 356200 AGEC682569 ATI + S
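
A minimal sketch of the split-based approach described above, for illustration only. It assumes whitespace-delimited records, that column 1 is the key, and that a starred record has the literal `**` as its second field. The helper name `pick_starred` is hypothetical, and the sample records are taken from the post.

```perl
use strict;
use warnings;

# Hypothetical helper: return one record per column-1 key, preferring the
# record whose second column is '**'. Keys come back in first-seen order.
sub pick_starred {
    my @lines = @_;
    my ( %best, @order );
    for my $line (@lines) {
        my @cols = split ' ', $line;    # split on runs of whitespace
        next unless @cols;              # skip blank lines
        my $key     = $cols[0];
        my $starred = @cols > 1 && $cols[1] eq '**';
        push @order, $key unless exists $best{$key};
        # Keep the first record seen for a key; a starred record replaces it.
        $best{$key} = $line if !exists $best{$key} || $starred;
    }
    return map { $best{$_} } @order;
}

my @records = (
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
);
print "$_\n" for pick_starred(@records);
```

For the three sample records this prints the starred 30380868 line followed by the 36450223 line.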


Re^3: Selective printing of the Duplicates
by Not_a_Number (Parson) on Jan 30, 2013 at 18:57 UTC
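    my %seen;

    # In this data the starred copy follows the unstarred one, so reading
    # the lines in reverse meets the starred copy first.
    for ( reverse <DATA> ) {
        # Comparison key: drop the ** marker and collapse whitespace so the
        # starred and unstarred forms of a record compare equal.
        ( my $unstarred = $_ ) =~ s/\*\*//;
        $unstarred =~ s/\s+/ /g;
        print unless $seen{ $unstarred }++;
    }

    __DATA__
    30380868 N Sep 29 356200 AGEC682569 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S
    71130740 N Sep 7 SM9481 AGEC683966 ATI + S
    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    32450045 N Jul 14 SN9672 AGEC685203 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S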

    The only difference from your 'desired output' is that this code prints it out in reverse order. But I don't see this as a problem, as there seems to be no inherent order in the input/output anyway...
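
    If input order does matter, one possible variation is a two-pass version of the same key-comparison idea. This is only a sketch under assumptions: the helper names `record_key` and `dedupe_in_order` are hypothetical, and it assumes a starred record differs from its twin only by the `**` marker and whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: comparison key with the ** marker removed and
# whitespace collapsed and trimmed.
sub record_key {
    my ($line) = @_;
    ( my $key = $line ) =~ s/\*\*//;
    $key =~ s/\s+/ /g;
    $key =~ s/^ | $//g;
    return $key;
}

# Hypothetical helper: keep one record per key, in input order,
# preferring the starred copy when one exists.
sub dedupe_in_order {
    my @lines = @_;

    # Pass 1: note which keys have a starred copy somewhere in the input.
    my %has_star;
    for my $line (@lines) {
        $has_star{ record_key($line) } = 1 if $line =~ /\*\*/;
    }

    # Pass 2: walk the input in order, skipping unstarred copies that have
    # a starred twin, and skipping exact repeats of a key already emitted.
    my ( %seen, @out );
    for my $line (@lines) {
        my $key = record_key($line);
        next if $has_star{$key} && $line !~ /\*\*/;
        push @out, $line unless $seen{$key}++;
    }
    return @out;
}

print "$_\n" for dedupe_in_order(
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
);
```

    The second pass costs one extra walk over the data, but it avoids relying on the starred copy appearing after the unstarred one.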

Re^3: Selective printing of the Duplicates
by Kenosis (Priest) on Jan 30, 2013 at 19:18 UTC

    My apologies, as I misunderstood. Try the following:

    use strict;
    use warnings;

    my %seen;

    while (<DATA>) {
        chomp;
        my ($col1) = /(\d+)/;    # the leading numeric ID is the key

        # A starred record always replaces whatever is stored for its ID.
        $seen{$col1} = $_ if /\*\*/;

        # Otherwise, store the record only when the ID is new or the
        # stored record is unstarred.
        $seen{$col1} = $_
            if !exists $seen{$col1}
            or $seen{$col1} !~ /\*\*/;
    }

    print "$seen{$_}\n" for keys %seen;

    __DATA__
    11111111 ** N Sep 29 356200 AGEC682569 ATI + S
    11111111 N Sep 29 356200 AGEC682569 ATI + S
    30380868 N Sep 29 356200 AGEC682569 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S
    71130740 N Sep 7 SM9481 AGEC683966 ATI + S
    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    32450045 N Jul 14 SN9672 AGEC685203 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S

    Output:

    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    11111111 ** N Sep 29 356200 AGEC682569 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S

    The above preferentially keeps starred records, regardless of the order in which the records are processed. Because the results are printed via keys %seen, the output order is arbitrary.

      Thanks a ton! This is just what I was looking for.

        You're most welcome!
