PerlMonks  

Re^2: Selective printing of the Duplicates

by Thomas Kennll (Acolyte)
on Jan 30, 2013 at 18:03 UTC (#1016130)


in reply to Re: Selective printing of the Duplicates
in thread Selective printing of the Duplicates

Thank you! Note that my data file is space-delimited and the records are not exact duplicates:

    30380868 N Sep 29 356200 AGEC682569 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S
I'm going to split each record and look at column 1, which here is 30380868. The 2nd column is ** in one record and empty in the other; all the other columns are the same. I want to print the record that has ** in the 2nd column. The code you provided above only gives me the first duplicate, which is:

    30380868 N Sep 29 356200 AGEC682569 ATI + S
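The split-then-inspect idea described above can be sketched as follows. This is a minimal sketch, assuming whitespace-separated fields with the optional ** marker sitting in field 2; parse_record is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split one space-delimited record into fields and
# return its column-1 key plus a flag saying whether column 2 is '**'.
sub parse_record {
    my ($line) = @_;
    my @fields  = split ' ', $line;    # ' ' splits on runs of whitespace
    my $key     = $fields[0];
    my $starred = ( defined $fields[1] && $fields[1] eq '**' ) ? 1 : 0;
    return ( $key, $starred );
}

for my $line (
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    )
{
    my ( $key, $starred ) = parse_record($line);
    print "$key starred=$starred\n";
}
```

This prints "30380868 starred=0" followed by "30380868 starred=1": the two records share a key and differ only in the flag, which is exactly the pair to resolve in favour of the starred one.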


Re^3: Selective printing of the Duplicates
by Not_a_Number (Parson) on Jan 30, 2013 at 18:57 UTC
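    use strict;
    use warnings;

    my %seen;
    # Walk the records bottom-up, so the starred line of each pair
    # (which follows its unstarred twin in the data) is the one kept.
    for ( reverse <DATA> ) {
        # Comparison key: the record with the ** marker removed and
        # whitespace squeezed, so both versions of a record match.
        ( my $unstarred = $_ ) =~ s/\*\*//;
        $unstarred =~ s/\s+/ /g;
        print unless $seen{ $unstarred }++;
    }

    __DATA__
    30380868 N Sep 29 356200 AGEC682569 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S
    71130740 N Sep 7 SM9481 AGEC683966 ATI + S
    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    32450045 N Jul 14 SN9672 AGEC685203 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S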

    The only difference from your desired output is that this code prints the records in reverse order. I don't see that as a problem, since there seems to be no inherent order in the input or output anyway.
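If the original input order does matter, the reverse-and-%seen idea above can be wrapped so the surviving lines are reversed back before printing. A minimal sketch, assuming the records arrive as a list of lines; keep_last_of_each is a hypothetical name, and like the reply above it assumes each starred record comes after its unstarred twin.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the last line of each group of records that
# differ only by the ** marker, returning the survivors in input order.
# Assumes a starred record always follows its unstarred twin.
sub keep_last_of_each {
    my @lines = @_;
    my ( %seen, @kept );
    for my $line ( reverse @lines ) {
        # Comparison key: all fields except the ** marker, single-spaced.
        my $key = join ' ', grep { $_ ne '**' } split ' ', $line;
        push @kept, $line unless $seen{$key}++;
    }
    return reverse @kept;    # undo the bottom-up walk
}

my @records = (
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
    '34680135 N Sep 30 349450 AGEC685442 ATI + S',
);
print "$_\n" for keep_last_of_each(@records);
```

With these sample records it prints the starred 30380868 line first, then 36450223 and 34680135, matching the input order. Building the key from the split fields rather than a character substitution keeps the comparison independent of how many spaces separate the columns.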

Re^3: Selective printing of the Duplicates
by Kenosis (Priest) on Jan 30, 2013 at 19:18 UTC

    My apologies; I misunderstood. Try the following:

    use strict;
    use warnings;

    my %seen;
    while (<DATA>) {
        chomp;
        my ($col1) = /(\d+)/;    # column 1: the leading record number

        # A starred record always replaces whatever is stored for its key.
        $seen{$col1} = $_ if /\*\*/;

        # Otherwise store the line only if no starred record is stored yet.
        $seen{$col1} = $_ if !exists $seen{$col1} or $seen{$col1} !~ /\*\*/;
    }

    print "$seen{$_}\n" for keys %seen;

    __DATA__
    11111111 ** N Sep 29 356200 AGEC682569 ATI + S
    11111111 N Sep 29 356200 AGEC682569 ATI + S
    30380868 N Sep 29 356200 AGEC682569 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S
    71130740 N Sep 7 SM9481 AGEC683966 ATI + S
    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    32450045 N Jul 14 SN9672 AGEC685203 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S

    Output:

    71130740 ** N Sep 7 SM9481 AGEC683966 ATI + S
    34680135 N Sep 30 349450 AGEC685442 ATI + S
    32450045 ** N Jul 14 SN9672 AGEC685203 ATI + S
    11111111 ** N Sep 29 356200 AGEC682569 ATI + S
    36450223 N Aug 30 SU8329 AGEC685348 ATI + S
    30380868 ** N Sep 29 356200 AGEC682569 ATI + S

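    The above will preferentially keep starred records, regardless of the order in which the records are processed. Because the results are printed via keys %seen, they come out in hash order, which is arbitrary.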
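If a stable output order is wanted too, the same starred-record-wins rule can remember the order in which each column-1 value first appears. A minimal sketch, assuming the records are already in a list; starred_wins is a hypothetical helper name, and keying on the first run of digits follows the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: one record per column-1 value, a starred record
# always beating an unstarred one, output in order of first appearance.
sub starred_wins {
    my @lines = @_;
    my ( %best, @order );
    for my $line (@lines) {
        my ($col1) = $line =~ /(\d+)/;    # first run of digits = column 1
        next unless defined $col1;        # skip lines without a key
        if ( !exists $best{$col1} ) {
            push @order, $col1;           # remember first appearance
            $best{$col1} = $line;
        }
        elsif ( $line =~ /\*\*/ ) {
            $best{$col1} = $line;         # a starred record replaces the stored one
        }
    }
    return map { $best{$_} } @order;
}

my @input = (
    '11111111 ** N Sep 29 356200 AGEC682569 ATI + S',
    '11111111 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 N Sep 29 356200 AGEC682569 ATI + S',
    '30380868 ** N Sep 29 356200 AGEC682569 ATI + S',
    '36450223 N Aug 30 SU8329 AGEC685348 ATI + S',
);
print "$_\n" for starred_wins(@input);
```

For this input the output is the starred 11111111 line, the starred 30380868 line, then the single 36450223 line, in that order on every run. The extra @order array is the only cost of replacing hash order with input order.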

      Thanks a ton! This is exactly what I was looking for.

        You're most welcome!
