PerlMonks  

Re^2: Reading file and matching lines

by Jalcock501 (Sexton)
on Feb 13, 2014 at 15:50 UTC (node #1074855)


in reply to Re: Reading file and matching lines
in thread Reading file and matching lines

Hi Kcott

I completely forgot about that thread; thank you for reminding me.

I do, however, have a quick question: if I want to check for duplicate G entries within the same scope (i.e. between the E and h records), how would I do it?

I have some example code I tried, but it just prints all the G records.

my $lines;
while (<>) {    # assumed: the snippet runs inside a loop reading the file line by line
    if (/^G/) {
        next if ($lines eq $_);
        $lines = $_;
        print $_;
    }
}
Here is the example data I'm using:

E123456789
G123456798   ignore this, as it is the first instance of a G record in scope
h12345
E1234567
E7899874
G123456798   even though this is the same as above, ignore it, as it is the first instance in this scope
G123456789   ignore this, as it is different from the previous G record
G123465798   should flag a duplicate here, because it is the same as the first G record in scope!!!
h1245


Re^3: Reading file and matching lines
by kcott (Abbot) on Feb 14, 2014 at 00:33 UTC

    Firstly, you have no duplicates in any (of what you're calling) "scope". G123465798 is not a duplicate of G123456798: you've transposed the 5 and the 6. I've fixed this in the example below.

    There's a standard idiom for checking for duplicates in this sort of scenario. Use a hash (often called %seen) that has as its keys whatever identifier you're checking. While processing, if the key exists, it's a duplicate, so skip/flag/etc. as appropriate; if the key doesn't exist, it's unique, so use it and then add it to the hash (usually done with a postfix increment).
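A minimal sketch of the "flag" variant of that idiom, which collects repeats instead of skipping them. The helper name `find_duplicates` is hypothetical and used only for illustration; the key point is that the postfix increment returns the old count, so the test is false the first time an identifier is seen and true on every later occurrence.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return every repeat occurrence of an identifier in @ids.
sub find_duplicates {
    my @ids = @_;
    my %seen;    # identifier => number of times seen so far
    my @dups;
    for my $id (@ids) {
        # Postfix ++ yields the count *before* incrementing:
        # undef (false) on first sight, true on every later sight.
        push @dups, $id if $seen{$id}++;
    }
    return @dups;
}

my @dups = find_duplicates(qw{G123456798 G123456789 G123456798});
print "duplicate: $_\n" for @dups;    # prints "duplicate: G123456798"
```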

    Here's an example using your fixed data:

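    #!/usr/bin/env perl

    use strict;
    use warnings;

    # Each inner array is one "scope": E records, G records, closed by an h record.
    my @data = (
        [ qw{E123456789 G123456798 h12345} ],
        [ qw{E1234567 E7899874 G123456798 G123456789 G123456798 h1245} ],
    );

    for my $scope (@data) {
        my %seen;    # fresh hash per scope, so duplicates are only detected within a scope
        for my $identifier (@$scope) {
            # print only the first occurrence; later ones find a true count and are skipped
            print "$identifier\n" unless $seen{$identifier}++;
        }
    }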

    Output:

    E123456789
    G123456798
    h12345
    E1234567
    E7899874
    G123456798
    G123456789
    h1245
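The example above works on pre-split arrays. When reading the file directly, the same idiom can be applied by clearing %seen whenever a scope ends. This is only a sketch under assumptions not stated in the thread: each line starts with the record ID (anything after whitespace is an annotation and is ignored), an h record closes the current scope, and only G records are checked. The sub name `flag_duplicate_g` is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sub: flag G records repeated within one scope.
# Assumes the record ID is the first whitespace-separated field
# and that an h record ends the current scope.
sub flag_duplicate_g {
    my ($fh) = @_;
    my %seen;       # G records seen in the current scope
    my @flagged;    # "line N: ID" for each duplicate found
    while (my $line = <$fh>) {
        my ($id) = split ' ', $line;    # first field; annotations ignored
        next unless defined $id;        # skip blank lines
        if ($id =~ /^G/) {
            push @flagged, sprintf("line %d: %s", $., $id) if $seen{$id}++;
        }
        elsif ($id =~ /^h/) {
            %seen = ();                 # scope ends: forget its G records
        }
    }
    return @flagged;
}

# The fixed example data, read through an in-memory filehandle.
my $data = <<'END';
E123456789
G123456798 first G record in this scope
h12345
E1234567
E7899874
G123456798 same ID as before, but a new scope
G123456789
G123456798 duplicate within this scope
h1245
END

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for flag_duplicate_g($fh);    # prints "line 8: G123456798"
```

For a real file, open it with a path instead of `\$data`; the sub itself only needs a readable filehandle.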

    -- Ken
