Re^2: Reading file and matching lines

by Jalcock501 (Sexton)
on Feb 13, 2014 at 15:50 UTC

in reply to Re: Reading file and matching lines
in thread Reading file and matching lines

Hi Kcott

I completely forgot about that thread, thank you for reminding me.

I do, however, have a quick question: if I want to check for duplicate G entries within the same scope (i.e. between E and h records), how would I do it?

I have some example code I tried, but it just prints all G records.

my $lines;
if (/^G/) {
    next if ($lines eq $_);
    $lines = $_;
    print $_;
}

Here is the example data I'm using:

E123456789
G123456798 ignore this as this is the first instance of G record in scope
h12345
E1234567
E7899874
G123456798 even though this is the same ignore as its first instance
G123456789 ignore this as it is different from previous G record
G123465798 should flag duplicate here because it is the same first G record in scope!!!
h1245

Re^3: Reading file and matching lines
on Feb 14, 2014 at 00:33 UTC

    Firstly, you have no duplicates in any (of what you're calling) "scope". G123465798 is not a duplicate of G123456798: you've transposed the 5 and the 6. I've fixed this in the example below.

    There's a standard idiom for checking for duplicates in this sort of scenario. Use a hash (often called %seen) that has as its keys whatever identifier you're checking. While processing, if the key exists, it's a duplicate, so skip/flag/etc. as appropriate; if the key doesn't exist, it's unique, so use it and then add it to the hash (usually done with a postfix increment).

    Here's an example using your fixed data:

    #!/usr/bin/env perl -l
    use strict;
    use warnings;

    my @data = (
        [ qw{E123456789 G123456798 h12345} ],
        [ qw{E1234567 E7899874 G123456798 G123456789 G123456798 h1245} ],
    );

    for my $scope (@data) {
        my %seen;
        for my $identifier (@$scope) {
            print $identifier unless $seen{$identifier}++;
        }
    }


    Output:

    E123456789
    G123456798
    h12345
    E1234567
    E7899874
    G123456798
    G123456789
    h1245
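
    If the records arrive one line at a time rather than pre-grouped into arrays, the same %seen idiom might be applied roughly as follows. This is only a sketch under some assumptions that are not stated in the thread: a scope ends at an h record, the identifier is the first whitespace-delimited field on each line, and only G records are checked. The subroutine name duplicate_g_records and the shortened annotation text are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the G identifiers that repeat within a scope.
# Assumes a scope ends at an "h" record and that the identifier is the
# first whitespace-delimited field on the line.
sub duplicate_g_records {
    my @lines = @_;
    my (%seen, @duplicates);
    for my $line (@lines) {
        my ($id) = split ' ', $line;
        next unless defined $id;        # skip blank lines
        if ($id =~ /^h/) {
            %seen = ();                 # scope closed: forget its G records
        }
        elsif ($id =~ /^G/) {
            # Post-increment: false the first time, true on every repeat.
            push @duplicates, $id if $seen{$id}++;
        }
    }
    return @duplicates;
}

# Sample records modelled on the corrected data above.
my @records = (
    'E123456789',
    'G123456798 first G record in scope',
    'h12345',
    'E1234567',
    'E7899874',
    'G123456798 first instance in this scope',
    'G123456789 different from previous G record',
    'G123456798 duplicate within this scope',
    'h1245',
);

print "duplicate: $_\n" for duplicate_g_records(@records);
```

    Keying the hash on the first field rather than the whole line means two G records with the same identifier but different trailing text still count as duplicates. Clearing %seen at each h record keeps a G record in one scope from being compared against an earlier scope.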

    -- Ken

