Re: Reading file and matching lines

in reply to Reading file and matching lines

You asked a very similar question, with a very similar title, using very similar data, in "Search file for certain lines".

Here's a cut-down version (with appropriate modifications) of the technique I provided in that thread (Re: Search file for certain lines):

#!/usr/bin/env perl

use strict;
use warnings;

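# Read one block per record: each record ends at a newline followed by "h".
local $/ = "\nh";
# For each block, print the E lines that precede the first G line,
# or "Error" if the block has none.
print "Block $.\n", /^(E.*?)^G/ms ? $1 : "Error\n" while <DATA>;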

__DATA__
hblah
Qblah
Eblock_1_line_1
Eblock_1_line_2
Gblah
hblah
Qblah
Gblah
hblah
Qblah
Eblock_3_line_1
Eblock_3_line_2
Gblah

Output:

Block 1
Eblock_1_line_1
Eblock_1_line_2
Block 2
Error
Block 3
Eblock_3_line_1
Eblock_3_line_2

In Re: Search file for certain lines, I provided an explanation of the code as well as links to more detailed documentation. I've introduced no new concepts here: if there's something you don't understand here, go back to the earlier post for more information.

-- Ken


Re^2: Reading file and matching lines by Jalcock501 (Sexton) on Feb 13, 2014 at 15:50 UTC
Hi Kcott,

I completely forgot about that thread; thank you for reminding me. I do, however, have a quick question: if I want to check for duplicate G entries within the same scope (i.e. between E and h records), how would I do it? I have some example code I tried, but it just prints all G records.

my $lines;
if (/^G/) {
    next if ($lines eq $_);
    $lines = $_;
    print $_;
}

Here is the example data I'm using:

E123456789
G123456798 ignore this as this is the first instance of G record in scope
h12345
E1234567
E7899874
G123456798 even though this is the same ignore as its first instance
G123456789 ignore this as it is different from previous G record
G123465798 should flag duplicate here because it is the same first G record in scope!!!
h1245
Re^3: Reading file and matching lines by kcott (Archbishop) on Feb 14, 2014 at 00:33 UTC
Firstly, you have no duplicates in any (of what you're calling) "scope". `G123465798` is not a duplicate of `G123456798`: you've transposed the `5` and the `6`. I've fixed this in the example below.

There's a standard idiom for checking for duplicates in this sort of scenario. Use a hash (often called `%seen`) that has as its keys whatever identifier you're checking. While processing, if the key exists, it's a duplicate, so skip/flag/etc. as appropriate; if the key doesn't exist, it's unique, so use it and then add it to the hash (usually done with a postfix increment).

Here's an example using your fixed data:

#!/usr/bin/env perl -l

use strict;
use warnings;

my @data = (
    [ qw{E123456789 G123456798 h12345} ],
    [ qw{E1234567 E7899874 G123456798 G123456789 G123456798 h1245} ],
);

for my $scope (@data) {
    my %seen;
    for my $identifier (@$scope) {
        print $identifier unless $seen{$identifier}++;
    }
}

Output:

E123456789
G123456798
h12345
E1234567
E7899874
G123456798
G123456789
h1245

-- Ken
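The `%seen` idiom can also be applied to a flat stream of records instead of pre-grouped arrays. What follows is only a minimal sketch under stated assumptions: the sub name `duplicate_g_records` is hypothetical (it does not appear anywhere in this thread), a scope is assumed to end at each `h` record, and the identifier is assumed to be the first whitespace-delimited field of a line. It collects the duplicate G records rather than skipping them.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the G records that
# repeat an earlier G record within the same scope. A scope is assumed to
# end at each "h" record, at which point %seen is emptied.
sub duplicate_g_records {
    my @lines = @_;
    my (%seen, @duplicates);
    for my $line (@lines) {
        # Assumption: the identifier is the first whitespace-delimited field.
        my ($identifier) = split ' ', $line;
        next unless defined $identifier;    # skip blank lines
        if ($identifier =~ /^h/) {
            %seen = ();                     # new scope: forget earlier G records
        }
        elsif ($identifier =~ /^G/) {
            # Postfix increment: false the first time, true on every repeat.
            push @duplicates, $identifier if $seen{$identifier}++;
        }
    }
    return @duplicates;
}

# The corrected data from the example above, as one flat list of records.
my @records = qw{
    E123456789 G123456798 h12345
    E1234567 E7899874 G123456798 G123456789 G123456798 h1245
};
print "Duplicate: $_\n" for duplicate_g_records(@records);
```

With the corrected data, only the second `G123456798` in the second scope should be reported; the one in the first scope is forgotten when `h12345` is reached. When reading from a file, the lines from the filehandle could be passed in place of `@records`.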
