Re: Reading file and matching lines

by kcott (Abbot)
on Feb 12, 2014 at 08:34 UTC (#1074598)


in reply to Reading file and matching lines

G'day Jalcock501,

You asked a very similar question, with a very similar title, using very similar data, in "Search file for certain lines".

Here's a cutdown version (with appropriate modifications) of the technique I provided in that thread (Re: Search file for certain lines):

#!/usr/bin/env perl
use strict;
use warnings;

local $/ = "\nh";

print "Block $.\n", /^(E.*?)^G/ms ? $1 : "Error\n" while <DATA>;

__DATA__
hblah
Qblah
Eblock_1_line_1
Eblock_1_line_2
Gblah
hblah
Qblah
Gblah
hblah
Qblah
Eblock_3_line_1
Eblock_3_line_2
Gblah

Output:

Block 1
Eblock_1_line_1
Eblock_1_line_2
Block 2
Error
Block 3
Eblock_3_line_1
Eblock_3_line_2
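For anyone reading without the earlier thread to hand, the technique can be unpacked roughly as follows. This is only a sketch of the same idea: setting $/ to "\nh" makes each read return one h-delimited block, and the regex captures the E lines that precede the first G line. The sub name extract_e_block and the in-memory sample input are hypothetical and exist only for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: returns the E lines from one block, or undef when
# the block contains no E line followed by a G line.
sub extract_e_block {
    my ($block) = @_;
    # /m lets ^ match at the start of every line; /s lets . match newlines.
    # .*? is non-greedy, so the capture stops at the first line starting with G.
    return $block =~ /^(E.*?)^G/ms ? $1 : undef;
}

# Illustrative sample input, read through an in-memory filehandle.
my $input = "hblah\nQblah\nEone\nEtwo\nGblah\nhblah\nQblah\nGblah\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

{
    local $/ = "\nh";    # a record now ends at a newline followed by 'h'
    while (my $block = <$fh>) {
        my $e_lines = extract_e_block($block);
        print "Block $.\n", defined $e_lines ? $e_lines : "Error\n";
    }
}
```

Because the separator is consumed as part of each record, every block after the first starts without its leading h; the regex does not depend on that character, so the match is unaffected.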

In Re: Search file for certain lines, I provided an explanation of the code as well as links to more detailed documentation. I've introduced no new concepts here: if there's something you don't understand here, go back to the earlier post for more information.

-- Ken


Re^2: Reading file and matching lines
by Jalcock501 (Sexton) on Feb 13, 2014 at 15:50 UTC
    Hi Kcott

    I completely forgot about that thread; thank you for reminding me.

    I do, however, have a quick question: if I want to check for duplicate G entries within the same scope (i.e. between E and h records), how would I do it?

    I have some example code I tried but it just prints all G records.

    my $lines;
    if (/^G/) {
        next if ($lines eq $_);
        $lines = $_;
        print $_;
    }

    Here is the example data I'm using:

    E123456789
    G123456798 ignore this as this is the first instance of G record in scope
    h12345
    E1234567
    E7899874
    G123456798 even though this is the same ignore as its first instance
    G123456789 ignore this as it is different from previous G record
    G123465798 should flag duplicate here because it is the same first G record in scope!!!
    h1245

      Firstly, you have no duplicates in any (of what you're calling) "scope". G123465798 is not a duplicate of G123456798: you've transposed the 5 and the 6. I've fixed this in the example below.

      There's a standard idiom for checking for duplicates in this sort of scenario. Use a hash (often called %seen) that has as its keys whatever identifier you're checking. While processing, if the key exists, it's a duplicate, so skip/flag/etc. as appropriate; if the key doesn't exist, it's unique, so use it and then add it to the hash (usually done with a postfix increment).

      Here's an example using your fixed data:

      #!/usr/bin/env perl -l
      use strict;
      use warnings;

      my @data = (
          [ qw{E123456789 G123456798 h12345} ],
          [ qw{E1234567 E7899874 G123456798 G123456789 G123456798 h1245} ],
      );

      for my $scope (@data) {
          my %seen;
          for my $identifier (@$scope) {
              print $identifier unless $seen{$identifier}++;
          }
      }

      Output:

      E123456789
      G123456798
      h12345
      E1234567
      E7899874
      G123456798
      G123456789
      h1245
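
If the records arrive one per line rather than already grouped into scopes, the same %seen idiom might be applied by emptying the hash whenever an h record closes the scope, and flagging (rather than skipping) any repeated G record. A minimal sketch, assuming the identifier is the first whitespace-delimited field on each line and that every h line ends a scope; the sub name flag_duplicates and the shortened comments in the sample lines are hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns the G identifiers
# that repeat within a scope (assumed here to end at each 'h' line).
sub flag_duplicates {
    my @lines = @_;
    my (%seen, @duplicates);
    for my $line (@lines) {
        my ($identifier) = split ' ', $line;    # first field only
        next unless defined $identifier;        # skip blank lines
        if ($identifier =~ /^h/) {
            %seen = ();                         # scope ends: forget everything
            next;
        }
        next unless $identifier =~ /^G/;        # only G records are checked
        # Postfix increment: false the first time, true on every repeat.
        push @duplicates, $identifier if $seen{$identifier}++;
    }
    return @duplicates;
}

my @lines = (
    'E123456789',
    'G123456798 first G record in scope',
    'h12345',
    'E1234567',
    'E7899874',
    'G123456798 first instance in this scope',
    'G123456789 different from previous G record',
    'G123456798 duplicate of first G record in scope',
    'h1245',
);

print "Duplicate: $_\n" for flag_duplicates(@lines);
```

With the corrected data this reports G123456798 once, for its second appearance in the second scope; the occurrence in the first scope is not counted against it because the hash is cleared at h12345.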

      -- Ken
