Reading file and matching lines

Jalcock501 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Reading file and matching lines by choroba (Cardinal) on Feb 11, 2014 at 12:12 UTC
In Perl, you can read a file line by line without the need to load the whole file first. Use the diamond operator in a while loop (untested code): `open my $IN, '<', 'quoteout.dat' or die "$!"; my $searching_for_G; while (<$IN>) { $searching_for_G = 1 if 0 == index $_, 'E'; die "Error: h at line $." if $searching_for_G and 0 == index $_, 'h'; if ($searching_for_G and 0 == index $_, 'G') { print "Found G at line $.\n"; undef $searching_for_G; } }` Note that unlike in C, a string is not an array of characters in Perl (that's why I used index). Also, you did not specify what to do if G is found - should the program end or search for another E? I assumed the latter. $. contains the input line number. See perlvar for details. لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Reading file and matching lines by robby_dobby (Hermit) on Feb 11, 2014 at 12:56 UTC
Wouldn't it be better if you had just used `$searching_for_G and /^h/`? (I haven't tested this). It may well just be me, but `0 == index ...` sticks out oddly to my eyes.
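For a literal prefix the two tests should give the same answer. A minimal sketch of both side by side; the sub names `starts_with_index` and `starts_with_regex` are made up for this comparison and do not appear in the thread:

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this comparison.
# index returns 0 when $prefix occurs at the very start of $line.
sub starts_with_index {
    my ($line, $prefix) = @_;
    return index($line, $prefix) == 0;
}

# \Q...\E quotes regex metacharacters so $prefix is matched literally.
sub starts_with_regex {
    my ($line, $prefix) = @_;
    return $line =~ /^\Q$prefix\E/ ? 1 : 0;
}

for my $line ('h189SMA2', 'G35119849249024', 'E99HEADER') {
    printf "%-16s index=%d regex=%d\n", $line,
        starts_with_index($line, 'h') ? 1 : 0,
        starts_with_regex($line, 'h');
}
```

Which one to prefer is mostly a matter of taste: the regex reads more naturally to many Perl programmers, while `index` avoids any surprises from metacharacters in the prefix.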
Re^2: Reading file and matching lines by Jalcock501 (Sexton) on Feb 11, 2014 at 13:26 UTC
Apologies. Yes, there are several instances of this in a single file, so I need to do this throughout the file and report errors only if there are any. If there are none, the script should exit normally.
Re^3: Reading file and matching lines by karlgoethebier (Abbot) on Feb 11, 2014 at 13:45 UTC
"...only report any errors if there are any" If so, why not something simple like this: `while (<IN>) { print qq($1\n) if /^(E)/; print qq($1\n) if /^(G)/; die $1 if /^(h)/; }` [download] Or do i still misunderstand the specs? Best regards, Karl «The Crux of the Biscuit is the Apostrophe»	[reply] [d/l]
Re^3: Reading file and matching lines by GotToBTru (Prior) on Feb 11, 2014 at 14:23 UTC
`use strict; use warnings; my $infile = shift; my $found_E = 0; my $sets = 0; open my $ifh, '<', $infile or die "Cannot open $infile: $!"; while(<$ifh>) { if (/^E/) { $found_E = 1; next; } if ($found_E) { if (/^G/) { $sets += 1; $found_E = 0; next; } if (/^h/) { print "Error! Found h before G\n"; exit; } } } close($ifh); printf "Found %d sets from E to G uninterrupted by h\n",$sets;`
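Since the stated requirement is to go through the whole file, here is a minimal sketch of a variant that collects every error instead of exiting at the first one. `check_lines` is a hypothetical helper that is not from the thread, and resetting the state after an `h` error is an assumption about the desired behaviour:

```perl
use strict;
use warnings;

# Hypothetical helper: takes a list of lines and returns the number of
# clean E..G sets plus a reference to a list of error messages.
sub check_lines {
    my @lines = @_;
    my ($found_E, $sets, @errors) = (0, 0);
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        if ($line =~ /^E/) { $found_E = 1; next; }
        next unless $found_E;            # nothing to check outside an E block
        if ($line =~ /^G/) {
            $sets++;
            $found_E = 0;
        }
        elsif ($line =~ /^h/) {
            push @errors, "h before G at line $line_no";
            $found_E = 0;                # assumption: wait for the next E
        }
    }
    return ($sets, \@errors);
}

my ($sets, $errors) = check_lines(qw(Q1 E1 E2 G1 h1 E3 h2 E4 G2));
print "$_\n" for @$errors;
print "Found $sets sets from E to G uninterrupted by h\n";
```

With the sample list above, the only error reported is the `h2` record on line 7, and two clean sets are counted. Reading from a file instead would only mean feeding the lines from a filehandle.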
Re: Reading file and matching lines by Eily (Monsignor) on Feb 11, 2014 at 16:07 UTC
In the name of Tim Toady (There Is More Than One Way To Do It). Featuring the range, or flip-flop, operator, which translates into human language as "From .. till ..", and the next keyword. `my $count = 0; LINE: while (<DATA>) { next LINE unless /^E/../^G/; die "Oups, went too far!" if /^h/; $count++ unless /^G/; } __DATA__ Q165HWN0X001 Q165HWN0X002 Q165HWN0X003 E99HEADER|006|001 E99INSSCH|052| E99POLCOM|1||IIL|62|35119849249024||||| E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800| G35119849249024 h189SMA2` The first statement skips to the next line unless we are between a line starting with E and a line starting with G. The die raises an error if the line starts with an h and hasn't been skipped by the previous statement. The last statement counts the lines in the range that do not start with a G.
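To see the flip-flop operator on its own, here is a minimal sketch that collects the lines it lets through. The record names in `@lines` are invented for the illustration:

```perl
use strict;
use warnings;

# Made-up sample records; only the first letter matters here.
my @lines = qw(Q1 E1 E2 G1 h1 Q2 E3 G2);

my @in_range;
for (@lines) {
    # True from a line matching /^E/ up to and including the next
    # line matching /^G/, then false again until the next /^E/.
    push @in_range, $_ if /^E/ .. /^G/;
}

print "@in_range\n";    # prints "E1 E2 G1 E3 G2"
```

Note that the closing `G` line is still inside the range, which is why the original code needs the extra `unless /^G/` when counting.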
Re^2: Reading file and matching lines by tbone654 (Beadle) on Feb 11, 2014 at 20:26 UTC
One-liner: `Prompt> perl -ne 'next if !/^E/; print $_;' datafile` Output: `E99HEADER|006|001 E99INSSCH|052| E99POLCOM|1||IIL|62|35119849249024||||| E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|`
Re^3: Reading file and matching lines by tbone654 (Beadle) on Feb 11, 2014 at 20:41 UTC
Modified one-liner and data to test... `F98020A@LUS76E8012758 /cygdrive/c/package $ perl -ne 'if (/^h/) {print "error starts with h program exiting"; exit;} if (/^G/) {exit;} next if !/^E/; print $_;' data` Output: `E99HEADER|006|001 E99INSSCH|052| E99POLCOM|1||IIL|62|35119849249024||||| E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|` `F98020A@LUS76E8012758 /cygdrive/c/package $ cat data` (note: I changed the data to have E lines after the old data) `Q165HWN0X001 Q165HWN0X002 Q165HWN0X003 E99HEADER|006|001 E99INSSCH|052| E99POLCOM|1||IIL|62|35119849249024||||| E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800| G35119849249024 h189SMA2 E99INSSCH|052| E99POLCOM|1||IIL|62|35119849249024||||| E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|`
Re: Reading file and matching lines by kcott (Archbishop) on Feb 12, 2014 at 08:34 UTC
G'day Jalcock501, You asked a very similar question, with a very similar title, using very similar data, in "Search file for certain lines". Here's a cut-down version (with appropriate modifications) of the technique I provided in that thread (Re: Search file for certain lines): `#!/usr/bin/env perl use strict; use warnings; local $/ = "\nh"; print "Block $.\n", /^(E.*?)^G/ms ? $1 : "Error\n" while <DATA>; __DATA__ hblah Qblah Eblock_1_line_1 Eblock_1_line_2 Gblah hblah Qblah Gblah hblah Qblah Eblock_3_line_1 Eblock_3_line_2 Gblah` Output: `Block 1 Eblock_1_line_1 Eblock_1_line_2 Block 2 Error Block 3 Eblock_3_line_1 Eblock_3_line_2` In Re: Search file for certain lines, I provided an explanation of the code as well as links to more detailed documentation. I've introduced no new concepts here: if there's something you don't understand here, go back to the earlier post for more information. -- Ken
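The key step is changing the input record separator `$/`, so that each read returns a whole block instead of a single line. A minimal sketch of just that step, using an in-memory filehandle and invented sample text rather than the original data:

```perl
use strict;
use warnings;

# Invented sample text: two blocks, each introduced by an h record.
my $text = "hA\nQ1\nE1\nG1\nhB\nQ2\nG2\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "\nh";    # a record now ends at a newline followed by h
    while (my $record = <$fh>) {
        chomp $record;   # chomp strips the current $/ ("\nh") if present
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records\n";
```

One thing to keep in mind: the `h` that starts the second block is consumed as part of the separator, so the second record begins with `B` rather than `hB`. That does not matter when, as above, only the E and G lines inside each block are of interest.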
Re^2: Reading file and matching lines by Jalcock501 (Sexton) on Feb 13, 2014 at 15:50 UTC
Hi Kcott, I completely forgot about that thread, thank you for reminding me. I do however have a quick question: if I want to check for duplicate G entries within the same scope (i.e. between E and h records), how would I do it? I have some example code I tried, but it just prints all G records. `my $lines; if (/^G/) { next if ($lines eq $_); $lines = $_; print $_; }` Here is the example data I'm using: `E123456789 G123456798 ignore this as this is the first instance of G record in scope h12345 E1234567 E7899874 G123456798 even though this is the same ignore as its first instance G123456789 ignore this as it is different from previous G record G123465798 should flag duplicate here because it is the same first G record in scope!!! h1245`
Re^3: Reading file and matching lines by kcott (Archbishop) on Feb 14, 2014 at 00:33 UTC
Firstly, you have no duplicates in any (of what you're calling) "scope". `G123465798` is not a duplicate of `G123456798`: you've transposed the `5` and the `6`. I've fixed this in the example below. There's a standard idiom for checking for duplicates in this sort of scenario. Use a hash (often called `%seen`) that has as its keys whatever identifier you're checking. While processing, if the key exists, it's a duplicate, so skip/flag/etc. as appropriate; if the key doesn't exist, it's unique, so use it and then add it to the hash (usually done with a postfix increment). Here's an example using your fixed data: `#!/usr/bin/env perl -l use strict; use warnings; my @data = ( [ qw{E123456789 G123456798 h12345} ], [ qw{E1234567 E7899874 G123456798 G123456789 G123456798 h1245} ], ); for my $scope (@data) { my %seen; for my $identifier (@$scope) { print $identifier unless $seen{$identifier}++; } }` Output: `E123456789 G123456798 h12345 E1234567 E7899874 G123456798 G123456789 h1245` -- Ken
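The same `%seen` idiom can also be applied while walking the records one at a time, rather than on data already grouped by scope. A minimal sketch, where `find_duplicate_G` is a hypothetical helper and treating each `h` record as the end of a scope is an assumption based on the question:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a message for every G record that has
# already been seen in the current scope.
# Assumption: a scope ends at each h record.
sub find_duplicate_G {
    my @lines = @_;
    my (%seen, @duplicates);
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        my ($record) = split ' ', $line;   # first whitespace-separated field
        next unless defined $record;       # skip blank lines
        if ($record =~ /^h/) {
            %seen = ();                    # scope ends: forget its G records
            next;
        }
        next unless $record =~ /^G/;
        push @duplicates, "$record at line $line_no" if $seen{$record}++;
    }
    return @duplicates;
}

print "Duplicate: $_\n" for find_duplicate_G(qw(
    E123456789 G123456798 h12345
    E1234567 E7899874 G123456798 G123456789 G123456798 h1245
));
```

With the corrected data this flags only the second `G123456798` of the second scope (line 8). The one in the first scope is not counted against it, because the hash is emptied at `h12345`.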

