Jalcock501 has asked for the wisdom of the Perl Monks concerning the following question:
Good afternoon, my fellow monks! I am in need of your assistance. I need to look through a file and read it line by line. And this is the tricky bit (for me at least): while reading, I check the first character of a line, which should be an E, and then search through the following lines until I find a line beginning with G. But if we hit a line beginning with h (lower case), we've gone too far and the script should produce an error. Now I thought maybe using a for loop to go through the lines one at a time; however, I've never done this in Perl, so here was my crack at it:
#!/usr/bin/perl
use strict;
my @lines;
my $file = <quoteout.dat>;
open my $in, '<', $file;
open my $out, '>', "ERR";
@lines = split('', $_);
for(my $i; $i < 9; $i++)
{
if($line[$i] eq 'E')
{
#add one until finds a G or h
}
}
UPDATE: I forgot to add the type of data:
Q165HWN0X001
Q165HWN0X002
Q165HWN0X003
E99HEADER|006|001
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
G35119849249024
h189SMA2
Could someone help as I'm not sure if this is right.
Thanks
Jim
Re: Reading file and matching lines
by choroba (Cardinal) on Feb 11, 2014 at 12:12 UTC
In Perl, you can read a file line by line without the need to load the whole file first. Use the diamond operator in a while loop (untested code):
open my $IN, '<', 'quoteout.dat' or die "$!";
my $searching_for_G;
while (<$IN>) {
$searching_for_G = 1 if 0 == index $_, 'E';
die "Error: h at line $." if $searching_for_G and 0 == index $_, 'h';
if ($searching_for_G and 0 == index $_, 'G') {
print "Found G at line $.\n";
undef $searching_for_G;
}
}
Note that unlike in C, a string is not an array of characters in Perl (that's why I used index). Also, you did not specify what to do if G is found - should the program end or search for another E? I assumed the latter.
$. contains the input line number. See perlvar for details.
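As a small illustration of the point about index, here is a sketch comparing three ways of testing the first character of a line: index, substr, and an anchored regex. The starts_with_* helper names are made up for this example and are not part of any reply in this thread.

```perl
use strict;
use warnings;

# Three equivalent ways to ask "does this line start with the given character?".
# The starts_with_* names are hypothetical helpers used only for this sketch.
sub starts_with_index  { my ($line, $ch) = @_; return 0 == index $line, $ch }
sub starts_with_substr { my ($line, $ch) = @_; return substr($line, 0, 1) eq $ch }
sub starts_with_regex  { my ($line, $ch) = @_; return $line =~ /^\Q$ch\E/ }

# Print a 1/0 verdict from each helper for a few lines of the sample data.
for my $line ('E99HEADER|006|001', 'G35119849249024', 'h189SMA2') {
    printf "%-20s index=%d substr=%d regex=%d\n", $line,
        map { $_->($line, 'E') ? 1 : 0 }
            \&starts_with_index, \&starts_with_substr, \&starts_with_regex;
}
```

All three agree on single-character prefixes; which one reads best is largely a matter of taste.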
Wouldn't it be better if you had just used $searching_for_G and /^h/? (I haven't tested this). It may well just be me, but 0 == index ... sticks out oddly to my eyes.
Apologies. Yes, there are several instances of this in a single file, so I need to do this throughout the file and report errors only if there are any. If there are none, the script should exit normally.
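One way to read that requirement is sketched below: scan every line, remember whether an E block is still waiting for its G, and collect a message for each h that arrives too early. The check_lines name, the message text, and the choice to reset the state after an error are all assumptions made for illustration.

```perl
use strict;
use warnings;

# check_lines is a hypothetical helper: it takes a list of lines and returns
# one message for every h line reached while an E block is still waiting for its G.
sub check_lines {
    my @lines = @_;
    my ($searching_for_G, @errors) = (0);
    my $line_no = 0;
    for my $line (@lines) {
        $line_no++;
        if    ($line =~ /^E/) { $searching_for_G = 1 }
        elsif ($line =~ /^G/) { $searching_for_G = 0 }
        elsif ($line =~ /^h/ and $searching_for_G) {
            push @errors, "h before G at line $line_no";
            $searching_for_G = 0;   # assumed: start afresh after reporting
        }
    }
    return @errors;
}

my @errors = check_lines(
    'E99HEADER|006|001', 'G35119849249024', 'h189SMA2',   # complete block
    'E99INSSCH|052|',    'h189SMA2',                      # G is missing
);
print "$_\n" for @errors;   # prints "h before G at line 5"
```

With no errors the list is empty and nothing is printed, so the script ends normally; a non-zero exit status could be added at the end if errors were found.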
while (<IN>) {
print qq($1\n) if /^(E)/;
print qq($1\n) if /^(G)/;
die $1 if /^(h)/;
}
Or do I still misunderstand the specs?
Best regards, Karl
«The Crux of the Biscuit is the Apostrophe»
use strict;
use warnings;
my $infile = shift;
my $found_E = 0;
my $sets = 0;
open my $ifh, '<', $infile or die "Cannot open $infile: $!";
while(<$ifh>) {
if (/^E/) {
$found_E = 1;
next;
}
if ($found_E) {
if (/^G/) {
$sets += 1;
$found_E = 0;
next;
}
if (/^h/) {
print "Error! Found h before G\n";
exit;
}
}
}
close($ifh);
printf "Found %d sets from E to G uninterrupted by h\n",$sets;
Re: Reading file and matching lines
by Eily (Monsignor) on Feb 11, 2014 at 16:07 UTC
In the name of Tim Toady (There Is More Than One Way To Do It), here is a version featuring the range, or flip-flop, operator, which translates into human language as "from .. till ..", and the next keyword.
my $count =0;
LINE: while(<DATA>)
{
next LINE unless /^E/../^G/; # skip unless we are between a line starting with E and a line starting with G
die "Oups, went too far!" if /^h/; # error if the line starts with an h and hasn't been skipped by the previous statement
$count++ unless /^G/; # count the lines that do not start with a G
}
__DATA__
Q165HWN0X001
Q165HWN0X002
Q165HWN0X003
E99HEADER|006|001
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
G35119849249024
h189SMA2
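To see the flip-flop operator on its own, here is a minimal sketch that only selects lines, without the error check or the counter. The between_E_and_G name and the shortened sample lines are made up for this example.

```perl
use strict;
use warnings;

# between_E_and_G is a hypothetical helper that returns the lines for which
# the range operator is true, i.e. from a line starting with E up to and
# including the next line starting with G.
sub between_E_and_G {
    return grep { /^E/ .. /^G/ } @_;
}

my @kept = between_E_and_G(qw(Q165HWN0X001 E99HEADER E99INSSCH G351198 h189SMA2));
print "$_\n" for @kept;   # prints E99HEADER, E99INSSCH and G351198, one per line
```

Note that each .. operator keeps its own state, even across calls to the subroutine that contains it, so a range left open by one list of lines would carry over into the next call.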
Prompt> perl -ne ' next if !/^E/; print $_; ' datafile
E99HEADER|006|001
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
F98020A@LUS76E8012758 /cygdrive/c/package
$ perl -ne ' if (/^h/){print "error starts with h program exiting"; exit;} if (/^G/){exit;} next if !/^E/; print $_;' data
E99HEADER|006|001
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
F98020A@LUS76E8012758 /cygdrive/c/package
$ cat data (note: changed the data to have E's after old)
Q165HWN0X001
Q165HWN0X002
Q165HWN0X003
E99HEADER|006|001
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
G35119849249024
h189SMA2
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
Re: Reading file and matching lines
by kcott (Archbishop) on Feb 12, 2014 at 08:34 UTC
#!/usr/bin/env perl
use strict;
use warnings;
local $/ = "\nh";
print "Block $.\n", /^(E.*?)^G/ms ? $1 : "Error\n" while <DATA>;
__DATA__
hblah
Qblah
Eblock_1_line_1
Eblock_1_line_2
Gblah
hblah
Qblah
Gblah
hblah
Qblah
Eblock_3_line_1
Eblock_3_line_2
Gblah
Output:
Block 1
Eblock_1_line_1
Eblock_1_line_2
Block 2
Error
Block 3
Eblock_3_line_1
Eblock_3_line_2
In Re: Search file for certain lines, I provided an explanation of the code as well as links to more detailed documentation.
I've introduced no new concepts here: if there's something you don't understand here, go back to the earlier post for more information.
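For readers without the earlier thread to hand, the sketch below shows only the effect of setting $/ to "\nh": each read returns everything up to and including the next newline followed by h. The read_records name and the in-memory filehandle are assumptions made so the example runs without a data file.

```perl
use strict;
use warnings;

# read_records is a hypothetical helper: it reads a string through an
# in-memory filehandle, using "\nh" as the input record separator.
sub read_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "\nh";          # a record now ends at a newline followed by h
    my @records = <$fh>;
    chomp @records;            # chomp removes a trailing "\nh", not just "\n"
    close $fh;
    return @records;
}

my @records = read_records("Eline_1\nGline_2\nhline_3\nEline_4\nGline_5\n");
print scalar(@records), " records\n";   # prints "2 records"
```

Because the separator includes the h itself, the second record starts with "line_3" rather than "hline_3".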
Hi Kcott,
I completely forgot about that thread; thank you for reminding me.
I do, however, have a quick question: if I want to check for duplicate G entries within the same scope (i.e. between E and h records), how would I do it?
I have some example code I tried, but it just prints all G records.
my $lines;
if(/^G/)
{
next if ($lines eq $_);
$lines = $_;
print $_;
}
Here is the example data I'm using:
E123456789
G123456798 ignore this as this is the first instance of G record in scope
h12345
E1234567
E7899874
G123456798 even though this is the same ignore as its first instance
G123456789 ignore this as it is different from previous G record
G123465798 should flag duplicate here because it is the same first G record in scope!!!
h1245
Firstly, you have no duplicates in any (of what you're calling) "scope".
G123465798 is not a duplicate of G123456798: you've transposed the 5 and the 6.
I've fixed this in the example below.
There's a standard idiom for checking for duplicates in this sort of scenario.
Use a hash (often called %seen) that has as its keys whatever identifier you're checking.
While processing, if the key exists, it's a duplicate, so skip/flag/etc. as appropriate;
if the key doesn't exist, it's unique, so use it and then add it to the hash (usually done with a postfix increment).
Here's an example using your fixed data:
#!/usr/bin/env perl -l
use strict;
use warnings;
my @data = (
[ qw{E123456789 G123456798 h12345} ],
[ qw{E1234567 E7899874 G123456798 G123456789 G123456798 h1245} ],
);
for my $scope (@data) {
my %seen;
for my $identifier (@$scope) {
print $identifier unless $seen{$identifier}++;
}
}
Output:
E123456789
G123456798
h12345
E1234567
E7899874
G123456798
G123456789
h1245
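The same %seen idiom can also be applied while walking the records one at a time, which is closer to how the file would be read. The sketch below flags repeated G records and empties the hash whenever an h record closes the scope; the duplicate_G_records name and the rule that h ends a scope are assumptions drawn from the question, not from the reply above.

```perl
use strict;
use warnings;

# duplicate_G_records is a hypothetical helper. It assumes, as in the question,
# that a line starting with h closes the current scope.
sub duplicate_G_records {
    my @lines = @_;
    my (%seen, @duplicates);
    for my $line (@lines) {
        if ($line =~ /^h/) {
            %seen = ();                       # new scope: forget earlier G records
        }
        elsif ($line =~ /^G/) {
            # postfix increment: false the first time, true on every repeat
            push @duplicates, $line if $seen{$line}++;
        }
    }
    return @duplicates;
}

my @dups = duplicate_G_records(qw(
    E123456789 G123456798 h12345
    E1234567 E7899874 G123456798 G123456789 G123456798 h1245
));
print "Duplicate: $_\n" for @dups;   # prints "Duplicate: G123456798"
```

The G record in the first scope does not count against the second scope, because the hash is cleared at h12345.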