http://www.perlmonks.org?node_id=1074402

Jalcock501 has asked for the wisdom of the Perl Monks concerning the following question:

Good Afternoon my fellow monks!

I am in need of your assistance. I need to look through a file and read it line by line. And this is the tricky bit (for me at least): when I read a line (call it line A) that begins with an E, I need to search through the following lines until I find a line beginning with G.

But if we hit a line beginning with h (lower case), we've gone too far and the script should produce an error. I thought maybe I could use a for loop to step through the lines one at a time; however, I've never done this in Perl, so here was my crack at it:

#!/usr/bin/perl
use strict;

my @lines;
my $file = <quoteout.dat>;
open my $in, '<', $file;
open my $out, '>', "ERR";
@lines = split('', $_);
for(my $i; $i < 9; $i++)
{
    if($line[$i] eq 'E')
    {
        #add one until finds a G or h
    }
}

UPDATE: I forgot to add the type of data...

Q165HWN0X001
Q165HWN0X002
Q165HWN0X003
E99HEADER|006|001
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
G35119849249024
h189SMA2

Could someone help as I'm not sure if this is right.

Thanks

Jim

Replies are listed 'Best First'.
Re: Reading file and matching lines
by choroba (Cardinal) on Feb 11, 2014 at 12:12 UTC
    In Perl, you can read a file line by line without the need to load the whole file first. Use the diamond operator in a while loop (untested code):
    open my $IN, '<', 'quoteout.dat' or die "$!";
    my $searching_for_G;
    while (<$IN>) {
        $searching_for_G = 1 if 0 == index $_, 'E';
        die "Error: h at line $." if $searching_for_G and 0 == index $_, 'h';
        if ($searching_for_G and 0 == index $_, 'G') {
            print "Found G at line $.\n";
            undef $searching_for_G;
        }
    }

    Note that unlike in C, a string is not an array of characters in Perl (that's why I used index). Also, you did not specify what to do if G is found - should the program end or search for another E? I assumed the latter.

    $. contains the input line number. See perlvar for details.
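To illustrate the point about strings not being arrays of characters, here is a minimal sketch of three common ways to test the first character of a line. The sub names are hypothetical and chosen only for this illustration; the sample lines come from the question's data.

```perl
use strict;
use warnings;

# Three hypothetical helpers, each testing whether $line starts with the
# single character $char.
sub starts_with_index  { my ($line, $char) = @_; return index($line, $char) == 0 }
sub starts_with_substr { my ($line, $char) = @_; return substr($line, 0, 1) eq $char }
sub starts_with_regex  { my ($line, $char) = @_; return $line =~ /^\Q$char\E/ }

for my $line ('E99HEADER|006|001', 'G35119849249024', 'h189SMA2') {
    for my $char (qw(E G h)) {
        # The three tests should always agree; report only the matches.
        print "'$line' starts with $char\n"
            if starts_with_index($line, $char)
            && starts_with_substr($line, $char)
            && starts_with_regex($line, $char);
    }
}
```

Which form to use is largely a matter of taste: the regex is the most familiar to Perl readers, while index and substr avoid the regex engine.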

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Wouldn't it be better if you had just used $searching_for_G and /^h/? (I haven't tested this). It may well just be me, but 0 == index ... sticks out oddly to my eyes.
      Apologies. Yes, there are several instances of this in a single file, so I need to do this throughout the file and only report errors if there are any. If there are none, the script should exit normally.
        "...only report any errors if there are any"

        If so, why not something simple like this:

        while (<IN>) {
            print qq($1\n) if /^(E)/;
            print qq($1\n) if /^(G)/;
            die $1 if /^(h)/;
        }

        Or do I still misunderstand the specs?

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

        use strict;
        use warnings;

        my $infile  = shift;
        my $found_E = 0;
        my $sets    = 0;

        open my $ifh, '<', $infile or die "Cannot open $infile: $!";
        while (<$ifh>) {
            if (/^E/) {
                $found_E = 1;
                next;
            }
            if ($found_E) {
                if (/^G/) {
                    $sets += 1;
                    $found_E = 0;
                    next;
                }
                if (/^h/) {
                    print "Error! Found h before G\n";
                    exit;
                }
            }
        }
        close($ifh);
        printf "Found %d sets from E to G uninterrupted by h\n", $sets;
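Since the requirement is to check the whole file and report errors only if there are any, one possible variation is to collect the offending line numbers instead of stopping at the first one. This is only a sketch: the sub name find_h_before_g and the sample records are made up for illustration, and an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper: scans lines from a filehandle and returns the line
# numbers of every h record reached while still waiting for a G.
sub find_h_before_g {
    my ($fh) = @_;
    my ($waiting_for_G, @errors) = (0);
    while (my $line = <$fh>) {
        if    ($line =~ /^E/) { $waiting_for_G = 1 }
        elsif ($line =~ /^G/) { $waiting_for_G = 0 }
        elsif ($line =~ /^h/ && $waiting_for_G) {
            push @errors, $.;      # $. is the current input line number
            $waiting_for_G = 0;    # start afresh at the next E
        }
    }
    return @errors;
}

# Made-up sample: the second block (E2) has no G before its h.
my $sample = join "\n", qw(Q1 E1 G1 h1 Q2 E2 h2 E3 G3 h3), '';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @errors = find_h_before_g($fh);
close $fh;

print "h before G at line $_\n" for @errors;
print "No errors found\n" unless @errors;
```

With the sample above this reports line 7 only. For a real file, the same sub can be given a handle opened on quoteout.dat.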
Re: Reading file and matching lines
by kcott (Archbishop) on Feb 12, 2014 at 08:34 UTC

    G'day Jalcock501,

    You asked a very similar question, with a very similar title, using very similar data, in "Search file for certain lines".

    Here's a cutdown version (with appropriate modifications) of the technique I provided in that thread (Re: Search file for certain lines):

    #!/usr/bin/env perl

    use strict;
    use warnings;

    local $/ = "\nh";

    print "Block $.\n", /^(E.*?)^G/ms ? $1 : "Error\n" while <DATA>;

    __DATA__
    hblah
    Qblah
    Eblock_1_line_1
    Eblock_1_line_2
    Gblah
    hblah
    Qblah
    Gblah
    hblah
    Qblah
    Eblock_3_line_1
    Eblock_3_line_2
    Gblah

    Output:

    Block 1
    Eblock_1_line_1
    Eblock_1_line_2
    Block 2
    Error
    Block 3
    Eblock_3_line_1
    Eblock_3_line_2

    In Re: Search file for certain lines, I provided an explanation of the code as well as links to more detailed documentation. I've introduced no new concepts here: if there's something you don't understand here, go back to the earlier post for more information.

    -- Ken
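For readers without the earlier thread to hand, here is a small sketch of what setting the input record separator $/ to "\nh" does: each read returns everything up to and including the next newline-plus-h, so one record covers one block. The sub name read_blocks and the shortened sample data are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reads $text in records that each end at "\nh".
sub read_blocks {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "\nh";            # record separator, restored when the sub returns
    my @blocks = <$fh>;          # one element per record
    close $fh;
    return @blocks;
}

# Made-up data: the second block has a G record but no E record.
my @blocks = read_blocks("hA\nE1\nG1\nhB\nG2\nhC\nE3\nG3\n");
for my $i (0 .. $#blocks) {
    my $status = $blocks[$i] =~ /^E.*?^G/ms ? 'E..G found' : 'no E before G';
    print "Block ", $i + 1, ": $status\n";
}
```

Note that the leading h of each block is consumed as part of the previous record's separator, which is why the pattern only needs to look for E and G.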

      Hi Kcott

      I completely forgot about that thread, thank you for reminding me.

      I do however have a quick question... if I want to check for duplicate G entries within the same scope (i.e. between E and h records), how would I do it?

      I have some example code I tried but it just prints all G records.

      my $lines;
      if(/^G/) {
          next if ($lines eq $_);
          $lines = $_;
          print $_;
      }
      Here is the example data I'm using:

      E123456789
      G123456798 ignore this as this is the first instance of G record in scope
      h12345
      E1234567
      E7899874
      G123456798 even though this is the same ignore as its first instance
      G123456789 ignore this as it is different from previous G record
      G123465798 should flag duplicate here because it is the same first G record in scope!!!
      h1245

        Firstly, you have no duplicates in any (of what you're calling) "scope". G123465798 is not a duplicate of G123456798: you've transposed the 5 and the 6. I've fixed this in the example below.

        There's a standard idiom for checking for duplicates in this sort of scenario. Use a hash (often called %seen) that has as its keys whatever identifier you're checking. While processing, if the key exists, it's a duplicate, so skip/flag/etc. as appropriate; if the key doesn't exist, it's unique, so use it and then add it to the hash (usually done with a postfix increment).

        Here's an example using your fixed data:

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        my @data = (
            [ qw{E123456789 G123456798 h12345} ],
            [ qw{E1234567 E7899874 G123456798 G123456789 G123456798 h1245} ],
        );

        for my $scope (@data) {
            my %seen;
            for my $identifier (@$scope) {
                print $identifier unless $seen{$identifier}++;
            }
        }

        Output:

        E123456789
        G123456798
        h12345
        E1234567
        E7899874
        G123456798
        G123456789
        h1245

        -- Ken
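The same %seen idiom can also be applied while walking a flat list of records, flagging the duplicates rather than printing the unique entries. The following is a sketch under two assumptions that are not in the thread: the sub name duplicate_g_records is hypothetical, and a scope is taken to end at each h record, at which point the hash is emptied.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "line_number: record" for each G record whose
# identifier was already seen since the most recent h record.
sub duplicate_g_records {
    my (@lines) = @_;
    my (%seen, @duplicates);
    my $line_number = 0;
    for my $line (@lines) {
        $line_number++;
        if ($line =~ /^h/) {
            %seen = ();                      # an h record closes the scope
        }
        elsif ($line =~ /^(G\S+)/) {
            # Postfix increment: false the first time, true on any repeat.
            push @duplicates, "$line_number: $1" if $seen{$1}++;
        }
    }
    return @duplicates;
}

# The corrected sample data from above, flattened into one list of records.
my @records = qw(
    E123456789 G123456798 h12345
    E1234567 E7899874 G123456798 G123456789 G123456798 h1245
);
print "Duplicate G record at line $_\n" for duplicate_g_records(@records);
```

With this data only the eighth record is flagged, because the first G123456798 belongs to an earlier scope.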

Re: Reading file and matching lines
by Eily (Monsignor) on Feb 11, 2014 at 16:07 UTC

    In the name of Tim Toady (There Is More Than One Way To Do It), here is a version featuring the range, or flip-flop, operator, which translates into human language as "from .. till ..", and the next keyword.

    my $count = 0;
    LINE: while (<DATA>) {
        next LINE unless /^E/../^G/;        # next line unless we are between a line starting with E and a line starting with G
        die "Oups, went too far!" if /^h/;  # error if the line starts with an h and hasn't been skipped by the previous statement
        $count++ unless /^G/;               # count lines that do not start with a G
    }
    __DATA__
    Q165HWN0X001
    Q165HWN0X002
    Q165HWN0X003
    E99HEADER|006|001
    E99INSSCH|052|
    E99POLCOM|1||IIL|62|35119849249024|||||
    E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
    G35119849249024
    h189SMA2
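To see which lines the flip-flop actually lets through, here is a minimal sketch that collects them. The sub name lines_in_range and the short sample records are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines selected by /^E/ .. /^G/.
sub lines_in_range {
    my (@lines) = @_;
    my @selected;
    for (@lines) {
        # In scalar context, .. is true from a line matching /^E/ up to and
        # including the next line matching /^G/, and false everywhere else.
        push @selected, $_ if /^E/ .. /^G/;
    }
    return @selected;
}

my @selected = lines_in_range(qw(Q1 E1 E2 G1 h1 Q2 E3 G2 h2));
print "$_\n" for @selected;
```

This prints E1, E2, G1, E3 and G2, one per line. The operator keeps its state between evaluations, so input that ends before the closing G leaves it switched on for whatever is processed next.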

      One liner

      Prompt> perl -ne ' next if !/^E/; print $_; ' datafile
      
      E99HEADER|006|001
      E99INSSCH|052|
      E99POLCOM|1||IIL|62|35119849249024|||||
      E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
      

        modified one-liner and data to test...

        F98020A@LUS76E8012758 /cygdrive/c/package
        $ perl -ne ' if (/^h/){print "error starts with h program exiting"; exit;} if (/^G/){exit;} next if !/^E/; print $_;' data
        E99HEADER|006|001
        E99INSSCH|052|
        E99POLCOM|1||IIL|62|35119849249024|||||
        E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
        
        F98020A@LUS76E8012758 /cygdrive/c/package
        $ cat data  (note: changed the data to have E's after old)
        Q165HWN0X001
        Q165HWN0X002
        Q165HWN0X003
        E99HEADER|006|001
        E99INSSCH|052|
        E99POLCOM|1||IIL|62|35119849249024|||||
        E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
        G35119849249024
        h189SMA2
        E99INSSCH|052|
        E99POLCOM|1||IIL|62|35119849249024|||||
        E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|