
Reading file and matching lines

by Jalcock501 (Sexton)
on Feb 11, 2014 at 12:02 UTC (#1074402=perlquestion)
Jalcock501 has asked for the wisdom of the Perl Monks concerning the following question:

Good Afternoon my fellow monks!

I am in need of your assistance. I need to read through a file line by line, and this is the tricky bit (for me at least): while reading, I check the first character of each line, and when a line begins with an E, I then need to search through the following lines until I find one beginning with G.

But if we hit a line beginning with h (lower case), we've gone too far and the script should produce an error. I thought about using a for loop to go through the lines one at a time, but I've never done this in Perl, so here is my attempt:

#!/usr/bin/perl
use strict;
my @lines;
my $file = <quoteout.dat>;
open my $in, '<', $file;
open my $out, '>', "ERR";
@lines = split('', $_);
for(my $i; $i < 9; $i++) {
    if($line[$i] eq 'E') {
        #add one until finds a G or h
    }
}

UPDATE: I forgot to add the type of data:

Q165HWN0X001
Q165HWN0X002
Q165HWN0X003
E99HEADER|006|001
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
G35119849249024
h189SMA2

Could someone help? I'm not sure if this is right.



Re: Reading file and matching lines
by choroba (Bishop) on Feb 11, 2014 at 12:12 UTC
    In Perl, you can read a file line by line without the need to load the whole file first. Use the diamond operator in a while loop (untested code):
    open my $IN, '<', 'quoteout.dat' or die "$!";
    my $searching_for_G;
    while (<$IN>) {
        $searching_for_G = 1 if 0 == index $_, 'E';
        die "Error: h at line $." if $searching_for_G and 0 == index $_, 'h';
        if ($searching_for_G and 0 == index $_, 'G') {
            print "Found G at line $.\n";
            undef $searching_for_G;
        }
    }

    Note that unlike in C, a string is not an array of characters in Perl (that's why I used index). Also, you did not specify what to do if G is found - should the program end or search for another E? I assumed the latter.
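To illustrate the point about strings, here is a minimal sketch comparing three ways of testing a line's first character: index (as in the code above), substr, and an anchored regex. The helper names starts_with_index, starts_with_substr and starts_with_regex are hypothetical, made up for this example; split // is shown only for the case where you really do want a list of characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A Perl string is a scalar, not an array of characters, so $line[0]
# refers to an element of an array named @line, not to a character.
# Three common ways to test the first character instead:
sub starts_with_index  { my ($str, $c) = @_; return index($str, $c) == 0 }
sub starts_with_substr { my ($str, $c) = @_; return substr($str, 0, 1) eq $c }
sub starts_with_regex  { my ($str, $c) = @_; return $str =~ /^\Q$c\E/ }

# If you really want individual characters, split on the empty pattern.
my @chars = split //, 'E99HEADER|006|001';

for my $line ('E99HEADER|006|001', 'G35119849249024', 'h189SMA2') {
    printf "%-20s starts with E? index:%s substr:%s regex:%s\n",
        $line,
        (starts_with_index($line, 'E')  ? 'yes' : 'no'),
        (starts_with_substr($line, 'E') ? 'yes' : 'no'),
        (starts_with_regex($line, 'E')  ? 'yes' : 'no');
}
print "first char via split: $chars[0]\n";
```

For a simple "does this line start with X" check, the anchored regex is usually the most readable of the three.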

    $. contains the input line number. See perlvar for details.

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Wouldn't it be better if you had just used $searching_for_G and /^h/? (I haven't tested this). It may well just be me, but 0 == index ... sticks out oddly to my eyes.
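A sketch of that suggestion, untested against the real file: the same state machine as the index version, but with anchored regexes. The sub name scan_blocks and the use of an in-memory filehandle for the sample data are assumptions for illustration; with a real file you would pass the handle returned by open instead.

```perl
use strict;
use warnings;

# Same logic as the index() version, using anchored regexes instead.
# Returns the line numbers ($.) at which a G closed an E..G block;
# dies if an h line turns up while we are still searching for a G.
sub scan_blocks {
    my ($in) = @_;
    my $searching_for_G = 0;
    my @found;
    while (my $line = <$in>) {
        $searching_for_G = 1 if $line =~ /^E/;
        die "Error: h at line $.\n" if $searching_for_G and $line =~ /^h/;
        if ($searching_for_G and $line =~ /^G/) {
            push @found, $.;
            $searching_for_G = 0;
        }
    }
    return @found;
}

# Sample data from the question, read through an in-memory filehandle.
my $data = <<'END';
Q165HWN0X001
Q165HWN0X002
Q165HWN0X003
E99HEADER|006|001
E99INSSCH|052|
E99POLCOM|1||IIL|62|35119849249024|||||
E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
G35119849249024
h189SMA2
END

open my $in, '<', \$data or die "Cannot open in-memory file: $!";
print "Found G at line $_\n" for scan_blocks($in);
```

For the sample data this should report a G at line 8; the h at line 9 comes after the block has closed, so it is not treated as an error.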
      Apologies. Yes, there are several instances of this in a single file, so I need to do this throughout the file and only report errors if there are any. If there are none, the script should exit normally.
        "...only report any errors if there are any"

        If so, why not something simple like this:

        while (<IN>) {
            print qq($1\n) if /^(E)/;
            print qq($1\n) if /^(G)/;
            die $1 if /^(h)/;
        }

        Or do I still misunderstand the specs?

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

        use strict;
        use warnings;

        my $infile = shift;
        my $found_E = 0;
        my $sets = 0;
        open my $ifh, '<', $infile or die "Cannot open $infile: $!";
        while (<$ifh>) {
            if (/^E/) { $found_E = 1; next; }
            if ($found_E) {
                if (/^G/) { $sets += 1; $found_E = 0; next; }
                if (/^h/) { print "Error! Found h before G\n"; exit; }
            }
        }
        close($ifh);
        printf "Found %d sets from E to G uninterrupted by h\n", $sets;
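The program above stops at the first error. Since several E..G blocks can occur in one file and errors should only be reported if there are any, here is a minimal sketch of a variant that keeps scanning, collects every problem, and exits with a non-zero status only when something went wrong. The sub name check_blocks is hypothetical, and so is the recovery rule: after an h interrupts a block, this version abandons that block and waits for the next E.

```perl
use strict;
use warnings;

# Assumption (not in the original spec): after an h interrupts a block,
# we abandon that block and wait for the next E.
sub check_blocks {
    my ($ifh) = @_;
    my ($found_E, $sets, @errors) = (0, 0);
    while (my $line = <$ifh>) {
        if ($line =~ /^E/) { $found_E = 1; next; }
        next unless $found_E;
        if ($line =~ /^G/) { $sets++; $found_E = 0; }
        elsif ($line =~ /^h/) {
            push @errors, "Found h before G at line $.";
            $found_E = 0;
        }
    }
    return ($sets, @errors);
}

my $infile = shift;
if (defined $infile) {
    open my $ifh, '<', $infile or die "Cannot open $infile: $!";
    my ($sets, @errors) = check_blocks($ifh);
    close $ifh;
    printf "Found %d sets from E to G uninterrupted by h\n", $sets;
    if (@errors) {
        print STDERR "Error! $_\n" for @errors;
        exit 1;    # non-zero status signals failure to the caller
    }
    # falling off the end exits with status 0 ("exit normally")
}
```

Returning a status code rather than just printing lets a calling shell script test the result (for example with `if perl check.pl quoteout.dat; then ...`).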
Re: Reading file and matching lines
by Eily (Prior) on Feb 11, 2014 at 16:07 UTC

    In the name of Tim Toady (There Is More Than One Way To Do It), here is a version featuring the range, or flip-flop, operator, which translates into plain English as "from .. till ..", and the next keyword.

    my $count = 0;
    LINE: while (<DATA>) {
        next LINE unless /^E/../^G/;   # next line unless we are between a line starting with an E and a line starting with G
        die "Oups, went too far!" if /^h/;   # error if the line starts with an h and hasn't been skipped by the previous statement
        $count++ unless /^G/;   # count lines that do not start with a G
    }
    __DATA__
    Q165HWN0X001
    Q165HWN0X002
    Q165HWN0X003
    E99HEADER|006|001
    E99INSSCH|052|
    E99POLCOM|1||IIL|62|35119849249024|||||
    E99INSFAC2|C00124||||||XAJX0727,YGAX0000,ZAAJ0203,VABA0018,WJZA1800|
    G35119849249024
    h189SMA2
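If the flip-flop operator is new to you, this small sketch (the sample lines are made up for illustration) records the value that /^E/ .. /^G/ returns for each line. In scalar context it returns the empty string while outside a range and a sequence number while inside one; the number for the line that closes the range has "E0" appended. Because it is a single operator instance inside the loop, it remembers its state from one iteration to the next.

```perl
use strict;
use warnings;

# Scalar-context range (flip-flop): false until /^E/ matches, then true
# up to and including the next line that matches /^G/.
my @lines = qw(Q1 E1 E2 G1 h1 E3 G2);
my @states;
for (@lines) {
    my $state = /^E/ .. /^G/;
    push @states, "$_=" . ($state eq '' ? 'false' : $state);
}
print join(' ', @states), "\n";
```

This should print `Q1=false E1=1 E2=2 G1=3E0 h1=false E3=1 G2=2E0`: the h line falls outside any range, which is why the code above only dies on an h that appears between an E and a G.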

      One liner

      Prompt> perl -ne ' next if !/^E/; print $_; ' datafile

        modified one-liner and data to test...

        F98020A@LUS76E8012758 /cygdrive/c/package
        $ perl -ne ' if (/^h/){print "error starts with h program exiting"; exit;} if (/^G/){exit;} next if !/^E/; print $_;' data
        F98020A@LUS76E8012758 /cygdrive/c/package
        $ cat data  (note: changed the data to have E's after old)
Re: Reading file and matching lines
by kcott (Chancellor) on Feb 12, 2014 at 08:34 UTC

    G'day Jalcock501,

    You asked a very similar question, with a very similar title, using very similar data, in "Search file for certain lines".

    Here's a cutdown version (with appropriate modifications) of the technique I provided in that thread (Re: Search file for certain lines):

    #!/usr/bin/env perl

    use strict;
    use warnings;

    local $/ = "\nh";

    print "Block $.\n", /^(E.*?)^G/ms ? $1 : "Error\n" while <DATA>;

    __DATA__
    hblah
    Qblah
    Eblock_1_line_1
    Eblock_1_line_2
    Gblah
    hblah
    Qblah
    Gblah
    hblah
    Qblah
    Eblock_3_line_1
    Eblock_3_line_2
    Gblah


    Block 1
    Eblock_1_line_1
    Eblock_1_line_2
    Block 2
    Error
    Block 3
    Eblock_3_line_1
    Eblock_3_line_2

    In Re: Search file for certain lines, I provided an explanation of the code as well as links to more detailed documentation. I've introduced no new concepts here: if there's something you don't understand here, go back to the earlier post for more information.
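As a reminder of how the $/ trick works, here is a minimal sketch (the sub name records and the sample text are made up for illustration) that reads a string through an in-memory filehandle with $/ set to "\nh" and shows each record with its newlines made visible.

```perl
use strict;
use warnings;

# With $/ set to "\nh", each readline returns everything up to and
# including the next "\nh", so every record after the first begins
# just after an h (the h itself ended the previous record).
sub records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    local $/ = "\nh";
    my @recs = <$fh>;
    return @recs;
}

my @r = records("hA\nEx\nGy\nhB\nGz\n");
for my $i (0 .. $#r) {
    (my $shown = $r[$i]) =~ s/\n/\\n/g;   # make newlines visible
    print "record ", $i + 1, ": $shown\n";
}
```

The h ends up at the end of the previous record, which is why, in the code above, every block after the first starts with "blah" rather than "hblah"; the /^(E.*?)^G/ms match does not care either way.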

    -- Ken

      Hi Kcott

      I completely forgot about that thread; thank you for reminding me.

      I do, however, have a quick question: if I want to check for duplicate G entries within the same scope (i.e. between E and h records), how would I do it?

      I have some example code I tried but it just prints all G records.

      my $lines;
      if (/^G/) {
          next if ($lines eq $_);
          $lines = $_;
          print $_;
      }
      Here is the example data I'm using:

      E123456789
      G123456798   ignore this as this is the first instance of G record in scope
      h12345
      E1234567
      E7899874
      G123456798   even though this is the same, ignore as it's the first instance
      G123456789   ignore this as it is different from previous G record
      G123465798   should flag duplicate here because it is the same first G record in scope!!!
      h1245

        Firstly, you have no duplicates in any (of what you're calling) "scope". G123465798 is not a duplicate of G123456798: you've transposed the 5 and the 6. I've fixed this in the example below.

        There's a standard idiom for checking for duplicates in this sort of scenario. Use a hash (often called %seen) that has as its keys whatever identifier you're checking. While processing, if the key exists, it's a duplicate, so skip/flag/etc. as appropriate; if the key doesn't exist, it's unique, so use it and then add it to the hash (usually done with a postfix increment).

        Here's an example using your fixed data:

        #!/usr/bin/env perl -l

        use strict;
        use warnings;

        my @data = (
            [ qw{E123456789 G123456798 h12345} ],
            [ qw{E1234567 E7899874 G123456798 G123456789 G123456798 h1245} ],
        );

        for my $scope (@data) {
            my %seen;
            for my $identifier (@$scope) {
                print $identifier unless $seen{$identifier}++;
            }
        }


        E123456789
        G123456798
        h12345
        E1234567
        E7899874
        G123456798
        G123456789
        h1245
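The example above works on data already split into one array per scope. When reading the file line by line instead, the same %seen idiom can be applied by clearing the hash whenever an h record closes a scope. A minimal sketch, with assumptions: only G records are checked, the identifier is the first whitespace-separated field (so trailing notes like those in the example data are ignored), and the sub name find_duplicate_G is hypothetical.

```perl
use strict;
use warnings;

# Streaming %seen idiom: remember G identifiers seen in the current
# scope and clear the hash when an h record closes the scope.
sub find_duplicate_G {
    my ($fh) = @_;
    my %seen;
    my @dups;
    while (my $line = <$fh>) {
        my ($id) = split ' ', $line;              # first field only
        next unless defined $id;                  # skip blank lines
        if ($id =~ /^h/) { %seen = (); next; }    # end of scope
        next unless $id =~ /^G/;
        push @dups, "duplicate $id at line $." if $seen{$id}++;
    }
    return @dups;
}

my $data = <<'END';
E123456789
G123456798
h12345
E1234567
E7899874
G123456798
G123456789
G123456798
h1245
END

open my $in, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for find_duplicate_G($in);
```

With the corrected data this should report only the second G123456798 in the second scope, at line 8; the one at line 6 is not flagged because %seen was cleared by the h12345 record.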

        -- Ken

Node Type: perlquestion [id://1074402]
Approved by Corion