http://www.perlmonks.org?node_id=944393

rajkrishna89 has asked for the wisdom of the Perl Monks concerning the following question:

I have multiple documents from which I have to extract the line that comes after an ID. The scenario is:

Customer ID: none
VT : 002/89

Customer ID: Yes
VT: 001/89

Customer ID: none
VT: 006/85

Customer ID: Yes
VT: 003/56

I have to extract the ID that comes after "Yes". The output should be:

VT: 001/89
VT: 003/56

i have written a script which is throwing up errors:

my $in_file = 'check.doc';
open my $in_fh, '<', $in_file or die "Could not open file $in_file: $!
+";
my $out_file = 'output.txt';
open my $out_fh, '>', $out_file or die "Could not open file $out_file:
+ $!";
while ( my $line = <$in_fh> ) {
    chomp ($_);
    $line if ($_ =~ /^Customer ID: Yes/);
    print "$out_fh}";
}
close $in_fh or die "Could not close file $in_file: $!";
close $out_fh or die "Could not close file $out_file: $!";

Please rectify my mistakes.

Optional: I would also like to know the following:

1. Currently I am opening a single file. How can I do this for multiple files with the same extension?

2. How do I write the output to an Excel file?

Replies are listed 'Best First'.
Re: Perl script to print Next line after Pattern Matching
by ww (Archbishop) on Dec 20, 2011 at 15:28 UTC
    Lines 3 and 6 strongly suggest you've copied code from a source which indicates a line continuation with a plus sign. So the first bit of advice would be -- understand what you're borrowing.

    Then, the logic of your script fails because your spec says to capture the contents of two lines, separated by a newline.

    The direct way of dealing with that, if your representation of the data is correct, is probably to read the data in paragraph mode (perldoc -q paragraph, 2nd item, citing perlfaq5) and search using a regex that allows an embedded newline (any decent regex tut).
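
    A minimal sketch of the paragraph-mode approach, assuming the records really are separated by blank lines as shown in the question. The helper name lines_after_yes and the in-memory sample are illustrative only and not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the line that follows each "Customer ID: Yes"
# line, reading the handle one blank-line-separated record at a time.
sub lines_after_yes {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my @found;
    while ( my $record = <$fh> ) {
        # /m lets ^ match at the start of any line inside the record;
        # (.*) stops at the next newline because . does not match "\n".
        push @found, $1 if $record =~ /^Customer ID: Yes\n(.*)/m;
    }
    return @found;
}

# Sample data copied from the question, read through an in-memory filehandle.
my $sample = "Customer ID: none\nVT : 002/89\n\nCustomer ID: Yes\nVT: 001/89\n\n"
           . "Customer ID: none\nVT: 006/85\n\nCustomer ID: Yes\nVT: 003/56\n";
open my $fh, '<', \$sample or die "Could not open in-memory file: $!";
print "$_\n" for lines_after_yes($fh);
close $fh;
```

    If the real files have no blank lines between records, paragraph mode would return the whole file as one record, and a line-by-line approach would be the safer choice.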

    Or, without setting the read mode to paragraph, you could read a line, cache it (as say $line) and immediately read another line into $line2; then test line 1 for the "Yes" and if there's a match, print both to your out.file, clear both vars and repeat while your source file has more data in it. Effectively, this is merely a simple-minded (and more verbose) variant on the first suggestion.
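
    The two-lines-at-a-time idea could look roughly like this. It is a sketch only: the helper name matching_pairs is made up for illustration, and skipping blank separator lines is an assumption about the input:

```perl
use strict;
use warnings;

# Hypothetical helper: reads the handle two lines at a time and returns
# both lines of every pair whose first line starts with "Customer ID: Yes".
sub matching_pairs {
    my ($fh) = @_;
    my @pairs;
    while ( defined( my $line = <$fh> ) ) {
        next if $line !~ /\S/;         # skip blank separator lines (assumed)
        my $line2 = <$fh>;             # the line that belongs to $line
        last unless defined $line2;    # input ended in the middle of a pair
        push @pairs, $line . $line2 if $line =~ /^Customer ID: Yes/;
    }
    return @pairs;
}

# Sample data copied from the question, read through an in-memory filehandle.
my $sample = "Customer ID: none\nVT : 002/89\n\nCustomer ID: Yes\nVT: 001/89\n\n"
           . "Customer ID: none\nVT: 006/85\n\nCustomer ID: Yes\nVT: 003/56\n";
open my $fh, '<', \$sample or die "Could not open in-memory file: $!";
print for matching_pairs($fh);
close $fh;
```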

    Alternately -- but in the same vein -- you could cache each line where "ID: Yes" is true; read the next line; and print that to your out.file. Be warned, the logic of doing that is not simple: Among other things, you'll need to get the program flow back to the main data input after saving a match.

    Also, there's no reason for allowing potential confusion by using both $line and $_ as you do: if you write my $line = <$in_fh>, then use the named variable thereafter.

    And I'm baffled as to how, where, or why you've come up with line 10; this may merely be another ww blind-spot, but I suspect it's a case of where you tried to make something up and hoped the computer would understand. As a general rule, that won't work.

    Oddly (to me, anyway), your existing code passes a compile test under 5.12, so long as the pragmas strict and warnings are NOT in the code. However, when you use those to have perl help you correct your mis-steps, it's quite a different story. Use strict and warnings until you're so sure of yourself you can be positive you need to NOT use them.

    And re your "Optional" question -- all answers here are optional. Whether or not the Monks exercise the option of answering depends, in part, on how carefully you've written up your dilemma. When you're inclined to write (as you did) "i have written a script which is throwing up errors", we expect you to post the errors (and warnings), verbatim, inside <code>...</code> tags, and perhaps even a sample of any other output you are getting.

      Is it yet possible to define paragraph boundaries containing possible spaces|tabs, not just newline(s) -- or as a regex, more generally -- in any current perl 5 version?
Re: Perl script to print Next line after Pattern Matching
by mr.nick (Chaplain) on Dec 20, 2011 at 13:45 UTC
    Quick'n'dirty:
    use strict;
    use warnings;
    for my $file (<*.doc>) {
        open my $fh, "<", $file or die $!;
        while (<$fh>) {
            print scalar <$fh> if /^Customer ID: Yes/;
        }
    }

    For #2, you can write out a tab delimited file named "output.xls" and Excel will automatically convert it upon load.
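
    A rough sketch of the tab-delimited idea. The helper name tsv_row and the two-column layout (label, value) are assumptions made for illustration; the question does not say which columns are wanted:

```perl
use strict;
use warnings;

# Hypothetical helper: turns a line such as "VT: 001/89" into one
# tab-separated row, "VT<TAB>001/89", terminated by a newline.
sub tsv_row {
    my ($line) = @_;
    chomp $line;
    # Split on the first colon only, dropping the whitespace around it.
    my ( $label, $value ) = split /\s*:\s*/, $line, 2;
    return join( "\t", $label, $value // '' ) . "\n";
}

# Prints to STDOUT here; to produce the file mentioned above, open a handle
# with:  open my $out, '>', 'output.xls' or die $!;  and print to $out.
print tsv_row($_) for "VT: 001/89\n", "VT: 003/56\n";
```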

    mr.nick ...

      Also quick'n'dirty:

      use strict;
      use warnings;
      @ARGV = <*.doc>;
      while ( <> ) {
          print scalar <> if /^Customer ID: Yes/;
      }
        I recognize that. So the point is to loop over the file handle; extract the next part when the current part is interesting.
Re: Perl script to print Next line after Pattern Matching
by NetWallah (Canon) on Dec 20, 2011 at 14:41 UTC
    Since you had requested it, here is a corrected version of your program:
    use strict;
    use warnings;
    my $in_file = 'check.doc';
    open my $in_fh, '<', $in_file or die "Could not open file $in_file: $!";
    my $out_file = 'output.txt';
    open my $out_fh, '>', $out_file or die "Could not open file $out_file: $!";
    my $print_next = 0;
    while ( my $line = <$in_fh> ) {
        if ($print_next){
            print $out_fh $line; #No need to chomp - we print the "\n"
        }
        $print_next = ($line =~ /^Customer ID: Yes/);
    }
    close $in_fh or die "Could not close file $in_file: $!";
    close $out_fh or die "Could not close file $out_file: $!";
    You can add the other features using code that is posted above.

    BTW, congrats on using the "correct" form of "open" and "close".

                "XML is like violence: if it doesn't solve your problem, use more."

Re: Perl script to print Next line after Pattern Matching
by si_lence (Deacon) on Dec 20, 2011 at 14:20 UTC
    Not as compact as the answer of mr.nick. This might be an advantage or not, you decide ..
    use strict;
    use warnings;
    my $print_flag = 'N';
    while (<DATA>) {
        if ($print_flag eq 'Y') {
            print;
            $print_flag = 'N';
        }
        $print_flag = 'Y' if m/^Customer ID: Yes/;
    }
    __DATA__
    Customer ID: none
    VT : 002/89

    Customer ID: Yes
    VT: 001/89

    Customer ID: none
    VT: 006/85

    Customer ID: Yes
    VT: 003/56
Re: Perl script to print Next line after Pattern Matching
by mrguy123 (Hermit) on Dec 20, 2011 at 15:14 UTC
    The option to save the output file with xls suffix will work, but this is a good opportunity to get familiar with Spreadsheet::WriteExcel which is an excellent Perl module.
    Good luck
    MrGuy
Re: Perl script to print Next line after Pattern Matching
by educated_foo (Vicar) on Dec 20, 2011 at 17:03 UTC
    $/="\n\n";
    will probably help you solve your problem.
    use strict; use warnings;
    will probably help you placate the Monks.
Re: Perl script to print Next line after Pattern Matching
by vt220 (Scribe) on Dec 20, 2011 at 16:33 UTC

    You seem to flip-flop whether or not you are going to use "$_" or "$line" inside the while loop. The data is read into variable "$line" but you chomp "$_". The "if" statement is doing nothing with "$line" and neither is named on the print statement.

    In your case, I would follow the previous advice of producing a tab delimited file for import into Excel.

Re: Perl script to print Next line after Pattern Matching
by Anonymous Monk on Dec 20, 2011 at 18:32 UTC
    This should do it.
    #!/usr/bin/perl
    use strict;
    local $/="\n\n";
    while (<DATA>) {
        print "$1\n" if /Customer\sID:\sYes\n(.*)/;
    }
    __DATA__
    Customer ID: none
    VT : 002/89

    Customer ID: Yes
    VT: 001/89

    Customer ID: none
    VT: 006/85

    Customer ID: Yes
    VT: 003/56