
Parsing Multiple Lines.

by /dev/trash (Curate)
on May 24, 2004 at 00:57 UTC ( [id://355801] : perlquestion )

/dev/trash has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file that I am trying to parse. Each "record" starts with a filename.jpg: line, which is followed either by a blank line or by multiple lines of data. What I want to do is take the info associated with each filename and keep it in a variable to work with later.
This is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;

my $fh;
open($fh, "</home/me/bulk.txt") or die "Can't open: $!";
while (my $line = <$fh>) # was $line
{
    if ($line=~/(jpg:\Z)/) {
        print "\n";
        print $1;
        if ($line=~/(\w)/) {
            print "\n";
            print $line;
            print "\n";
        }
    }
}
The part I am stuck on is this: after finding the *.jpg filename, I want to check whether the next line is blank or has data. This is an example of the text file:
rib.jpg:

May.jpg:
Camera-Specific Properties:
Equipment Make: OLYMPUS OPTICAL CO.,LTD
Camera Model: C860L,D360L
Camera Software: OLYMPUS CAMEDIA Master
Maximum Lens Aperture: f/2.8
Image-Specific Properties:
Image Orientation: Top, Left-Hand
Horizontal Resolution: 72 dpi
Vertical Resolution: 72 dpi
Image Created: 2001:04:13 23:59:14
Exposure Time: 1/11 sec
F-Number: f/2.8
Exposure Program: Normal Program
ISO Speed Rating: 500
Exposure Bias: 1/2 EV
Metering Mode: Pattern
Light Source: Fluorescent
Flash: Flash
Focal Length: 5.50 mm
Color Space Information: sRGB
Image Width: 228
Image Height: 380
Compression Setting: SQ
Macro Mode: Normal
oher.jpg:

Replies are listed 'Best First'.
Re: Parsing Multiple Lines.
by Zaxo (Archbishop) on May 24, 2004 at 01:11 UTC

    Your routine already distinguishes between data lines and blank (well, nonword) ones. You just haven't used the information. Add an else clause to the end of the if ($line=~/(\w)/) { statement, like:

    } else { print "This line intentionally left blank.\n"; }
    For other ideas, you could chomp and then test length, or else test for not matching non-whitespace: $line !~ /\S/. Each suggestion accommodates a slightly different notion of which lines are considered blank.
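
    To make the difference between those notions concrete, here is a minimal sketch that runs the three tests side by side. The sub names and the sample lines are made up for illustration; they are not from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Three notions of "blank", each applied to a line as read (newline attached).
# The sub names are hypothetical, chosen only for this illustration.
sub no_word_char { my ($line) = @_; return $line !~ /\w/ ? 1 : 0 }   # no letter, digit or underscore
sub zero_length  { my ($line) = @_; chomp $line; return length($line) == 0 ? 1 : 0 }  # only a newline
sub no_nonspace  { my ($line) = @_; return $line !~ /\S/ ? 1 : 0 }   # nothing but whitespace

# Made-up sample lines: empty, spaces only, punctuation only, a header.
for my $line ("\n", "   \n", "---\n", "rib.jpg:\n") {
    (my $shown = $line) =~ s/\n/\\n/;    # make the newline visible in the report
    printf "%-14s no_word=%d zero_length=%d no_nonspace=%d\n",
        "'$shown'", no_word_char($line), zero_length($line), no_nonspace($line);
}
```

    A line of spaces counts as blank for the two regex tests but not for the length test, and a line of dashes counts as blank only for the \w test, so the right choice depends on what the file can actually contain.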

    You probably mean to print whole lines, rather than just what you captured ($1).
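
    For the stated goal of keeping each file's info in a variable, one possible approach is a hash keyed by filename that collects whole lines. This is only a sketch: parse_records is a hypothetical helper, the header regex assumes every record starts with a line ending in ".jpg:", and the in-memory sample stands in for the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Group the lines of a listing under the "name.jpg:" header that precedes them.
# Returns a hash ref: filename => array ref of that file's non-blank lines.
# parse_records is a hypothetical helper, not something from the original post.
sub parse_records {
    my ($fh) = @_;
    my (%info, $current);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(.+\.jpg):\s*$/) {        # header line, e.g. "May.jpg:"
            $current = $1;
            $info{$current} = [];                # stays empty if no data follows
        }
        elsif (defined $current && $line =~ /\S/) {
            push @{ $info{$current} }, $line;    # keep the whole line, not just a capture
        }
    }
    return \%info;
}

# Small in-memory sample shaped like the posted file (the exact layout is an assumption).
my $sample = "rib.jpg:\n\nMay.jpg:\nEquipment Make: OLYMPUS OPTICAL CO.,LTD\nImage Width: 228\noher.jpg:\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory sample: $!";
my $info = parse_records($fh);
close $fh;

for my $name (sort keys %$info) {
    my $count = @{ $info->{$name} };
    print "$name ", ($count ? "has $count line(s)" : "has no data"), "\n";
}
```

    Because the loop remembers the current filename instead of peeking at the next line, blank records simply end up with an empty list.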

    After Compline,

      After posting my question, I did one more search and came up with this reply to a question: Re: multi-line regex match quest. It works to a point, but I get this:
      Use of uninitialized value in pattern match (m//) at line 16, <$fh> line 731.

        Use of uninitialized value in pattern match (m//) at line 16, <$fh> line 731

        This warning tells you that the variable you ran the pattern match against was undefined at the point given
        (line 16 in your code, and line 731 of the file you're reading from).
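
        The revised code isn't shown, so this is a guess at the cause: a common way to hit this warning is to read an extra line inside the loop and match against it after the file has run out, since <$fh> returns undef at end of file. A minimal sketch of guarding that read with defined follows; classify_next and the sample data are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read the line that follows a header and classify it, without ever
# matching against an undefined value. classify_next is a hypothetical helper.
sub classify_next {
    my ($fh) = @_;
    my $next = <$fh>;                       # undef once the file is exhausted
    return 'eof'   unless defined $next;    # guard before any pattern match
    return 'blank' unless $next =~ /\S/;
    return 'data';
}

# In-memory sample shaped like the posted file (the layout is an assumption).
my $sample = "rib.jpg:\n\nMay.jpg:\nImage Width: 228\noher.jpg:\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory sample: $!";
my @seen;
while (my $line = <$fh>) {
    next unless $line =~ /jpg:\s*$/;        # only header lines are of interest
    chomp $line;
    push @seen, "$line " . classify_next($fh);
}
close $fh;
print "$_\n" for @seen;
```

        Note that classify_next consumes the line it inspects, so a header that directly follows another header would be swallowed; remembering the current filename across iterations avoids that.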

Re: Parsing Multiple Lines.
by NetWallah (Canon) on May 24, 2004 at 03:59 UTC
    You have declared the filehandle used in "open" (my $fh) - That makes $fh a Symbolic reference to the file handle, and I don't believe you are trying to do that - more likely, this is a result of misunderstanding the statement in the doc:
    If FILEHANDLE is an undefined lexical (my) variable the variable is assigned a reference to a new anonymous filehandle....

    Just use an UNDEFINED name like FH (No dollar), and you'll be OK.
    Update: OK - seems like I need to re-read the docs myself. See notes below.

    Offense, like beauty, is in the eye of the beholder, and a fantasy.
    By guaranteeing freedom of expression, the First Amendment also guarantees offense.

      Bzzzt. Not a symref.

      $ perl -e'my $fh;open($fh, "< foo") or die $!; print "$fh"'
      GLOB(0x804b3f8)$
      Nothing wrong with OP's lexical filehandle. It is good practice to localize a global handle such as you recommend within some scope. Then you don't need to worry about name uniqueness.

      After Compline,

      Er, no, open(my $fh, "...") is correct usage. $fh is autovivified into an actual filehandle (not just a symbolic reference). It is the preferred method for opening a filehandle without clobbering an existing one. The excerpt you are referring to does mean an undefined scalar variable ($fh), not a bareword (FH).

      perldoc perlopentut provides several examples of this in the Indirect Filehandles section.
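
      As a small illustration of the point, the sketch below opens a lexical filehandle under use strict and checks what ends up in the variable. It reads from an in-memory string rather than a real file, which assumes a perl with in-memory filehandle support (5.8 or later); the variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";   # in-memory stand-in for a real file

# $fh is undefined before open, so open stores a reference to a new
# anonymous glob in it.
open(my $fh, '<', \$text) or die "Can't open in-memory file: $!";
my $kind  = ref $fh;     # 'GLOB': a real reference, so strict refs has no complaint
my $first = <$fh>;       # reading through it works like any other handle
close $fh;

print "ref type: $kind\n";
print "first line: $first";
```

      The handle is closed automatically when $fh goes out of scope, which is part of why the lexical form is usually preferred over a bareword like FH.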