LukeyBoy has asked for the wisdom of the Perl Monks concerning the following question:

Hey all, this is my first post to Perlmonks (and unfortunately in the Seekers section :-). Anyway, I have a file that contains a ton of records delimited by the newline character. A sample is:

/users/QaoUP
Luke
Another test
Sat, 21 Apr 2001 15:58:28 -0500
Inbox
2566
0
1
146

So I want to build a list of hashes, so each hash would have a key corresponding to each line in the record (a file key for the first line, name for the second, etc). Obviously I can read a record at a time and manually put things right, but I know that Perl must have a better way. Any ideas? Thanks!

Re: Reading structured records from a file
by Masem (Monsignor) on Jan 18, 2002 at 07:57 UTC
    If the file is small enough, then you can read it all into one big array, and then use very easy array tools to do the rest of the work. This of course assumes that the same number of lines exists in each record.
    my @keys = qw(dir name desc date box byte1 byte2 byte3 byte4);
    my @loh;
    chomp( my @lines = <FILE> );   # list context: read the entire file in one go
    while ( @lines ) {
        my %hash;
        @hash{ @keys } = splice @lines, 0, scalar @keys;
        push @loh, \%hash;
        shift @lines;              # this is the blank separator line, now gone
    }
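    If the file is too large to slurp, the same idea can be applied one record at a time. The following is only a sketch under the same assumptions (nine lines per record, one blank line between records); the sub name read_records and the in-memory sample data are made up for illustration.

```perl
use strict;
use warnings;

# Key names follow the snippet above; they are illustrative, not fixed.
my @keys = qw(dir name desc date box byte1 byte2 byte3 byte4);

# Read records one at a time from a filehandle. Assumes every record has
# exactly scalar(@keys) lines, followed by one blank separator line.
sub read_records {
    my ($fh) = @_;
    my @loh;
    until ( eof($fh) ) {
        my %hash;
        for my $key (@keys) {
            my $line = <$fh>;
            last unless defined $line;
            chomp $line;
            $hash{$key} = $line;
        }
        push @loh, \%hash;
        my $separator = <$fh>;    # discard the blank line, if there is one
    }
    return \@loh;
}

# Usage: the sample record from the question, repeated twice in memory.
my $record = join( "\n",
    '/users/QaoUP', 'Luke', 'Another test',
    'Sat, 21 Apr 2001 15:58:28 -0500', 'Inbox', 2566, 0, 1, 146 ) . "\n";
my $data = $record . "\n" . $record;

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $recs = read_records($fh);
close $fh;

print scalar(@$recs), " records; first name: $recs->[0]{name}\n";
```

    Only one record is held in the loop at any time, although the resulting list of hashes still grows with the file.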

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    "I can see my house from here!"
    It's not what you know, but knowing how to find it if you don't know that's important

      That's perfect, much nicer than the alternative! Thanks Masem!
Re: Reading structured records from a file
by dmmiller2k (Chaplain) on Jan 18, 2002 at 08:35 UTC

    Masem has a very straightforward suggestion, but here is another.

    If your records are somehow grouped, say, with an extra newline after the last line of one record and before the first line of the next, you can set the input record separator, $/, to "\n\n" (or to "" for paragraph mode, as in the code here) to read each group at once. Then, for each one, split it on newline and assign each line to a separate hash key, to wit:

    $/ = "";                      # paragraph mode
    my @keys = qw(dir name desc date box num1 num2 num3 num4);
    my @recs;
    while (<>) {                  # read a group of lines
        my %rec;
        # split the group at newlines and assign to a hash slice
        @rec{@keys} = split /\n/;
        push @recs, \%rec;
    }
    If you don't have the extra newlines, you can insert them first with a simple one-liner that inserts a newline before any slash that begins a line.

    $ perl -pi -e 's:^/:\n/:' file

    Update: Corrected $ to @ in @rec{@keys} and added missing trailing : delimiter in one-liner. Also, changed $/ to "", as suggested in this node.

    dmm

    If you GIVE a man a fish you feed him for a day
    But,
    TEACH him to fish and you feed him for a lifetime
      I was thinking just the same as you. But there are some tiny bugs. First, it should be @rec{@keys}. No scalar sigil. Second, using a delimiter other than / in a regex often means that you forget the trailing delimiter if the last char in the pattern is a slash. That's the case here. You forgot the last colon.

      It could also be worth mentioning the special behaviour of $/ = "".

        Whoops!

        Thanks for pointing out those silly errors. You are right about @rec{@keys} -- I actually spotted and corrected that after submitting it for the first time, but I got distracted by something (I was at work) and evidently closed the browser before submitting it again. Ditto the trailing colon in the one-liner.

        Appreciate the catch, however.

        From perlvar:

        input_record_separator HANDLE EXPR
        $INPUT_RECORD_SEPARATOR
        $RS
        $/

        The input record separator, newline by default. Works like awk's RS variable, including treating empty lines as delimiters if set to the null string. (Note: An empty line cannot contain any spaces or tabs.) You may set it to a multi-character string to match a multi-character delimiter, or to undef to read to end of file. Note that setting it to "\n\n" means something slightly different than setting it to "", if the file contains consecutive empty lines. Setting it to "" will treat two or more consecutive empty lines as a single empty line. Setting it to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / is used to delimit line boundaries when quoting poetry.)
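        The difference described there can be seen with a small experiment. This is just a sketch: the data is made up (two short records separated by two empty lines) and the sub name read_with_separator is hypothetical.

```perl
use strict;
use warnings;

# Two made-up records with TWO empty lines (three newlines) between them.
my $data = "a\nb\n\n\nc\nd\n";

# Read all records from the in-memory data using the given value of $/.
sub read_with_separator {
    my ($separator) = @_;
    local $/ = $separator;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    my @records = <$fh>;
    close $fh;
    return \@records;
}

my $paragraph = read_with_separator("");        # paragraph mode
my $literal   = read_with_separator("\n\n");    # literal two-newline separator

# Split the second record of each run into fields, as the reply above does.
my @para_fields    = split /\n/, $paragraph->[1];
my @literal_fields = split /\n/, $literal->[1];

print "paragraph mode: ", scalar(@para_fields),    " fields\n";
print "literal mode:   ", scalar(@literal_fields), " fields\n";
```

        In paragraph mode the extra empty line is swallowed, so the second record splits into the two expected fields. With "\n\n" the leftover newline stays at the front of the second record, split yields an empty first field, and every value in the hash slice would shift by one key.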

        Update: emphasis added.

        dmm

Re: Reading structured records from a file
by JojoLinkyBob (Scribe) on Jan 18, 2002 at 09:08 UTC
    Just curious, are you using a Perl script to create the initial text file?

    If so, I'd seriously recommend using tie & untie.

    Then you could store/load the hash off directly to/from a file.
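    A minimal sketch of what that might look like, assuming the SDBM_File module that ships with Perl and a temporary file whose name is made up here. Note that a DBM file holds a flat hash of strings, so a list of hashes would need something extra on top, such as the MLDBM module from CPAN.

```perl
use strict;
use warnings;
use Fcntl;                     # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;                 # DBM implementation bundled with Perl
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical location for the DBM files; removed when the script exits.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, 'record' );

# Store: tie the hash to the file, assign to it, then untie.
tie my %out, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666
    or die "Cannot tie $path for writing: $!";
@out{qw(dir name box)} = ( '/users/QaoUP', 'Luke', 'Inbox' );
untie %out;

# Load: tie again and copy the contents into an ordinary hash.
tie my %in, 'SDBM_File', $path, O_RDONLY, 0666
    or die "Cannot tie $path for reading: $!";
my %copy = %in;
untie %in;

print "$_ => $copy{$_}\n" for sort keys %copy;
```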
    =~Desertcoder

      I wish the data was Perl generated, tie and untie would save a lot of headaches. The data format has been stabilised and won't change, so I'm SOL there :-)