PerlMonks  

How do I split a file into separate sections?

by Anonymous Monk
on Apr 03, 2000 at 16:34 UTC (id://6718)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a text file composed of sections, each with the same headers, e.g.:
TI: some lines
AU: some lines
JN: some lines
The section is then repeated for a number of "hits". What I want to do is grab each section and place it into a hash with the section headers as keys. Any ideas? P.S. I'd rather not use the split function.

Originally posted as a Categorized Question.

Re: How do I split a file into separate sections?
by chromatic (Archbishop) on Apr 03, 2000 at 18:17 UTC
    while ($data =~ s!\b(TI|AU|JN):(.*?)(?=\b(?:TI|AU|JN):|\z)!!s) {
        push @{$sections{$1}}, $2;
    }
    That will build a hash of arrays with the headers as keys. Things you might want to change are as follows:
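    • With the /s modifier on the regex, . also matches newlines, so a field's text can run across several lines. You may not want this, though doing without /s makes the regex much more complicated.
    • If you don't want a hash of arrays, drop the push and the @{...} and append with the .= operator instead ($sections{$1} .= $2).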
    • You may want to match a newline after the colon.
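Doing without /s usually means working line by line instead. A minimal sketch, assuming each header begins its own line and that any line without a header continues the field above it; the @input array is hypothetical, and in a real script the lines would come from a filehandle (while (my $line = <$fh>)).

```perl
use strict;
use warnings;

# Hypothetical input: each header starts a line, and a line without
# a header continues the field above it.
my @input = (
    "TI: A long title that\n",
    "  wraps onto a second line\n",
    "AU: Someone\n",
    "JN: Some journal\n",
);

my %sections;    # header => [ text of each occurrence ]
my $current;     # header of the field currently being read

for my $line (@input) {
    chomp $line;
    if ($line =~ /^(TI|AU|JN):\s*(.*)/) {
        # A header line starts a new entry for that header.
        $current = $1;
        push @{ $sections{$current} }, $2;
    }
    elsif (defined $current) {
        # A continuation line is appended to the latest entry.
        $line =~ s/^\s+//;
        $sections{$current}[-1] .= " $line";
    }
}

print "$_: $sections{$_}[0]\n" for sort keys %sections;
```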
    This all depends on your data. Personally, I'd use split in a heartbeat.
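For comparison, a split-based version might look like the sketch below. It assumes the headers are exactly TI, AU and JN and never appear inside the text itself; the sample $data is made up.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real file layout may differ.
my $data = "TI: First title AU: Someone JN: Some journal "
         . "TI: Second title AU: Someone else JN: Another journal";

my %sections;

# With a capturing group, split also returns the captured header names,
# interleaved with the text between them:
#   ('', 'TI', ' First title ', 'AU', ' Someone ', ...)
# The leading field (text before the first header) is thrown away.
my (undef, @pairs) = split /\b(TI|AU|JN):/, $data;

# Take the list two elements at a time: header, then its text.
while (my ($header, $text) = splice @pairs, 0, 2) {
    $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    push @{ $sections{$header} }, $text;
}

print "$_: @{ $sections{$_} }\n" for sort keys %sections;
```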
Re: How do I split a file into separate sections?
by btrott (Parson) on Apr 03, 2000 at 21:02 UTC
    I'm not sure I understand the structure of your file well enough to answer the question correctly. From what I can tell, though, you have a file that looks something like this:
    TI: ...
    AU: ...
    JN: ...
    TI: ...
    AU: ...
    JN: ...
    and so on. Is the pattern repeating like that? And so for each section, you have, say, a TI, an AU, and a JN, and those 3 (or however many) headers and content constitute one "section"?
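If what you're after is closer to the original question, one hash per section with the headers as keys, a sketch like the following avoids split by walking the string with a global match. It assumes every section begins with TI and that the header names never appear inside the text; @records and the sample data are illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample data in the repeating layout described above.
my $data = "TI: Foo AU: Bar JN: Baz TI: Qux AU: Quux JN: Corge";

my @records;    # one hash per section: { TI => ..., AU => ..., JN => ... }

# Match each header and its text up to the next header (or the end).
while ($data =~ /\b(TI|AU|JN):\s*(.*?)\s*(?=\b(?:TI|AU|JN):|\z)/gs) {
    my ($header, $text) = ($1, $2);
    # Assumption: a TI header always starts a new section.
    push @records, {} if $header eq 'TI' || !@records;
    $records[-1]{$header} = $text;
}

# Hash slice @{$_}{...} pulls the three fields out of each record.
my @lines = map { sprintf 'TI=%s | AU=%s | JN=%s', @{$_}{qw(TI AU JN)} } @records;
print "$_\n" for @lines;
```

Each element of @records is one complete section, so the fields that belong together stay together even if a section is missing a header.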

    The solution that chromatic provided will work for a structure like this, but the resulting data structure may not look like what you expect. You could dump it out to see what it looks like, but I thought I'd explain it briefly.

    You're going to end up with a hash called %sections, where the possible headers in the file (TI, AU, and JN) are the keys, and each value is a reference to an array holding, in file order, the text that followed that header. So, for example, say that the 5th "section" in your file looked like this:

    TI: Foo
    AU: Bar
    JN: Baz
    Now you want to get the data for that section. You can access that information like this:
    # the index is 4 because the array index starts at 0, btw
    my $ti = $sections{'TI'}[4];
    my $au = $sections{'AU'}[4];
    my $jn = $sections{'JN'}[4];
    Just in case it needed explaining.
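To visit every section rather than just the 5th, you can loop over the indices of one of the arrays. A minimal sketch with a hypothetical, hand-built %sections; note that it only lines up correctly if every section has all three headers, since a missing AU would shift that array out of step with the others.

```perl
use strict;
use warnings;

# Hypothetical %sections shaped like the one chromatic's loop builds:
# index $i in each array belongs to section $i.
my %sections = (
    TI => [ 'Foo', 'Qux'   ],
    AU => [ 'Bar', 'Quux'  ],
    JN => [ 'Baz', 'Corge' ],
);

my @lines;
# $#{ ... } is the last index of the TI array.
for my $i (0 .. $#{ $sections{TI} }) {
    my ($ti, $au, $jn) = map { $sections{$_}[$i] } qw(TI AU JN);
    push @lines, sprintf "Section %d: %s / %s / %s", $i + 1, $ti, $au, $jn;
}
print "$_\n" for @lines;
```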
Re: How do I split a file into separate sections?
by buckaduck (Chaplain) on Apr 23, 2001 at 12:59 UTC
    This particular example looks like a WAIS database; you may want to look into the Wais module if that's the case.
