
How do I split a file into separate sections?

by Anonymous Monk
on Apr 03, 2000 at 16:34 UTC (#6718)


I have a text file that is composed of sections, each with the same headers, e.g.:
TI: some lines
AU: some lines
JN: some lines
The section is then repeated for a number of "hits". What I want to do is grab each section and place it into a hash with the section headers as keys. Any ideas? P.S. I'd rather not use the split function.

Answer: How do I split a file into separate sections?
contributed by chromatic

while ($data =~ s!(TI|AU|JN):(.*?)(?=(?:TI|AU|JN):|\z)!!s) {
    # non-greedy match stops at the next header (or the end of the data)
    push @{$sections{$1}}, $2;
}
That will build a hash of arrays with the headers as keys. Things you might want to change are as follows:
  • With the /s modifier on the regex, . will match newlines. You may not want this (but it makes the regex much more complicated)
  • You may not want a hash of arrays. To build one string per header instead, replace the push with $sections{$1} .= $2
  • You may want to match a newline after the colon.
This all depends on your data. Personally, I'd use split in a heartbeat.
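The hash-of-strings variant from the second bullet can be sketched roughly as follows. This is only a sketch under assumptions: the sample $data is invented for illustration, the headers are taken to be exactly TI, AU and JN, and a non-destructive /g match is used in place of the substitution so that $data is left intact.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real file layout is an assumption.
my $data = "TI: First title\nAU: Some Author\nJN: Some Journal\n"
         . "TI: Second title\nAU: Other Author\nJN: Other Journal\n";

my %sections;

# Each field runs from its header up to the next header (or end of input).
# The lookahead keeps the next header available for the following match.
while ($data =~ /(TI|AU|JN):\s*(.*?)\s*(?=(?:TI|AU|JN):|\z)/gs) {
    # .= appends to one string per header instead of pushing onto an array
    $sections{$1} .= "$2\n";
}

print $sections{TI};
```

With this shape, every title ends up in one newline-separated string under $sections{TI}, which is convenient for printing but loses the per-hit index that the hash of arrays keeps.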
Answer: How do I split a file into separate sections?
contributed by btrott

I'm not sure I understand the structure of your file well enough to answer the question correctly. From what I understand, though, it seems to me that you have a file that looks something like this:

TI: ...
AU: ...
JN: ...
TI: ...
AU: ...
JN: ...
and so on. Is the pattern repeating like that? And so for each section, you have, say, a TI, an AU, and a JN, and those 3 (or however many) headers and content constitute one "section"?

The solution that chromatic provided will work for a structure like this, but the resulting data structure may not look like you expect. You could dump it out to see quickly enough what it looks like, but I just thought I'd explain quickly.

You're going to end up with a hash called %sections, where the possible headers in the file (TI, AU, and JN) are the keys, and the values are arrays of all of the lines pertaining to those sections. So, for example, say that the 5th "section" in your file looked like this:

TI: Foo
AU: Bar
JN: Baz
Now you want to get the data for that section. You can access that information like this:
# the index is 4 because the array index starts
# at 0, btw
my $ti = $sections{'TI'}[4];
my $au = $sections{'AU'}[4];
my $jn = $sections{'JN'}[4];
Just in case it needed explaining.
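To see the structure for yourself, something like the sketch below may help. It dumps the hash of arrays with Data::Dumper and also collects one hash per "hit", which seems closer to what the question asks for. The sample data, the @hits array, and the rule that a TI header starts each new hit are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample input with two "hits"; real data may differ.
my $data = "TI: Foo\nAU: Bar\nJN: Baz\nTI: Quux\nAU: Corge\nJN: Grault\n";

my (%sections, @hits);
while ($data =~ /(TI|AU|JN):\s*(.*?)\s*(?=(?:TI|AU|JN):|\z)/gs) {
    my ($header, $text) = ($1, $2);

    # Hash of arrays: one array of values per header.
    push @{ $sections{$header} }, $text;

    # Assumption: a TI header marks the start of a new hit.
    push @hits, {} if $header eq 'TI';

    # One hash per hit, keyed by header.
    $hits[-1]{$header} = $text if @hits;
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%sections);
print "Second hit author: $hits[1]{AU}\n";
```

The per-hit form keeps each record's fields together, so a hit with a missing header does not shift the indexes of the others the way it would in the hash of arrays.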
Answer: How do I split a file into separate sections?
contributed by buckaduck

This particular example looks like a WAIS database; you may want to look into the Wais module if that's the case.
