PerlMonks  

How do I split a file into separate sections?

by Anonymous Monk
on Apr 03, 2000 at 16:34 UTC (#6718: categorized question)


Description:

I have a text file that is composed of sections, each with the same headers, e.g.:
TI: some lines
AU: some lines
JN: some lines
The section is then repeated for a number of "hits". What I want to do is grab each section and place it into a hash with the section headers as keys. Any ideas? P.S. I'd rather not use the split function.

Answer: How do I split a file into separate sections?
contributed by chromatic

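    # Strip one "HEADER:text" chunk per pass; the lookahead stops the lazy
    # match at the next header, or at the end of the data.
    while ($data =~ s!(TI|AU|JN):(.*?)(?=(?:TI|AU|JN):|\z)!!s) {
        push @{ $sections{$1} }, $2;
    }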
That will build a hash of arrays with the headers as keys. Things you might want to change are as follows:
  • With the /s modifier on the regex, . will match newlines. You may not want this, although doing without it makes the regex much more complicated.
  • You may not want a hash of arrays. In that case, drop the push and the @{ } dereference and append to $sections{$1} with the .= operator instead.
  • You may want to match a newline after the colon.
This all depends on your data. Personally, I'd use split in a heartbeat.
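
For comparison, here is roughly what a split-based version could look like. It is a sketch rather than anyone's posted solution, and it assumes each header starts a line; the sample data and the $preamble and @pairs names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real file layout may differ.
my $data = "TI: First title\nAU: First author\nJN: First journal\n"
         . "TI: Second title\nAU: Second author\nJN: Second journal\n";

my %sections;

# The capturing group makes split return the headers along with the text
# between them. The first field is whatever precedes the first header
# (an empty string here). The -1 limit keeps trailing empty fields, so an
# empty last section still yields a complete header/text pair.
my ($preamble, @pairs) = split /^(TI|AU|JN):[ \t]*/m, $data, -1;

while (my ($header, $text) = splice @pairs, 0, 2) {
    $text =~ s/\s+\z//;    # trim trailing whitespace, including the newline
    push @{ $sections{$header} }, $text;
}

print "$_: @{ $sections{$_} }\n" for sort keys %sections;
```

Anchoring the pattern with ^ and /m keeps a stray "AU:" in the middle of a line from being treated as a header, which the unanchored substitution does not guard against.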
Answer: How do I split a file into separate sections?
contributed by btrott

I'm not sure I understand the structure of your file well enough to answer the question correctly. From what I understand, though, it seems to me that you have a file that looks something like this:

TI: ...
AU: ...
JN: ...
TI: ...
AU: ...
JN: ...
and so on. Is the pattern repeating like that? And so for each section, you have, say, a TI, an AU, and a JN, and those 3 (or however many) headers and content constitute one "section"?

The solution that chromatic provided will work for a structure like this, but the resulting data structure may not look like you expect. You could dump it out to see quickly enough what it looks like, but I just thought I'd explain quickly.
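
A quick way to do that dump is Data::Dumper, which ships with Perl. This is only a sketch: the %sections contents here are made-up stand-ins for whatever the loop collects from a real file.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical contents, standing in for what the loop would collect
# from a file with two hits.
my %sections = (
    TI => [ 'First title',   'Second title'   ],
    AU => [ 'First author',  'Second author'  ],
    JN => [ 'First journal', 'Second journal' ],
);

$Data::Dumper::Sortkeys = 1;    # stable key order in the output
$Data::Dumper::Indent   = 1;    # compact, fixed indentation

my $dump = Dumper(\%sections);
print $dump;
```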

You're going to end up with a hash called %sections, where the possible headers in the file (TI, AU, and JN) are the keys, and each value is an array holding the text found under that header, one element per occurrence. So, for example, say that the 5th "section" in your file looked like this:

TI: Foo
AU: Bar
JN: Baz
Now you want to get the data for that section. You can access that information like this:
    # the index is 4 because the array index starts
    # at 0, btw
    my $ti = $sections{'TI'}[4];
    my $au = $sections{'AU'}[4];
    my $jn = $sections{'JN'}[4];
Just in case it needed explaining.
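
If one hash per hit is closer to what the question asks for, the parallel arrays can be regrouped. A minimal sketch, assuming every hit has exactly one TI, one AU and one JN (otherwise the indexes drift out of step); @hits and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical data in the hash-of-arrays shape described above.
my %sections = (
    TI => [ 'First title',   'Foo' ],
    AU => [ 'First author',  'Bar' ],
    JN => [ 'First journal', 'Baz' ],
);

# Build one hash per hit, keyed by header. Index $i is assumed to refer
# to the same hit in each of the three arrays.
my @hits;
for my $i (0 .. $#{ $sections{TI} }) {
    my %hit = map { ( $_ => $sections{$_}[$i] ) } keys %sections;
    push @hits, \%hit;
}

print "$hits[1]{TI} by $hits[1]{AU} in $hits[1]{JN}\n";
```

Each element of @hits is then a hash with the section headers as keys, so $hits[4]{TI} would be the title of the 5th hit in a larger file.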
Answer: How do I split a file into separate sections?
contributed by buckaduck

This particular example looks like a WAIS database; you may want to look into the Wais module if that's the case.
