
Re: breaking a text file into a data structure -- best way?

by rubasov (Friar)
on Apr 09, 2010 at 16:59 UTC ( [id://833862]=note )


in reply to breaking a text file into a data structure -- best way?

Let me try to help:

First, analyze the structure of your input and name its parts. This input consists of lines, and each line consists of three fields: a prefix, a separator and a text field. Consecutive lines with the same prefix form a block, and consecutive blocks form a prefix alphabet.
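To make the three fields concrete, here is a minimal sketch of splitting one line into prefix and text. It assumes the "> " separator seen in the sample data of this thread; `parse_line` is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Split one input line into prefix and text, discarding the separator.
# parse_line is a hypothetical helper; the "> " separator is assumed
# from the sample data in this thread.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # A limit of 2 keeps any later '>' characters inside the text field.
    my ( $prefix, $text ) = split /> ?/, $line, 2;
    return ( $prefix, defined $text ? $text : '' );
}

my ( $prefix, $text ) = parse_line("b> a few random\n");
print "prefix=[$prefix] text=[$text]\n";    # prefix=[b] text=[a few random]
```

A line that holds only a prefix, such as "b>", comes back with an empty text field.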

Second, answer this question: can the input be parsed line by line, or do you have to look around (at a certain point) to decide what is what? The former typically results in more efficient programs (but it is not possible for every type of input); the latter is generally easier to code, but requires holding more of your input in memory. (I chose the line-by-line approach, storing only the previous prefix in addition to the current line.)

Then constrain yourself to go through your input line by line and ask yourself: what are the states (or state transitions) that determine what I should do?

  1. at the start of a new (consecutive) prefix-block
  2. in the middle of a prefix-block
  3. prefix alphabet starting over

How to map these states to relations between lines? By comparing the prefix of the current and the previous line.

What is the tool to express these relations between the lines? Alphabetical (string) comparison. The mapping is (compare with the previous list):

  1. $prefix gt $prev_prefix
  2. $prefix eq $prev_prefix
  3. $prefix lt $prev_prefix
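The mapping above can be sketched as a small function that names the transition for a pair of prefixes. The function name `transition` and the three state labels are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Name the state transition implied by two neighbouring prefixes.
# transition() and the labels it returns are hypothetical.
sub transition {
    my ( $prev_prefix, $prefix ) = @_;
    return 'new_block'  if $prefix gt $prev_prefix;    # case 1
    return 'same_block' if $prefix eq $prev_prefix;    # case 2
    return 'new_alphabet';                             # case 3: $prefix lt $prev_prefix
}

my $prev = '';
for my $prefix (qw( a b b c a )) {
    printf "%s -> %s: %s\n", ( $prev eq '' ? "''" : $prev ), $prefix,
        transition( $prev, $prefix );
    $prev = $prefix;
}
```

Note that `gt`, `eq` and `lt` compare as strings, so this only works when the prefixes sort alphabetically in the order they appear.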

What should I do at each state transition?

  1. add $prefix => $text to the current hash
  2. append $text to the current $hash{$prefix}
  3. push a new hash ref to your array: { $prefix => $text }

Now try to write it again and if you're stuck, come back and look at this:

use strict;
use warnings;
use Data::Dump qw( pp );

my $ref         = [ {} ];    # array of hashes, one hash per prefix alphabet
my $prev_prefix = '';

while (<DATA>) {
    # A limit of 2 keeps any '>' inside the text from being split off.
    my ( $prefix, $text ) = split /> ?/, $_, 2;

    if ( $prefix gt $prev_prefix ) {       # start of a new prefix-block
        $ref->[-1]{$prefix} = $text;
    }
    elsif ( $prefix eq $prev_prefix ) {    # middle of a prefix-block
        $ref->[-1]{$prefix} .= $text;
    }
    else {                                 # prefix alphabet starting over
        push @$ref, { $prefix => $text };
    }
    $prev_prefix = $prefix;
}

pp $ref;

__DATA__
a> some random text
b>
b> a few random
b> lines
b>
b> of more
b> random
b>
b> text
c> some more
c>
c> random
c> text
c>
a> some random text
b>
b> a few random
b> lines
b>
b> of more
b> random
b>
b> text
c> some more
c>
c> random
c> text
c>

Of course this is only one approach, but clarifying the concepts and thinking methodically about the mechanical way to solve a problem has always helped me.

And in general: practice and practice more. Read books, read the code of others (don't just glance over it, but change it and understand it), read the problems of others and try to solve them without looking at the solutions others have posted.

Cheers

Replies are listed 'Best First'.
Re^2: breaking a text file into a data structure -- best way?
by punkish (Priest) on Apr 10, 2010 at 00:35 UTC
    Thanks for the response, but you misunderstood my task. The 'a>', 'b>', 'c>' are not really present in the text file. I included them as "line numbers" to illustrate where I wanted the text split up. In the specific case I presented, the text is split up at the line *before* the line that starts with '======'.

    In any case, I am curious about a general approach to such problems, and at first glance, it seems that a state machine approach would help me. However, I got stuck with that as well, especially since my splitting markers are not *in* the line where I want to split the text, but *after* the line on which I want to split.

    --

    when small people start casting long shadows, it is time to go to bed
      but you misunderstood my task
      Indeed. In the last few days I have been doing really stupid things, sorry.
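For the task as clarified above (split at the line before a line starting with '======'), one possible way to stay line-by-line is to hold each line back by one step and decide where it goes only once the next line is known. This is a rough sketch under that assumption; `split_blocks` and the sample lines are hypothetical, since the real input file is not shown in this thread.

```perl
use strict;
use warnings;

# Split lines into blocks; a new block begins at the line *before* any
# line starting with '======'. split_blocks is a hypothetical helper.
sub split_blocks {
    my @lines  = @_;
    my @blocks = ( [] );
    my $held;    # the one line read but not yet assigned to a block

    for my $line (@lines) {
        if ( defined $held ) {
            # A marker on the current line means the held line opens a
            # new block (unless the current block is still empty).
            push @blocks, [] if $line =~ /^======/ && @{ $blocks[-1] };
            push @{ $blocks[-1] }, $held;
        }
        $held = $line;
    }
    push @{ $blocks[-1] }, $held if defined $held;    # flush the last line
    return @blocks;
}

# Hypothetical sample input: two headings underlined with '======'.
my @blocks = split_blocks(
    'Title one', '======', 'body a',
    'Title two', '======', 'body b',
);
print scalar(@blocks), " blocks\n";
print join( ' | ', @$_ ), "\n" for @blocks;
```

The only extra state is the single held line, so memory use stays the same as in the prefix-comparison approach.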
