Let me try to help:

First, analyze the structure of your input and name its parts. This input consists of lines; each line consists of three fields: a prefix, a separator, and a text field. Consecutive lines with the same prefix form a block, and consecutive blocks form a prefix alphabet.

Second, answer this question: can the input be parsed line by line, or do you have to look around (at a certain point) to decide what is what? The former typically results in more efficient programs (but it is not possible for every type of input); the latter is generally easier to code but requires holding more of your input in memory. (I chose the line-by-line approach, storing only the previous prefix in addition to the current line.)
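
For contrast, here is a minimal sketch of the "look around" style, assuming the whole input fits in memory. The helper name block_starts and the sample lines are made up for illustration; they are not part of the solution further down.

```perl
use strict;
use warnings;

# Hypothetical helper: given all lines at once, return the indexes of the
# lines that open a new prefix-block by looking back at the previous line.
sub block_starts {
    my @lines    = @_;
    my @prefixes = map { my ($p) = split /> ?/, $_, 2; $p } @lines;
    return grep { $_ == 0 || $prefixes[$_] ne $prefixes[ $_ - 1 ] }
        0 .. $#prefixes;
}

# Made-up sample input, held entirely in memory.
my @lines = ( "a> one\n", "b> two\n", "b> three\n", "a> four\n" );
print join( ',', block_starts(@lines) ), "\n";
```

Random access to every line makes the test easy to write, at the cost of keeping the whole input around.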

Then constrain yourself to going through your input line by line and ask yourself: what are the states (or state transitions) that determine what I should do?

  1. at the start of a new (consecutive) prefix-block
  2. in the middle of a prefix-block
  3. prefix alphabet starting over

How do these states map to relations between lines? By comparing the prefix of the current line with the prefix of the previous line.

What is the tool to express these relations between the lines? Alphabetical (string) comparison. The mapping is (compare with the previous list):

  1. $prefix gt $prev_prefix
  2. $prefix eq $prev_prefix
  3. $prefix lt $prev_prefix
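
To see the three comparisons in isolation, here is a small sketch. The helper classify_transition and the labels it returns are my own names for illustration, and the prefix sequence is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: name the transition between the previous line's
# prefix and the current one, using plain string comparison.
sub classify_transition {
    my ( $prev_prefix, $prefix ) = @_;
    return 'new block'  if $prefix gt $prev_prefix;
    return 'same block' if $prefix eq $prev_prefix;
    return 'alphabet restarts';    # only $prefix lt $prev_prefix is left
}

# The empty string sorts before every non-empty prefix, so the very
# first line is always classified as the start of a new block.
my $prev_prefix = '';
for my $prefix (qw( a b b c a )) {
    printf "%-2s %s\n", $prefix, classify_transition( $prev_prefix, $prefix );
    $prev_prefix = $prefix;
}
```

For the sequence a, b, b, c, a this reports: new block, new block, same block, new block, alphabet restarts.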

What should I do at each state transition?

  1. add $prefix => $text to the current hash
  2. append $text to the current $hash{$prefix}
  3. push a new hash ref to your array: { $prefix => $text }

Now try to write it again and if you're stuck, come back and look at this:

use strict;
use warnings;
use Data::Dump qw( pp );

# One hash per prefix alphabet; start with an empty one.
my $ref = [ {} ];
my $prev_prefix = '';

while (<DATA>) {
    # Limit of 2: keep any later '>' in the text, and keep an empty text field.
    my ( $prefix, $text ) = split /> ?/, $_, 2;
    if ( $prefix gt $prev_prefix ) {
        # start of a new prefix-block
        $ref->[-1]{$prefix} = $text;
    }
    elsif ( $prefix eq $prev_prefix ) {
        # in the middle of a prefix-block
        $ref->[-1]{$prefix} .= $text;
    }
    else {
        # prefix alphabet starting over
        push @$ref, { $prefix => $text };
    }
    $prev_prefix = $prefix;
}

pp $ref;

__DATA__
a> some random text
b>
b> a few random
b> lines
b>
b> of more
b> random
b>
b> text
c> some more
c>
c> random
c> text
c>
a> some random text
b>
b> a few random
b> lines
b>
b> of more
b> random
b>
b> text
c> some more
c>
c> random
c> text
c>
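
Once you have the array of hashes, walking it is straightforward. This is only a sketch: the $ref below is a shortened, hand-written stand-in for the real structure, and summarize is a hypothetical helper.

```perl
use strict;
use warnings;

# Hand-written stand-in for the parsed structure, shortened for brevity.
my $ref = [
    { a => "some random text\n", b => "\na few random\nlines\n" },
    { a => "some random text\n", c => "some more\n" },
];

# Hypothetical helper: one summary line per prefix in each alphabet.
sub summarize {
    my ($blocks) = @_;
    my @lines;
    for my $i ( 0 .. $#$blocks ) {
        for my $prefix ( sort keys %{ $blocks->[$i] } ) {
            # Count the newlines to get the number of input lines merged.
            my $n = () = $blocks->[$i]{$prefix} =~ /\n/g;
            push @lines, "alphabet $i, prefix $prefix: $n line(s)";
        }
    }
    return @lines;
}

print "$_\n" for summarize($ref);
```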

Of course this is only one approach, but clarifying the concepts and thinking methodically about the mechanical way to solve a problem has always helped me.

And in general: practice, and practice more. Read books, read the code of others (don't just glance over it; change it and understand it), read the problems of others and try to solve them without looking at the solutions others have posted.

Cheers

In reply to Re: breaking a text file into a data structure -- best way? by rubasov
in thread breaking a text file into a data structure -- best way? by punkish
