|Problems? Is your data what you think it is?|
breaking a text file into a data structure -- best way?by punkish (Priest)
|on Apr 09, 2010 at 14:29 UTC||Need Help??|
punkish has asked for the
wisdom of the Perl Monks concerning the following question:
Update0: My Best buddy tells me such class of problems are called "State Machine." Googling for "Perl state machine" returns a bunch of hits that I am now in the process of digesting. In the meantime, I look to your help.
I have a longish text file like below. The gutter annotation is not a part of the text file, but only to aid my question.
I want to split the file into an array of hashes like so
In other words, each hash is made up of the snippet of text starting from the line that is followed by '--------------' up to, but not including, the next line that is followed by '--------------'.
I have two questions -- one, how do I do the above? I have been hitting my head against a wall the entire day yesterday, so I come to you today. I have nothing to show you because I everything I did was wrong. My approach was mostly to start from the beginning and go to the end, trying to keep flags on when one hash element began and when it ended, and so on. Which brings me to my second question.
What is the canonical design pattern for such a problem? I come across such problems all the time, and I always slow down in trying to solve them. A pattern that is visible to the eye becomes very difficult to program. Yesterday I had another such problem which I managed to solve, if I may say so myself, rather innovatively. The text file looked like so
The above had to be converted to
That is, group the brightness values by color triplets. After struggling with it for a while with the usual, line by line, flag as you go approach, I decided to turn the color triplets into keys of a hash. The problem was solved in a couple of lines, and elegantly. Here is the code for that
I was able to solve above because of the uniqueness requirement, else it would have been the usual slog. So, is there a generic approach to this? And, is there a way I can validate the output... ensure that the output is what I really want, given very long input text files?
when small people start casting long shadows, it is time to go to bed