http://www.perlmonks.org?node_id=629679


in reply to Remove empty column(s) from unpack template

I haven't looked at your code in depth, but I just wanted to plant an idea in your head. One of the things that used to bug me about Text::Autoformat is that it doesn't handle tabular data too well (I don't know if this is still the case). I've created code similar to BrowserUk's in the past but was unhappy about how often it didn't work quite right. With your additions (assuming they work), it seems like you could parameterize the column-finding code so that it works in a variety of situations. Perhaps even enough that it could be patched into Text::Autoformat :-)

Re^2: Remove empty column(s) from unpack template
by TheDamian (Vicar) on Aug 01, 2007 at 20:21 UTC
    I do have plans to add table recognition to Text::Autoformat. Specifically, to port the table recognition code already used in Perl6::Perldoc::Parser. Those following this thread might find that code interesting (search for /Build entire table/).

    Damian

      That's excellent, Damian! May the universe align such that all of your plans come to fruition.

Re^2: Remove empty column(s) from unpack template
by BrowserUk (Patriarch) on Jul 30, 2007 at 23:04 UTC

      It's not the output so much as it is just recognizing that the data is in a table. Text::Autoformat first has to parse the paragraphs that it's dealing with before it can decide what to do about them. A heuristic could be developed that says "this chunk of data is a table". Once you've got that, if you're going to reformat it, you've got to know where the columns are. The OP's code may be able to serve both purposes. Maybe. :-)
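      To make the idea concrete, here is a minimal sketch of that kind of heuristic. It is not the OP's code and not anything from Text::Autoformat; the names find_columns, looks_like_table and split_row are made up for illustration. It assumes tabs have already been expanded and that a column boundary is any character position that is blank in every line of the chunk.

```perl
use strict;
use warnings;

# Hypothetical helper: return [start, width] pairs, one for each run of
# character positions that holds a non-space character in at least one line.
sub find_columns {
    my @lines = @_;

    my $width = 0;
    for my $line (@lines) {
        $width = length($line) if length($line) > $width;
    }

    # Mark every position where some line has a non-space character.
    my @used = (0) x $width;
    for my $line (@lines) {
        while ($line =~ /\S/g) {
            $used[ pos($line) - 1 ] = 1;
        }
    }

    # Collapse consecutive used positions into [start, width] runs.
    my (@cols, $start);
    for my $i (0 .. $width) {
        my $on = $i < $width && $used[$i];
        if ($on && !defined $start) {
            $start = $i;
        }
        elsif (!$on && defined $start) {
            push @cols, [ $start, $i - $start ];
            undef $start;
        }
    }
    return @cols;
}

# Hypothetical heuristic: call the chunk a table if it has at least two
# lines and at least $min_cols columns separated by all-blank gutters.
sub looks_like_table {
    my ($min_cols, @lines) = @_;
    return @lines >= 2 && find_columns(@lines) >= $min_cols;
}

# Split one line into fields using an unpack template built from the columns.
sub split_row {
    my ($line, @cols) = @_;
    return () unless @cols;

    # Pad the line so the '@' positions never point past the end of it.
    my $end = $cols[-1][0] + $cols[-1][1];
    $line = sprintf '%-*s', $end, $line;

    my $template = join ' ', map { "\@$_->[0] A$_->[1]" } @cols;
    return unpack $template, $line;
}
```

      The same column list serves both purposes: its size feeds the "is this a table?" decision, and its offsets feed the unpack template. Note that 'A' only strips trailing spaces, so right-justified fields come back with their leading padding intact.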

        Ah. I see what you are getting at. Yes, I think it probably could be used for that purpose.

        Mind you, having just taken an extended squint inside Text::Autoformat, I think it would take a very brave person to try and add table inferencing code with all the other things going on in that module.

        I'm not averse to making full use of the regex engine, and the regexes in there are nicely laid out and commented. But trying to combine the heuristic in the OP's code with the various heuristics already used in that module, and come away with something that worked, even for some fairly specific cases let alone the general case, would be quite an achievement.

        For example, imagine trying to extend the POD example of a quoted email/maillist post that has had its formatting screwed over by re-quoting. What if the original text contained a table with some left- and some right-justified fields? Combining the heuristics to extract and reformat that would be quite difficult.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.