http://www.perlmonks.org?node_id=1015495


in reply to Re: Is there a module for object-oriented substring handling/substitution?
in thread Is there a module for object-oriented substring handling/substitution?

roboticus:

Well, there have been several times in the past I would have found a module like this useful. My current use-case, which led to me writing this thread, is updating table values in a wiki page by programmatically editing the page's source code (which is available in MediaWiki format).

More precisely, the problem at hand is like this:

Within a wiki page, there is a special section (identified by its section header). This section in turn can have an arbitrary number of subsections (each with a unique subsection header). Each of these subsections contains, among other things, a special table.
The Perl script is supposed to update the values in a specific column in each of these tables (identified by the word in the column's header cell). Which value goes into a particular table cell in that column depends on the corresponding value in the first column (i.e. the ID column), as well as on the title of the subsection that the table belongs to.

Now, the Perl script should play nice with human editing of the same wiki page. Humans will fill in the remaining columns of the aforementioned tables, as well as the rest of the wiki page, and may freely add formatting, move things (like table rows and columns) around, etc.
The Perl script must not touch *anything* on that wiki page except for the specific values it substitutes for new values. This also means no whitespace or formatting changes, so using a generic wiki text parser and dumper is out of the question.

Last but not least, the solution should be elegant and easy to maintain and expand. For example if the wiki page is radically re-factored so that the script breaks, I want to be able to fix the script easily (even if I haven't looked at its Perl source code for months), i.e. without having to write complex five-line regexes from scratch. And in the future I might want to add support for automatically adding new table rows if expected values in the ID column were not found in one of the tables, and things like that - so the design should be flexible enough to account for that.

In the absence of a module like I described in the OP, I would be using s/.../CODE/e blocks for this, but as I hinted in the OP, this might not provide the desired maintainability and elegance.
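For reference, the s/.../CODE/e route could look roughly like the following minimal sketch. It assumes each table row sits on one line in the form "| a || b || c" and the header row in the form "! a !! b !! c"; the names update_column and rewrite_row, and the sample IDs, are made up for illustration. Cells carrying attributes ("style=... | value") or rows spread over several lines are not handled.

```perl
use strict;
use warnings;

# Rewrite one row body (everything after the leading "|").
# Only the value in column $idx changes; surrounding whitespace is kept.
sub rewrite_row {
    my ($rest, $idx, $new_for) = @_;        # copy $1 before running any regex
    my @cells = split /\|\|/, $rest, -1;    # -1 keeps trailing empty cells
    return $rest unless @cells;
    my ($id) = $cells[0] =~ /(\S+)/;
    if (defined $id && exists $new_for->{$id} && $idx <= $#cells) {
        $cells[$idx] =~ s/^(\s*).*?(\s*)$/$1$new_for->{$id}$2/;
    }
    return join '||', @cells;
}

# Update the column named $column in one table; $new_for maps ID => new value.
sub update_column {
    my ($table, $column, $new_for) = @_;

    # Locate the column index from the header row.
    my ($header) = $table =~ /^!(.*)$/m;
    return $table unless defined $header;
    my @names = map { my $n = $_; $n =~ s/^\s+|\s+$//g; $n } split /!!/, $header;
    my ($idx) = grep { $names[$_] eq $column } 0 .. $#names;
    return $table unless defined $idx;

    # Data rows start with "|" but not with "|-", "|}" or "|+".
    $table =~ s/^\|(?![-}+])(.*)$/'|' . rewrite_row($1, $idx, $new_for)/gme;
    return $table;
}
```

Rows whose ID is not in the lookup hash pass through split and join unchanged, so their whitespace and formatting survive as they were.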

Re^3: Is there a module for object-oriented substring handling/substitution?
by roboticus (Chancellor) on Jan 26, 2013 at 17:25 UTC

    smis:

    Ok, now I understand what you're asking for. I had a slightly different model in mind.

    So you're looking for the ability to do something like:

    # X is regex stuff to detect start of "interesting region", Y detects end
    if ($clob =~ /(.*)(X.*Y)(.*)/) {
        my ($stuff_before, $stuff_to_edit, $stuff_after) = ($1, $2, $3);
        $stuff_to_edit =~ s/foo/bar/g;
        $clob = $stuff_before . $stuff_to_edit . $stuff_after;
    }

    But without all the gymnastics of dismantling and rebuilding the string. I can see where that would be pretty nice since a large $clob would force you to double the storage space and the associated string manipulations.
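    Core Perl already gets part of the way there: the @- and @+ arrays give the offsets of the last match, and four-argument substr writes a replacement back into the original string, so only the region itself is copied. A minimal sketch, where edit_region and the << >> delimiters are made-up names for illustration:

```perl
use strict;
use warnings;

# Edit only the first region delimited by $start_re and $end_re.
# $edit is a callback that modifies $_ (a copy of the region).
sub edit_region {
    my ($text_ref, $start_re, $end_re, $edit) = @_;
    return 0 unless $$text_ref =~ /$start_re.*?$end_re/s;

    # Offsets of the whole match within the original string.
    my ($from, $len) = ($-[0], $+[0] - $-[0]);

    my $region = substr($$text_ref, $from, $len);
    $edit->() for $region;                        # aliases $_ to $region
    substr($$text_ref, $from, $len, $region);     # write back in place
    return 1;
}

my $clob = "foo <<foo foo>> foo";
edit_region(\$clob, qr/<</, qr/>>/, sub { s/foo/bar/g });
# $clob is now "foo <<bar bar>> foo"
```

    Text before and after the region is never copied or reassembled, and the replacement may be longer or shorter than the original region.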

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Yes, I think that's a good way to look at it.

Re^3: Is there a module for object-oriented substring handling/substitution?
by LanX (Saint) on Jan 26, 2013 at 20:31 UTC
    OK, now that I know your use case, I'd rather recommend working with a document tree that represents your wiki page as a hash of hashes.

    Much like a DOM-tree, you could traverse it for whatever markup-element ("table") you want.

    Parse the wiki-page into a tree, manipulate the tree and rebuild the page again.
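    A minimal sketch of that parse/manipulate/rebuild cycle, simplified to a flat list of section nodes with a level field rather than a full tree. The function names are made up, and headings are assumed to be MediaWiki-style "== Title ==" lines. Each node keeps its raw text, so rebuilding an unmodified page reproduces the input exactly:

```perl
use strict;
use warnings;

# Split page source into section nodes. Node 0 holds any text before the
# first heading; every other node starts with its own heading line.
sub parse_sections {
    my ($source) = @_;
    my @nodes = ({ level => 0, title => '', raw => '' });
    for my $line (split /^/m, $source) {       # lines keep their newlines
        if ($line =~ /^(={1,6})\s*(.+?)\s*\1\s*$/) {
            push @nodes, { level => length $1, title => $2, raw => '' };
        }
        $nodes[-1]{raw} .= $line;
    }
    return \@nodes;
}

# Concatenate the raw text of all nodes back into page source.
sub rebuild_page {
    my ($nodes) = @_;
    return join '', map { $_->{raw} } @$nodes;
}
```

    A script would look up the node by title, run its substitutions on that node's raw text only, and call rebuild_page afterwards.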

    Otherwise:

    If you insist on attaching persistent meta-information to ranges of characters, then you had better work with arrays of characters. You could tie or bless the scalar elements with whatever info you want. If your user inserts or deletes anything from the array, your meta-info will move accordingly.

    And if you wanna go the full "emacs way" you need to implement linked lists. The easiest way is having 2-element arrays [$value,$successor_ref]
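    A rough sketch of such a list, one character per node; the helper names are made up for illustration:

```perl
use strict;
use warnings;

# Build a singly linked list of [$value, $successor_ref] nodes from a string.
sub list_from_string {
    my ($s) = @_;
    my $head;
    $head = [$_, $head] for reverse split //, $s;
    return $head;
}

# Insert a new node after $node and return the new node.
sub insert_after {
    my ($node, $value) = @_;
    $node->[1] = [$value, $node->[1]];
    return $node->[1];
}

# Walk the list and join the values back into a string.
sub list_to_string {
    my ($node) = @_;
    my $out = '';
    while ($node) {
        $out .= $node->[0];
        $node = $node->[1];
    }
    return $out;
}
```

    A reference to a node stays valid when characters are inserted elsewhere in the list, which is what lets information attached to a node travel with its character.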

    EDIT:

    After some meditation, IMHO if you need full interactivity, better stay with the AoH with the document tree, and a "cursor" pointing to the current element. Whenever the user inserts characters, update the tree at the point the cursor points to.

    You'll also need to store information like "parent", "child", "nextSibling" ...

    Have a look at DOM or XML modules at CPAN for inspiration.

    Cheers Rolf

      I seem to have caused some confusion with my previous post (sorry about that):

      When I wrote about playing nice with page edits by humans, I did not mean actual interactivity while the script is running.
      The script performs a self-contained, non-interactive operation: Pull the page source into a string, update specific values, submit the updated page source back to the server, exit.
      The human editing happens in between multiple such runs of the script, on the wiki itself.

        > I did not mean actual interactivity while the script is running.

        So my first statement still holds: you need to parse the content into a tree, manipulate some nodes and export it again as markup.

        Wiki-syntax is nested, e.g. a table entry can be bold or a link!

        No "object-oriented substrings" needed!

        Cheers Rolf