http://www.perlmonks.org?node_id=1015251


in reply to Is there a module for object-oriented substring handling/substitution?

smls:

Sorry, I don't know of a module that does that. Just for my own curiosity: what would you use something like that for? I can't think of a reason I would want something like that, so I can't suggest any possible searches to help.

...roboticus

When your only tool is a hammer, all problems look like your thumb.

Re^2: Is there a module for object-oriented substring handling/substitution?
by smls (Friar) on Jan 26, 2013 at 15:14 UTC

    roboticus:

    Well, there have been several times in the past I would have found a module like this useful. My current use-case, which led to me writing this thread, is updating table values in a wiki page by programmatically editing the page's source code (which is available in MediaWiki format).

    More precisely, the problem at hand is like this:

    Within a wiki page, there is a special section (identified by its section header). This section in turn can have an arbitrary number of subsections (each with a unique subsection header). Each of these subsections contains, among other things, a special table.
    The Perl script is supposed to update the values in a specific column in each of these tables (identified by the word in the column's header cell). Which value goes into a particular table cell in that column depends on the corresponding value in the first column (i.e. the ID column), as well as the title of the subsection that the table belongs to.

    Now, the Perl script should play nice with human editing of the same wiki page. Humans will fill in the remaining columns of the aforementioned tables, as well as the rest of the wiki page, and may freely add formatting, move things (like table rows and columns) around, etc.
    The Perl script must not touch *anything* on that wiki page except for the specific values it substitutes for new values. This also means no whitespace or formatting changes, so using a generic wiki text parser and dumper is out of the question.

    Last but not least, the solution should be elegant and easy to maintain and expand. For example if the wiki page is radically re-factored so that the script breaks, I want to be able to fix the script easily (even if I haven't looked at its Perl source code for months), i.e. without having to write complex five-line regexes from scratch. And in the future I might want to add support for automatically adding new table rows if expected values in the ID column were not found in one of the tables, and things like that - so the design should be flexible enough to account for that.

    In the absence of a module like I described in the OP, I would be using s/.../CODE/e blocks for this, but as I hinted in the OP, this might not provide the desired maintainability and elegance.
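For illustration, here is a minimal sketch of the s/.../CODE/e approach mentioned above. The row layout (one table row per line, cells separated by `||`, ID in the first cell, value to update in the second) and the `%new_value` lookup are assumptions made up for this example; a real wiki page may well be laid out differently.

```perl
use strict;
use warnings;

# Hypothetical lookup: ID (first column) => new value for the second column.
my %new_value = (A1 => '17', B2 => '23');

# Hypothetical MediaWiki table rows; real pages will vary.
my $page = <<'END';
|-
| A1  ||  5 || hand-written note
|-
| B2 || 9   || ''formatted'' note
|-
| C3 || 4 || row with an unknown ID
END

# Capture everything that must stay untouched and let the /e code
# compute only the text that replaces the old value.
$page =~ s{
    ^ ( \| \s* (\w+) \s* \|\| \s* )   # capture 1: first cell, with the ID as capture 2, plus separator
      ( \S+ )                         # capture 3: old value in the second column
      (?= \s* \|\| )                  # the next cell separator must follow
}{
    $1 . ( exists $new_value{$2} ? $new_value{$2} : $3 )
}gmxe;

print $page;
```

Only the old value is swapped out; the whitespace around it, the other cells, and rows whose ID is not in the lookup are written back exactly as they were matched. The downside hinted at above remains: the regex has to encode the whole row layout.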

      smls:

      Ok, now I understand what you're asking for. I had a slightly different model in mind.

      So you're looking for the ability to do something like:

      # X is regex stuff to detect start of "interesting region", Y detects end
      if ($clob =~ /(.*)(X.*Y)(.*)/s) {    # /s so that . also matches newlines
          my ($stuff_before, $stuff_to_edit, $stuff_after) = ($1, $2, $3);
          $stuff_to_edit =~ s/foo/bar/g;
          $clob = $stuff_before . $stuff_to_edit . $stuff_after;
      }

      But without all the gymnastics of dismantling and rebuilding the string. I can see where that would be pretty nice since a large $clob would force you to double the storage space and the associated string manipulations.
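As a point of comparison, core Perl can already get part of the way there: `substr` is an lvalue, so a substitution can be bound to a region of the string directly. The sketch below assumes made-up `<<START>>` and `<<END>>` markers in place of X and Y, and it makes no claim about how much copying Perl does internally.

```perl
use strict;
use warnings;

my $clob = "foo intro\n<<START>>\nfoo one\nfoo two\n<<END>>\nfoo outro\n";

# Hypothetical region markers standing in for X and Y.
if ($clob =~ /<<START>>.*?<<END>>/s) {
    # @- and @+ hold the start and end offsets of the last successful match.
    my ($start, $end) = ($-[0], $+[0]);

    # substr() returns an lvalue, so s/// can operate on just that region.
    substr($clob, $start, $end - $start) =~ s/foo/bar/g;
}

print $clob;
```

The text before and after the region is never captured or concatenated by hand; only the occurrences of "foo" between the markers change.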

      ...roboticus


        Yes, I think that's a good way to look at it.

      OK, now that I know your use case, I'd rather recommend working with a document tree that represents your wiki page as a hash of hashes.

      Much like a DOM-tree, you could traverse it for whatever markup-element ("table") you want.

      Parse the wiki-page into a tree, manipulate the tree and rebuild the page again.

      Otherwise:

      If you insist on attaching persistent meta-information to ranges of characters, then you had better work with arrays of characters. You could tie or bless the scalar elements with whatever info you want. If your user inserts or deletes anything from the array, your meta-information will move accordingly.

      And if you want to go the full "emacs way", you need to implement linked lists. The easiest way is to use 2-element arrays: [$value, $successor_ref]
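A minimal sketch of the 2-element-array linked list described above. The helper names (`list_from_string`, `insert_after`, `list_to_string`) are invented for this example.

```perl
use strict;
use warnings;

# Build a singly linked list of characters.
# Each node is a 2-element array: [$value, $successor_ref].
sub list_from_string {
    my ($text) = @_;
    my $head;
    for my $char (reverse split //, $text) {
        $head = [$char, $head];
    }
    return $head;
}

# Insert a new node directly after the given node and return the new node.
sub insert_after {
    my ($node, $value) = @_;
    $node->[1] = [$value, $node->[1]];
    return $node->[1];
}

# Walk the list and join the values back into a string.
sub list_to_string {
    my ($node) = @_;
    my $text = '';
    for (; $node; $node = $node->[1]) {
        $text .= $node->[0];
    }
    return $text;
}

my $head   = list_from_string('wki');
my $k_node = $head->[1];          # a persistent reference to the "k" node
insert_after($head, 'i');         # insert "i" after the leading "w"

print list_to_string($head), "\n";
print $k_node->[0], "\n";         # the saved reference still points at "k"
```

A reference to a node stays valid no matter what is inserted before or after it, which is what makes this layout attractive for tracking positions in text that keeps changing.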

      EDIT:

      After some meditation, IMHO if you need full interactivity, you had better stay with the AoH holding the document tree, plus a "cursor" pointing to the current element. Whenever the user inserts characters, update the tree at the point the cursor points to.

      You'll also need to store information like "parent", "child", "nextSibling" ...

      Have a look at the DOM or XML modules on CPAN for inspiration.
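A rough sketch of such tree nodes as plain hashes. The field names (`type`, `text`, `parent`, `children`, `next_sibling`) and the helper functions are assumptions chosen for this example, not the API of any particular DOM module.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical node layout: type, text, parent, children, next_sibling.
sub new_node {
    my ($type, $text) = @_;
    return {
        type         => $type,
        text         => $text,
        parent       => undef,
        children     => [],
        next_sibling => undef,
    };
}

# Append a child and keep the parent and sibling links up to date.
sub append_child {
    my ($parent, $child) = @_;
    my $last = $parent->{children}[-1];
    $last->{next_sibling} = $child if $last;
    $child->{parent} = $parent;
    weaken($child->{parent});    # avoid a parent <-> child reference cycle
    push @{ $parent->{children} }, $child;
    return $child;
}

# Depth-first search for all nodes of a given type.
sub find_all {
    my ($node, $type) = @_;
    my @found = $node->{type} eq $type ? ($node) : ();
    push @found, find_all($_, $type) for @{ $node->{children} };
    return @found;
}

my $page    = new_node('page', '');
my $section = append_child($page, new_node('section', 'Results'));
append_child($section, new_node('table', 't1'));
append_child($section, new_node('table', 't2'));

my @tables = find_all($page, 'table');
print join(', ', map { $_->{text} } @tables), "\n";
```

The parent link is weakened so that the tree is freed once the last outside reference to the root goes away.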

      Cheers Rolf

        I seem to have caused some confusion with my previous post (sorry about that):

        When I wrote about playing nice with page edits by humans, I did not mean actual interactivity while the script is running.
        The script performs a self-contained, non-interactive operation: Pull the page source into a string, update specific values, submit the updated page source back to the server, exit.
        The human editing happens in between multiple such runs of the script, on the wiki itself.

Re^2: Is there a module for object-oriented substring handling/substitution?
by Anonymous Monk on Jan 25, 2013 at 03:06 UTC

      Ah, thanks for that link! It led to a nice hour or so of reading & thinking.

      ...roboticus


      Anonymous Monk:

      I don't care much for speed, I care about convenience and elegance.

      Regarding the use-case of creating editors, they might prefer to use their own special-purpose class for performance reasons, with integrated support for efficient feedback on state changes of tracked ranges. For example, the Kate editor (coded in C++ with Qt and KDE libs) uses a light-weight class called MovingRange for keeping track of persistent ranges within an opened document. They created this solution from scratch in 2010, dropping their previously used, more generic framework called "SmartRanges", partly for performance reasons (see blog post).
      Now, if this is a performance-critical code path that benefits from special-case optimization even in an editor written in C++, it probably will be even more so in an editor written in Perl...

      For my purposes, notification about state changes is not needed, nor is performance a critical consideration, so having a full substring class that stores a copy of its text (rather than just a thin "range" class pointing to a location within the parent string) should not be a problem.
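To make the idea of a copy-holding substring class concrete, here is a minimal sketch. The package name `My::Substring` and its methods (`new`, `text`, `replace`, `commit`) are hypothetical and do not refer to an existing CPAN module; the class also does not track edits made to the parent string by anyone else.

```perl
package My::Substring;    # hypothetical name, not an existing CPAN module

use strict;
use warnings;

# Remember where the substring lives in the parent and keep a private copy.
sub new {
    my ($class, $parent_ref, $start, $length) = @_;
    return bless {
        parent => $parent_ref,    # reference to the parent string
        start  => $start,
        length => $length,        # length currently occupied in the parent
        text   => substr($$parent_ref, $start, $length),
    }, $class;
}

sub text { $_[0]{text} }

# Substitute within the private copy only.
sub replace {
    my ($self, $regex, $replacement) = @_;
    $self->{text} =~ s/$regex/$replacement/g;
    return $self;
}

# Write the private copy back into the parent string.
sub commit {
    my ($self) = @_;
    substr(${ $self->{parent} }, $self->{start}, $self->{length}) = $self->{text};
    $self->{length} = length $self->{text};
    return $self;
}

package main;

my $page  = "== A ==\nvalue: 1\n== B ==\nvalue: 1\n";
my $start = index($page, "== B ==");
my $sub   = My::Substring->new(\$page, $start, length($page) - $start);

$sub->replace(qr/value: \d+/, 'value: 2')->commit;
print $page;
```

Nothing outside the tracked range is touched, and the parent string only changes when `commit` is called.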