Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

I have sometimes thought about generic regular expressions that could work on any sequence, not just a sequence of characters (useful for writing correlation rules, etc.). Today it also struck me that it should be possible to apply the same concepts to more than one dimension. I started to address it as a toy project, began thinking through what it would really mean, and decided it was getting a bit deep, to say the least. So I thought I might use google to find what else is out there. This little short discussion was the top hit. And though I'm a perl fan, I wasn't even thinking about or looking for anything related to perl.

Syntax is a completely separate problem to solve. You can address the multi-dim regex problem by thinking in terms of the fundamental components of a regular expression and how it functions with those components. So, for example, you have stuff like these, which could be generalized to no longer involve characters:

  • Atoms (take up positions in the sequence, make up the lowest-level of rudimentary comparison -- recursion stops here)
  • Groups (both capturing and non-capturing, the point is to allow all the regex-matching features to be divided up into a heirarchy of pieces)
  • Quantifiers (applied to atoms or groups -- and these are usually as simple as a range from minimum-required to maximum-needed)
  • Assertions (typically are considered as tied to positions between atoms, or on the ends of a sequence, and the assertions do not advance position during processing except for their own temporary, internal matching evaluation)
  • Etc.

So you would still have all of that in generalized regular expressions, and in multi-dimensional ones as well.

The dimensionality aspect really does have to do with the directionality of advancement of the matching position within the target sequence. I had thought of this independently before coming across this very old posting, and I think it really is a key insight. For any particular search space (regardless of the type of the atoms), you could build a regular expression that includes a new type of component, which is a direction change. It serves as both an assertion (that it is possible at that position to move in the indicated direction), and as a control for the matching engine to advance position differently.

Direction changes need not limited to right-angles in an array (of any dimensionality). Instead, a direction change could apply to any possible directed relationship among atoms. Consider a tree structure, for example. What direction is the default direction? Perhaps there isn't one. Maybe a "direction" in this case is really an entire set of possible relationships, each of which might need to be tested, such as moving to a child node. Some of that sort of multi-dimensional relationship path has already been addressed in the world of XPath expressions, which fundamentally deal with a tree structure.

I should be able to have a search space that defines the fundamental operations: comparing two atoms, evaluating an assertion at a given position, enumerating all possible directions of a given type at a given position, and advancing in some specific direction at a given position (remember that the possibility should be tested first as an assertion, but with multiple possibilities, one need only have a non-empty set). All of this being defined, the regular expression could be built piece by piece using the same structure we use today for character strings, using function calls on a builder object if no syntax is invented to make it simpler. The point of all this is that, once you've built all that, the regular expression matching engine should still work pretty much the exact same way that string-based regex matching does. It's all the same thing.

This is what I think "multi-dimensional regular expressions" are. Now, if anybody is still out there seeing this ancient thread, does it exist anywhere? :-)

In reply to Re^3: Multidimensional regular expressions by keith.w.blackwell
in thread Multidimensional regular expressions by diotalevi

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others exploiting the Monastery: (6)
    As of 2021-03-04 16:44 GMT
    Find Nodes?
      Voting Booth?
      My favorite kind of desktop background is:

      Results (105 votes). Check out past polls.