Re: When the only tool you have is a screwdriver...

It seems to me that you are going to have to approach this problem with algorithmic logic that goes beyond the simple use of regular expressions. You are actually parsing this string, and therefore I would consider using a parse-engine ... ~~say Parse::RecDescent~~ a Perl front-end wrapper for a “real, and advanced,” non-recursive parser. (This problem does not permit a recursive-descent solution ... see below.)

From a grammar point-of-view, your input strings consist of “identifiers,” being fixed literal strings such as u, a, d, b, c, and productions, such as e. Thus, a grammar for this .. abstractly speaking .. might go something like this:

e ::= 'u'
      | 'u' 'a' <e> 
      | <e> 'd' <e> 'f' <e>
      | <e> 'b' <e>
      | <e> 'c'
[download]

... with the fact that the last three rules are “head recursive being a very-serious problem that would immediately exclude the use of many parsers. (If a production begins with an instance of itself, the parser falls down a rabbit-hole and never comes out, if it attempts a “depth-first” approach as many parsers do. Some parsers such as Bison provide convoluted means for parsing head-recursive grammars but they are difficult compromises.

It also seems to me that this definition, however expressed, will have a lot of potential for ambiguity: there could be many valid parses of the same string.

Certainly, the use of a parser algorithm is called-for here, of some sort. Quietly give-up on the notion of trying to do this with regex calisthenics. And, also, give up on the notion that you will be able to efficiently solve this with “only a screwdriver.”

Comment on Re: When the only tool you have is a screwdriver... Download Code

Replies are listed 'Best First'.
Re^2: When the only tool you have is a screwdriver... by mr_ron (Chaplain) on Oct 08, 2015 at 16:22 UTC
I would just like to explain a bit about giving up on regex calisthenics. The problem is trying to match a "recursive pattern" another example of which would be matching balanced parenthesis which is a Perl FAQ. `p =~ () p =~ (p) p =~ pp` [download] The faq and answer are here: Can I use Perl regular expressions to match balanced text? Ron	[reply] [d/l]
Re^3: When the only tool you have is a screwdriver... by sundialsvc4 (Abbot) on Oct 08, 2015 at 16:50 UTC
Yes, but your situation is not necessarily “balanced.” In the left/right parenthesis example, once the right-paren is seen, the matter is resolved. But in some of your examples, we see multiple occurrences of `"e"`. Thus, in the general case, more-than-one recursion might have to be entered, and then backed-out in what turns out to be a failed parse-branch. I do not believe that you can expect the regex engine to be that smart, although other Monks may promptly jump-in and show that I am wrong. However ... the saving grace might be “your actual data.” If, for example, the terminal-tokens always consist of single characters, some sort of one- or one-and-three- character lookahead might be sufficient to resolve ambiguity. After all, you do not always have to write for the general case: you merely have to write what gets the job done in your case. (Yay!) Even so, I think that this requirement (IMHO) has stepped beyond what you can reasonably expect from regular-expressions alone: it is, I think, “a parsing problem.” (By that I mean, “some sort of heuristic will be required, on top of what regexes alone can give you.) With good fortune, you will find that there is an existing implementation of a heuristic (a parser) that will do the job. Without good fortune, you will hack. `;-)`	[reply]
Re^2: When the only tool you have is a screwdriver... by ExReg (Priest) on Oct 08, 2015 at 18:08 UTC
You are right. It really is a parsing problem. I have been looking forlornly at things like RecDescent, wishing I had it on the system in question. I will probably flog this dead horse a bit longer before I have to tell them the that handle of the screwdriver doesn't pound to well and that I need a hammer also.	[reply]
Re^3: When the only tool you have is a screwdriver... by Anonymous Monk on Oct 08, 2015 at 19:45 UTC
No need for a hammer. See "sub e" in 1144217 . It's only three lines long :) The first line looks for what starts an "e". The second line looks for what can follow an "e" and still be an "e". The third line indicates success :) What could be simpler ? (hehehe)	[reply]
Re^4: When the only tool you have is a screwdriver... by ExReg (Priest) on Oct 09, 2015 at 14:27 UTC
That indeed does work. What I was looking for was a single regular expression that could be tested against the text files. I would have to totally revamp the logic of the tool to use it. I finally did get the regular expression to do the trick and it is extremely long. Goes to show how a little logic can match a whole lot of regex.	[reply]


Problems? Is your data what you think it is?
	PerlMonks