Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re: When the only tool you have is a screwdriver...

by sundialsvc4 (Abbot)
on Oct 08, 2015 at 15:49 UTC ( [id://1144192]=note: print w/replies, xml ) Need Help??


in reply to When the only tool you have is a screwdriver...

It seems to me that you are going to have to approach this problem with algorithmic logic that goes beyond the simple use of regular expressions.   You are actually parsing this string, and therefore I would consider using a parse-engine ... say Parse::RecDescent a Perl front-end wrapper for a “real, and advanced,” non-recursive parser.   (This problem does not permit a recursive-descent solution ... see below.)

From a grammar point-of-view, your input strings consist of “identifiers,” being fixed literal strings such as u, a, d, b, c, and productions, such as e.   Thus, a grammar for this .. abstractly speaking .. might go something like this:

e ::= 'u' | 'u' 'a' <e> | <e> 'd' <e> 'f' <e> | <e> 'b' <e> | <e> 'c'
... with the fact that the last three rules are “head recursive being a very-serious problem that would immediately exclude the use of many parsers.   (If a production begins with an instance of itself, the parser falls down a rabbit-hole and never comes out, if it attempts a “depth-first” approach as many parsers do.   Some parsers such as Bison provide convoluted means for parsing head-recursive grammars but they are difficult compromises.

It also seems to me that this definition, however expressed, will have a lot of potential for ambiguity:   there could be many valid parses of the same string.

Certainly, the use of a parser algorithm is called-for here, of some sort.   Quietly give-up on the notion of trying to do this with regex calisthenics.   And, also, give up on the notion that you will be able to efficiently solve this with “only a screwdriver.”

Replies are listed 'Best First'.
Re^2: When the only tool you have is a screwdriver...
by mr_ron (Chaplain) on Oct 08, 2015 at 16:22 UTC

      Yes, but your situation is not necessarily “balanced.”   In the left/right parenthesis example, once the right-paren is seen, the matter is resolved.   But in some of your examples, we see multiple occurrences of "e".   Thus, in the general case, more-than-one recursion might have to be entered, and then backed-out in what turns out to be a failed parse-branch.   I do not believe that you can expect the regex engine to be that smart, although other Monks may promptly jump-in and show that I am wrong.

      However ... the saving grace might be “your actual data.”   If, for example, the terminal-tokens always consist of single characters, some sort of one- or one-and-three- character lookahead might be sufficient to resolve ambiguity.   After all, you do not always have to write for the general case:   you merely have to write what gets the job done in your case.   (Yay!)

      Even so, I think that this requirement (IMHO) has stepped beyond what you can reasonably expect from regular-expressions alone:   it is, I think, “a parsing problem.”   (By that I mean, “some sort of heuristic will be required, on top of what regexes alone can give you.)   With good fortune, you will find that there is an existing implementation of a heuristic (a parser) that will do the job.   Without good fortune, you will hack.   ;-)

Re^2: When the only tool you have is a screwdriver...
by ExReg (Priest) on Oct 08, 2015 at 18:08 UTC
    You are right. It really is a parsing problem. I have been looking forlornly at things like RecDescent, wishing I had it on the system in question. I will probably flog this dead horse a bit longer before I have to tell them the that handle of the screwdriver doesn't pound to well and that I need a hammer also.

      No need for a hammer. See "sub e" in 1144217 . It's only three lines long :)

      The first line looks for what starts an "e".
      The second line looks for what can follow an "e" and still be an "e".
      The third line indicates success :)

      What could be simpler ? (hehehe)

        That indeed does work. What I was looking for was a single regular expression that could be tested against the text files. I would have to totally revamp the logic of the tool to use it. I finally did get the regular expression to do the trick and it is extremely long. Goes to show how a little logic can match a whole lot of regex.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1144192]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (3)
As of 2024-04-16 04:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found