Lexing: how to define tokens based on "context"
by three18ti (Scribe) on Oct 16, 2013 at 13:01 UTC
three18ti has asked for the
wisdom of the Perl Monks concerning the following question:
At work we have a giant cluster... of a sudoers file. I was able to reuse code from my last monks post, Iterator to parse multiline string with \\n terminator, since lines can be continued with a "\\n" in sudoers as well as in my last project (which is really cool!).
Now that I'm able to grab each line, I'm trying to figure out how to define my tokens. I really like MJD's approach to Lexing and have been referencing HOP::Lexer::Article (as well as Higher Order Perl) and the general wisdom is we should break down each line into the applicable tokens.
Ok, so I think I understand the code as presented (though I have no idea why HOP::Lexer "uses" (imports?) HOP::Stream but doesn't actually use the module... but that's really irrelevant to my use of the module) and I think I get the gist of why we want our tokens in "TYPE", "TOKEN" format.
What I'm really not grokking is the how of defining/identifying tokens.
For my sudoers file, each line can be one of three types: comment, alias definition, or rule definition. Comments _should_ be easy since the line is just prefixed with a "#" (though I just thought of an edge case where rules have been commented out and might potentially end with a \\n... I may want to parse comments as rules. "should" is a funny word...), so I'm currently trying to tackle parsing alias definitions.
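To make the three line types concrete, here is a minimal sketch of a classifier, assuming the iterator has already joined continued lines and dropped blank ones. The name `classify_line` and the sample lines are hypothetical, and treating every "#" line as a comment glosses over directives such as `#include`. Note that sudoers itself spells the command alias keyword `Cmnd_Alias`.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one logical sudoers line.
# Assumes continuation lines were joined and blank lines removed upstream.
sub classify_line {
    my ($line) = @_;
    return 'comment' if $line =~ /^\s*#/;
    return 'alias'   if $line =~ /^\s*(?:Host|User|Runas|Cmnd)_Alias\s/;
    return 'rule';    # everything else is treated as a rule here
}

# Sample lines invented for illustration only.
for my $line (
    '# a comment',
    'User_Alias ADMINS = alice, bob',
    'ADMINS ALL = (root) ALL',
) {
    printf "%-8s %s\n", classify_line($line), $line;
}
```

Keeping this step separate from tokenizing means each line type can later get its own token table.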
There are four types of aliases, "Host_Alias", "User_Alias", "Runas_Alias", and "Cmnd_Alias". Alias definitions use the format:
(users don't really need to run as sshd, this is for example purposes, but the accounts that a user would sudo as could be any service or user account)
HOP::Lexer::make_lexer takes an iterator, then a list of array refs of the form [$label, $pattern, $transform_sub]. The keywords are easy since we can just match against text, e.g.: (My::Sudoers::Iterator returns an iterator that grabs a line that is continued with a \\n)
(open to better names than "LINECONTINUER"...)
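For illustration, a minimal sketch of a keyword token table in the same [$label, $pattern, $transform_sub] shape. It does not use HOP::Lexer itself: `lex_line` is a hypothetical core-Perl stand-in that tries each pattern at the current position and leaves anything unmatched as a plain string. The labels and the sample input are assumptions.

```perl
use strict;
use warnings;

# Token table: [ label, pattern, optional transform ].
# None of the patterns contains capturing parentheses.
my @token_specs = (
    [ ALIASTYPE     => qr/\b(?:Host|User|Runas|Cmnd)_Alias\b/ ],
    [ EQUALS        => qr/=/ ],
    [ COMMA         => qr/,/ ],
    [ LINECONTINUER => qr/\\\n/ ],             # backslash followed by newline
    [ SPACE         => qr/\s+/, sub { () } ],  # transform discards whitespace
);

# Hypothetical stand-in lexer, not the HOP::Lexer implementation.
sub lex_line {
    my ($text) = @_;
    my ( @tokens, $pending );
    my $flush = sub {
        push @tokens, $pending if defined $pending;
        undef $pending;
    };
    pos($text) = 0;
  CHAR: while ( pos($text) < length($text) ) {
        for my $spec (@token_specs) {
            my ( $label, $re, $transform ) = @$spec;
            if ( $text =~ /\G($re)/gc ) {
                $flush->();    # emit any unmatched text seen so far
                push @tokens,
                  $transform ? $transform->( $label, $1 ) : [ $label, $1 ];
                next CHAR;
            }
        }
        $text =~ /\G(.)/gcs;    # no spec matched: keep the character as raw text
        $pending .= $1;
    }
    $flush->();
    return @tokens;
}

for my $tok ( lex_line('User_Alias ADMINS = alice, bob') ) {
    print ref($tok) ? "[$tok->[0], $tok->[1]]" : "'$tok'", "\n";
}
```

With this table only the keyword and the punctuation become labeled tokens; ADMINS, alice and bob come back as plain strings, which is exactly where the ALIASNAME and PARAMETER question starts.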
Where I'm stuck is how do I define my token for "ALIASNAME" and the "PARAMETER"?
Since the alias name can be any alphanumeric string including _, a simple "\w+" won't suffice on its own: it would match the alias type and the parameters just as well. I was thinking something along the lines of:
The big problem here is that HOP::Lexer uses capturing parentheses to extract the token, so the above code will break the module. Additionally, "(.*+)" is typically a bad idea, but I couldn't figure out how to define that better. Also, I don't think HOP::Lexer will be able to "see" tokens in the line that have been previously consumed.
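One possible way to express "context" without capturing parentheses is a lookahead. The sketch below uses plain `split` with a capturing wrapper to show the effect; it does not claim to reproduce HOP::Lexer's internals. It assumes sudoers-style alias names (an uppercase letter followed by uppercase letters, digits or underscores), and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $line = 'User_Alias ADMINS = alice, bob';

# An identifier counts as an alias name only when "=" follows it.
# The lookahead (?=...) checks context without capturing or consuming it.
my $aliasname = qr/\b[A-Z][A-Z0-9_]*(?=\s*=)/;

# Wrapping the pattern in one capture group makes split return the match too.
my @pieces = split /($aliasname)/, $line;
print scalar(@pieces), " pieces: ", join( '|', @pieces ), "\n";

# The same pattern with an extra capture group of its own: split now returns
# the matched text twice, which is the kind of breakage described above.
my $with_capture = qr/\b([A-Z][A-Z0-9_]*)(?=\s*=)/;
my @broken = split /($with_capture)/, $line;
print scalar(@broken), " pieces: ", join( '|', @broken ), "\n";
```

A lookahead only sees what comes next, so context that depends on tokens already consumed would still need some state outside the pattern.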
The way I'm currently dealing with aliases is splitting on the equals sign, then splitting the left half on the spaces to get the alias type and name, and splitting the right half on the commas. I don't think this approach is necessarily appropriate, as it requires further logic to make sense of the mess (as opposed to just lexing the string to obtain tokens... obviously I will need to make use of the tokens at a later point in my application, but I think trying to do too much at once is causing me headaches when debugging edge cases. Parsing tokens will more easily allow me to determine what each piece of the statement means)...
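The split-based approach described above might look roughly like this. It is a sketch only: `split_alias` and the sample line are hypothetical, and it assumes one alias per line with no escaped commas or "=" inside the members.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the split-based approach.
sub split_alias {
    my ($line) = @_;

    # Split once on the first "=": left is "TYPE NAME", right is the members.
    my ( $left, $right ) = split /=/, $line, 2;
    return unless defined $right;

    # split ' ' skips leading whitespace and splits on runs of whitespace.
    my ( $type, $name ) = split ' ', $left;

    # Split the members on commas and trim surrounding whitespace.
    my @members;
    for my $member ( split /,/, $right ) {
        $member =~ s/^\s+|\s+$//g;
        push @members, $member if length $member;
    }
    return { type => $type, name => $name, members => \@members };
}

my $alias = split_alias('Host_Alias WEB = www1, www2');
print "$alias->{type} $alias->{name}: @{ $alias->{members} }\n";
```

Every assumption listed above turns into another special case in this function, which is the extra logic that a token stream is meant to avoid.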
Thanks all for your help.