
Hello Monks

At work we have a giant cluster... of a sudoers file. I was able to reuse code from my last monks post, Iterator to parse multiline string with \\n terminator, since lines in sudoers can be continued with a "\\n" just as in my last project (which is really cool!)

Now that I'm able to grab each line, I'm trying to figure out how to define my tokens. I really like MJD's approach to lexing and have been referencing HOP::Lexer::Article (as well as Higher Order Perl). The general wisdom is that we should break each line down into its applicable tokens.

OK, so I think I understand the code as presented (though I have no idea why HOP::Lexer "uses" (imports?) HOP::Stream without actually using the module... but that's really irrelevant to my use of it), and I think I get the gist of why we want our tokens in "TYPE", "TOKEN" format.

What I'm really not grokking is the how of defining/identifying tokens.

For my sudoers file, each line can be one of three types: comment, alias definition, or rule definition. Comments _should_ be easy since the line is just prefixed with a "#" (though I just thought of an edge case where rules have been commented out and might potentially end with a \\n... I may want to parse comments as rules. "should" is a funny word...), so I'm currently trying to tackle parsing alias definitions.
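The three-way split above can be sketched as a small classifier. This is only an illustration: classify_line is a hypothetical helper, it assumes each line has already been joined across continuations, and it treats anything that is not blank, a comment, or an alias as a rule. Note that sudoers itself spells the command alias keyword Cmnd_Alias, and that "#include" and "#includedir" directives also begin with "#", so a bare prefix test is a simplification.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one logical (already joined) sudoers line.
# Simplification: '#include' / '#includedir' directives would be reported
# as comments here, and anything unrecognised is assumed to be a rule.
sub classify_line {
    my ($line) = @_;
    return 'blank'   if $line =~ /^\s*$/;
    return 'comment' if $line =~ /^\s*#/;
    return 'alias'   if $line =~ /^\s*(?:Host|User|Runas|Cmnd)_Alias\s/;
    return 'rule';
}

print classify_line('Host_Alias HA_FOO_GROUP = abc123, eigh456'), "\n";
```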

There are four types of aliases: "Host_Alias", "User_Alias", "Runas_Alias", and "Cmnd_Alias". Alias definitions use the format:

ALIASTYPE ALIASNAME = PARAMETER, PARAMETER, \
                      PARAMETER, PARAMETER

For example:

Host_Alias HA_FOO_GROUP = abc123, eigh456, \
                          foo987, bar654

Runas_Alias RA_FOO_SVCACCT = www-data, ceph, \
                             sshd, memcache, \
                             xab123

(users don't really need to run as sshd, this is for example purposes, but the accounts that a user would sudo as could be any service or user account)

HOP::Lexer::make_lexer takes an iterator followed by a list of array refs of the form [ $label, $pattern, $transform_sub ]. The keywords are easy since we can just match against literal text, e.g. (My::Sudoers::Iterator returns an iterator that grabs a whole logical line, including any continuations ending in \\n):

my $lexer = make_lexer(
    My::Sudoers::Iterator->new('/etc/sudoers'),
    [ 'ALIASTYPE' => qr/(?:Host_Alias|User_Alias|Runas_Alias|Cmnd_Alias)/, ],
    [ 'COMMA'     => qr/,/, ],
    [ 'DIVIDER'   => qr/=/, ],
    [ 'LINECONTINUER' => qr/\\\n/, ],
);

(open to better names than "LINECONTINUER"...)

Where I'm stuck is how do I define my token for "ALIASNAME" and the "PARAMETER"?

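Since the alias name can be any alphanumeric string including "_", a simple "\w+" would match it, but it would also match the alias type and many of the parameters, so it won't suffice on its own. I was thinking something along the lines of: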

    [ 'ALIASNAME' => qr/ALIASTYPE \s+ (.*+) \s+ =/msx, ]

The big problem here is that HOP::Lexer uses capturing parentheses to extract the token, so the above code will break the module. Additionally, "(.*+)" is typically a bad idea, but I couldn't figure out how to define that better. Also, I don't think HOP::Lexer will be able to "see" tokens in the line that have previously been consumed.
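One possible way around the "context" problem, offered as a sketch and not as how HOP::Lexer works internally: keep the token patterns context-free (a generic WORD instead of ALIASNAME and PARAMETER) and decide what each WORD means in a second pass, based on whether the "=" has been seen yet. The code below is plain Perl using \G and pos() as a stand-in for a real lexer. lex_alias and the WORD label are hypothetical names, and the WORD pattern assumes parameters contain no spaces, commas, or "=", which will not hold for every command alias.

```perl
use strict;
use warnings;

# Context-free token patterns, tried in order (a plain-Perl stand-in,
# not HOP::Lexer). WORD is a hypothetical catch-all label.
my @patterns = (
    [ ALIASTYPE => qr/(?:Host|User|Runas|Cmnd)_Alias\b/ ],
    [ COMMA     => qr/,/ ],
    [ DIVIDER   => qr/=/ ],
    [ WORD      => qr/[^\s,=\\]+/ ],
);

# Hypothetical helper: turn one alias definition into [TYPE, VALUE] pairs.
sub lex_alias {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    TOKEN: while (1) {
        $text =~ /\G(?:\s|\\\n)+/gc;    # skip whitespace and backslash-newline
        last if pos($text) >= length $text;
        for my $p (@patterns) {
            my ($label, $re) = @$p;
            if ($text =~ /\G($re)/gc) {
                push @tokens, [ $label, $1 ];
                next TOKEN;
            }
        }
        die "Cannot lex at offset " . pos($text) . "\n";
    }

    # Second pass: a WORD before '=' is the alias name, after it a parameter.
    my $seen_divider = 0;
    for my $tok (@tokens) {
        $seen_divider = 1 if $tok->[0] eq 'DIVIDER';
        next unless $tok->[0] eq 'WORD';
        $tok->[0] = $seen_divider ? 'PARAMETER' : 'ALIASNAME';
    }
    return @tokens;
}

my @demo = lex_alias("Host_Alias HA_FOO_GROUP = abc123, eigh456, \\\n    foo987, bar654");
print join(' ', map { "$_->[0]($_->[1])" } @demo), "\n";
```

The relabelling pass is the part that carries the context. In a fuller design that job would usually belong to the parser that consumes the token stream, which leaves the lexer patterns free of lookaround or capturing tricks.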

The way I'm currently dealing with aliases is to split on the equals sign, then split the left half on whitespace to get the alias type and name, and split the right half on the commas. I don't think this approach is necessarily appropriate, as it requires further logic to make sense of the mess (as opposed to just lexing the string to obtain tokens... obviously I will need to make use of the tokens at a later point in my application, but I think trying to do too much at once is causing me headaches when debugging edge cases. Parsing tokens will more easily allow me to determine what each piece of the statement means)...
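For reference, the split-based approach described above looks roughly like the following. This is a reconstruction under assumptions, not the actual production code: split_alias is a hypothetical name, and it assumes the input is one alias definition whose continuation lines are still marked with backslash-newline.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the split-based approach described above.
sub split_alias {
    my ($line) = @_;
    $line =~ s/\\\n/ /g;                          # fold continuation lines
    my ($left, $right) = split /=/, $line, 2;     # limit 2: later '=' stays on the right
    return unless defined $right;                 # not an alias definition
    my ($type, $name) = split ' ', $left;         # ' ' also drops leading whitespace
    $right =~ s/^\s+|\s+$//g;                     # trim before splitting on commas
    my @params = split /\s*,\s*/, $right;
    return { type => $type, name => $name, params => \@params };
}

my $demo = split_alias("Host_Alias HA_FOO_GROUP = abc123, eigh456, \\\n    foo987, bar654");
print "$demo->{type} $demo->{name}: @{ $demo->{params} }\n";
```

Everything after the "=" is treated as one comma-separated list here, which is where the extra logic for edge cases tends to pile up.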

Thanks all for your help.

In reply to Lexing: how to define tokens based on "context" by three18ti
