Regex help/ Lua parse

by marquezc329 (Scribe)
on Oct 26, 2012 at 06:22 UTC (#1001000=perlquestion)
marquezc329 has asked for the wisdom of the Perl Monks concerning the following question:

Hi everybody. I'm taking the time to configure Awesome WM after completing my first (and lengthy) Gentoo install. The Awesome config files are all in Lua, and I figured automating the process a bit would be a nice opportunity to practice Perl by parsing and modifying the config code with regexes. The code I've got so far works, but I'm hoping the monks will look my regexes over and offer more graceful/elegant/stable variations for me to compare, contrast, and learn from.

I used the following code to find the terminal and taglist specified in the configuration code. These lines appear in the configuration file as:

-- This is used later as the default terminal and editor to run.
terminal = "urxvt"

-- Define a tag table which hold all screen tags.
tags = {
    names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },

The applicable parts of my solution:

#!/usr/bin/perl
use Data::Dumper;

# The configuration file is only about 400 lines,
# so I use Tie::File to tie it to @config_file
my @config_file;

sub procTerm {
    my $change    = $_[0];
    my $conf_file = $_[1];
    foreach my $line (@$conf_file) {
        if ($line =~ m/^\s*terminal\s=\s"([^"]+)"\s*$/) {
            $line =~ s/$1/$change/;
        }
    }
}

sub changeTerm {
    my $conf_file = shift;
    print "Change Terminal: ";
    chomp (my $newTerm = <STDIN>);
    procTerm($newTerm, $conf_file);
}

sub getTags {
    my $conf_file = shift;
    my @tags;
    foreach my $line (@$conf_file) {
        if ($line =~ m/^\s*names\s=\s{(\s*(.+),?)+\s*/) {
            @tags = split ",", $line;
            @tags = map {
                if ($_ =~ m/"([^"]+)"/) {$_ = $1}
                elsif ($_ =~ m/([^\s]+)/) {$_ = $1}
            } @tags;
        }
    }
    print Dumper \@tags;
}

changeTerm(\@config_file);
getTags(\@config_file);

I don't feel like my regexes are as efficient as they could be, but these are the best I was able to work up. I understand that this is a bit like the frequently discouraged reinvention of the wheel. A CPAN search provided the following:

I plan on using these modules to make my own frontend for manipulating Awesome, but I have really been looking for interesting opportunities to practice using regexes. I appreciate any input on my coding, and any alternate solutions. Thanks guys.

To any rookie programmers/self-starters like myself: if you find yourself looking for projects to get practice and stay motivated, I would definitely suggest one of the more difficult *nix installs. The installation/configuration process opened several paths of inspiration that feel more useful, and are more fun, than book exercises. Not to mention I learned a TON about *nix.

Re: Regex help/ Lua parse
by kcott (Chancellor) on Oct 26, 2012 at 11:30 UTC

    G'day marquezc329,

    In your config you have lines like key = value and you match the equals ( = ) part with \s=\s. Would key=value, key  =  value, key= value and similar variations be syntactically correct? If so, you may want to consider using \s*=\s* or \s+=\s+ as appropriate.
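    To see the difference in practice, here is a small sketch that runs both patterns over a few spacing variants. The sample lines are made up for illustration, on the assumption (true for Lua) that spacing around = is insignificant.

```perl
use strict;
use warnings;

# Made-up sample lines: the same Lua assignment with different spacing.
my @variants = (
    'terminal = "urxvt"',
    'terminal="urxvt"',
    'terminal  =  "urxvt"',
    'terminal= "urxvt"',
);

# Original pattern: exactly one whitespace character on each side of '='.
my $strict_re = qr/^\s*terminal\s=\s"([^"]+)"\s*$/;

# Relaxed pattern: any amount of whitespace (including none) around '='.
my $relaxed_re = qr/^\s*terminal\s*=\s*"([^"]+)"\s*$/;

# grep in scalar context returns the number of matching elements.
my $strict_hits  = grep { /$strict_re/ }  @variants;
my $relaxed_hits = grep { /$relaxed_re/ } @variants;

print "strict:  $strict_hits of ", scalar(@variants), "\n";
print "relaxed: $relaxed_hits of ", scalar(@variants), "\n";
```

    Only the first variant satisfies \s=\s; \s*=\s* accepts all four.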

    In a number of places you have code like $var =~ m/.../. While there's nothing actually wrong with this, you don't need the m when using forward slashes as the regexp delimiters, i.e. $var =~ /.../ is fine. Also, when matching $_ you can leave out the variable and the operator: $_ =~ m/.../ can simply be written as /.../. You don't have to use this format but you will see it in other people's code so you should at least be aware of it. All of this is explained in greater detail in Regexp Quote-Like Operators.
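    A short sketch showing that the three spellings are the same match against $_, and that the m becomes mandatory once you pick a delimiter other than slashes:

```perl
use strict;
use warnings;

my @results;
for ('terminal = "urxvt"', 'editor = "vim"') {
    my $long  = $_ =~ m/^\s*terminal\b/ ? 1 : 0;   # explicit $_ and explicit m
    my $no_m  = $_ =~ /^\s*terminal\b/  ? 1 : 0;   # m is optional with // delimiters
    my $short = /^\s*terminal\b/        ? 1 : 0;   # "$_ =~" is implied
    push @results, "$long $no_m $short";
}

# With any other delimiter the m is required: m{...}, m!...!, etc.
my $braces = 'a/b' =~ m{a/b} ? 1 : 0;

print "$_\n" for @results;   # 1 1 1, then 0 0 0
print "$braces\n";           # 1
```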

    I see grizzley picked up on combining a match and substitution. You don't need to capture what you're replacing, so that could be further simplified to:

    $line =~ s/^(\s*terminal\s=\s")[^"]+("\s*)$/$1$change$2/;
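    The one-step form also sidesteps a subtle problem in the original two-step version: s/$1/$change/ interpolates the old value as a regex, not as literal text. The sketch below wraps both forms in hypothetical subs and feeds them a made-up terminal value containing +, which the regex engine reads as a quantifier.

```perl
use strict;
use warnings;

# One-step version: capture the text around the value and put it back.
sub set_terminal {
    my ($line, $change) = @_;
    $line =~ s/^(\s*terminal\s=\s")[^"]+("\s*)$/$1$change$2/;
    return $line;
}

# Original two-step version, for comparison. $1 is interpolated
# into the second pattern as a regex, so metacharacters in it are live.
sub set_terminal_two_step {
    my ($line, $change) = @_;
    if ($line =~ m/^\s*terminal\s=\s"([^"]+)"\s*$/) {
        $line =~ s/$1/$change/;
    }
    return $line;
}

# Hypothetical value containing '+' characters.
my $line = 'terminal = "xterm -geometry 80x24+0+0"';

my $one_step = set_terminal($line, 'urxvt');
my $two_step = set_terminal_two_step($line, 'urxvt');

print "$one_step\n";   # terminal = "urxvt"
print "$two_step\n";   # unchanged: '4+' and '0+' were treated as quantifiers
```

    If you keep the two-step form, quoting the captured text as s/\Q$1\E/$change/ makes it match literally.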

    In getTags() you have m/^\s*names\s=\s{(\s*(.+),?)+\s*/. Here you have nested captures - this is fine in itself; however, as you don't subsequently use either piece of captured data, both captures are redundant.
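    When parentheses are only needed for grouping (here, to apply the + quantifier), (?:...) groups without capturing. A small sketch comparing the two, using the @+ array's last index ($#+), which reports how many groups the last successful match had. The { is escaped here as \{ purely for clarity.

```perl
use strict;
use warnings;

my $line = 'names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },';

# Original structure: two nested capturing groups, neither used later.
my $capturing = $line =~ m/^\s*names\s=\s\{(\s*(.+),?)+\s*/;
my $groups_with_capture = $#+;      # groups in the last successful match

# Same structure with a non-capturing group; the inner parentheses
# around .+ did nothing for precedence, so they are simply dropped.
my $non_capturing = $line =~ m/^\s*names\s=\s\{(?:\s*.+,?)+\s*/;
my $groups_without_capture = $#+;

print "capturing groups: $groups_with_capture\n";        # 2
print "non-capturing groups: $groups_without_capture\n"; # 0
```

    Because .+ already consumes the rest of the line, the repetition adds nothing either; the match effectively reduces to checking the names = { prefix.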

    Right after this, you have an exceptionally complicated method for stripping double-quotes (and other characters):

    if ($line =~ m/^\s*names\s=\s{(\s*(.+),?)+\s*/) {
        @tags = split ",", $line;
        @tags = map {
            if ($_ =~ m/"([^"]+)"/) {$_ = $1}
            elsif ($_ =~ m/([^\s]+)/) {$_ = $1}
        } @tags;
    }

    It took me some time to work out what was happening here. I suspect this was where you should have been matching against the previously captured data. I reduced that down to:

    $line =~ /^\s*names\s*=\s*{\s*([^}]+)/;
    (my $names = $1) =~ y/" //d;
    my @tags = split /,/ => $names;

    [y/// and tr/// are synonymous - see Quote-Like Operators for more details.]
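    One caveat with the three-line version: if the match fails, $1 still holds whatever an earlier successful match left there (or is undef). A minimal sketch of a guarded variant, using a hypothetical get_tags() that returns the list instead of printing it:

```perl
use strict;
use warnings;

# Only use $1 when the match actually succeeded.
sub get_tags {
    my $line = shift;
    return unless $line =~ /^\s*names\s*=\s*\{\s*([^}]+)/;
    (my $names = $1) =~ y/" //d;    # delete every double quote and space
    return split /,/ => $names;
}

my @tags = get_tags('    names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },');
print "TAGS: @tags\n";              # TAGS: Main WWW GIMP EMail 6 7

my @none = get_tags('terminal = "urxvt"');
print scalar(@none), "\n";          # 0
```

    Note that y/" //d also removes spaces inside a name, so a tag such as "My Tag" would come back as MyTag.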

    That covers all the regexp parts of your code. You might also be interested in YAPE::Regex::Explain (do read the LIMITATIONS section) and Regexp::Debugger.

    A consistent coding style, particularly with reference to indentations, would improve the readability of your code: perlstyle offers some advice in this area.

    You also asked about "offerings of alternate solutions". I noticed that you're reading the entire config file twice: you indicated that this wasn't the entirety of the code, so maybe you're reading it more than this. You show a terminal setting but also make reference to an editor setting which you don't attempt to change. I've addressed these issues in addition to the regexp points I've already discussed. Here's the code:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    # Simulate Tie::File array
    my @config_file = map { chomp; $_ } <DATA>;

    print_config('Initial config');

    for (@config_file) {
        $_ = /^\s*terminal\b/ ? change_value(terminal => $_)
           : /^\s*editor\b/   ? change_value(editor => $_)
           : /^\s*names\b/    ? get_tags($_)
           :                    $_;
    }

    print_config('Final config');

    sub change_value {
        my ($key, $line) = @_;
        print "Current: $line\n";
        print 'Change (Enter to keep): ';
        chomp(my $new_value = <>);
        if ($new_value) {
            $line =~ s/^(\s*$key\s*=\s*")[^"]+("\s*)$/$1$new_value$2/;
        }
        return $line;
    }

    sub get_tags {
        my $line = shift;
        $line =~ /^\s*names\s*=\s*{\s*([^}]+)/;
        (my $names = $1) =~ y/" //d;
        my @tags = split /,/ => $names;
        print "TAGS: @tags\n";
        return $line;
    }

    sub print_config {
        my $heading = shift;
        print '=' x 64, "\n";
        print "$heading\n";
        print '-' x 64, "\n";
        print "$_\n" for @config_file;
        print '=' x 64, "\n";
        return;
    }

    __DATA__
    -- This is used later as the default terminal and editor to run.
    terminal = "urxvt"
    editor = "vim"
    -- Define a tag table which hold all screen tags.
    tags = {
        names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },

    Here are a couple of sample runs - config values are changed in the second one.

    $
    ================================================================
    Initial config
    ----------------------------------------------------------------
    -- This is used later as the default terminal and editor to run.
    terminal = "urxvt"
    editor = "vim"
    -- Define a tag table which hold all screen tags.
    tags = {
        names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
    ================================================================
    Current: terminal = "urxvt"
    Change (Enter to keep):
    Current: editor = "vim"
    Change (Enter to keep):
    TAGS: Main WWW GIMP EMail 6 7
    ================================================================
    Final config
    ----------------------------------------------------------------
    -- This is used later as the default terminal and editor to run.
    terminal = "urxvt"
    editor = "vim"
    -- Define a tag table which hold all screen tags.
    tags = {
        names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
    ================================================================

    $
    ================================================================
    Initial config
    ----------------------------------------------------------------
    -- This is used later as the default terminal and editor to run.
    terminal = "urxvt"
    editor = "vim"
    -- Define a tag table which hold all screen tags.
    tags = {
        names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
    ================================================================
    Current: terminal = "urxvt"
    Change (Enter to keep): xterm
    Current: editor = "vim"
    Change (Enter to keep): emacs
    TAGS: Main WWW GIMP EMail 6 7
    ================================================================
    Final config
    ----------------------------------------------------------------
    -- This is used later as the default terminal and editor to run.
    terminal = "xterm"
    editor = "emacs"
    -- Define a tag table which hold all screen tags.
    tags = {
        names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
    ================================================================

    -- Ken

      I hope it's not in bad taste to reply twice to your response. If so, please forgive me.

      After reading your response I read all the linked material and reviewed the code that you provided. Once I was sure I understood your code, I modified my own. Then, taking your comments into consideration, I wrote a new (similar) piece to parse Awesome's theme configuration file. I tried to avoid capturing any data that wouldn't be subsequently referred to. I made sure to read the file only once, and instead assigned the info to related hashes of the form:

       variable_name => value
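       For comparison, a minimal sketch of building one such hash in a single pass with one generic pattern that accepts any theme.<name> key (an assumption: the original patterns deliberately select specific keys). The input lines are a few copied from the theme file.

```perl
use strict;
use warnings;

# A few lines in the theme.lua format shown in this thread.
my @lines = (
    'theme.font = "sans-serif 8: bold"',
    '--default bg_normal = #222222',
    'theme.bg_normal = "#000000"',
    'theme.menu_width = "150"',
);

my %settings;
for (@lines) {
    # Skip anything that is not a  theme.<name> = "<value>"  assignment.
    next unless /^\s*(theme\.\w+)\s*=\s*"([^"]*)"\s*$/;
    $settings{$1} = $2;    # variable_name => value
}

print "$_ => $settings{$_}\n" for sort keys %settings;
```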

      I provided my code below, first to let you know that the time spent on your response was not in vain, and also in hopes that you would let me know of any complicated/unreadable methods, and/or style/convention infractions before I move on to adding some more interesting functionality. This is some of the most interesting and difficult code I've written thus far in my learning and I plan on expanding it, but I want to keep it clean.

      I've been nose deep in Friedl's Mastering Regular Expressions, and it's proven to be extremely useful. I've found regexps to be easier and more powerful, for my purposes, than the modules that I referred to above. As a result, the regexps in this code are rather long, but I hope still readable.

      I've provided my code followed by a larger chunk of the Awesome configuration file below:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my %theme_table;
      my %taglist_table;
      my %menu_table;

      sub buildTables {
          while (<>) {
              if (/^\s*(theme\.(?:bg|fg|border|font)(?:\_(?:normal|focus|urgent|minimize|marked|width))?)\s*=\s*\"([^"]*)\"\s*$/) {
                  $theme_table{$1} = $2
              }
              if (/^\s*(theme\.taglist\_(?:bg|fg)\_(?:normal|focus|urgent))\s*=\s*\"([^"]*)\"\s*$/) {
                  $taglist_table{$1} = $2
              }
              if (/^\s*(theme\.menu\_(?:bg|fg|border|height|width)(?:\_(?:normal|focus|color|width))?)\s*=\s*\"([^"]*)\"\s*$/) {
                  $menu_table{$1} = $2
              }
          }
          #print Dumper \%theme_table;
          #print Dumper \%taglist_table;
          #print Dumper \%menu_table;
      }

      sub printSettings {
          print "=" x 32, "\n";
          print " AWESOME CONFIGURATION SETTINGS \n";
          print "=" x 32, "\n\n";
          foreach my $table (@_) {
              foreach my $key (sort keys %$table) {
                  print " $key => $table->{$key}\n"
              }
              print "\n";
          }
      }

      buildTables;
      printSettings(\%theme_table, \%taglist_table, \%menu_table);

      ---------------------------
      -- Default awesome theme --
      ---------------------------

      theme = {}

      theme.font = "sans-serif 8: bold"

      --default bg_normal = #222222
      theme.bg_normal = "#000000"
      --default bg_focus = #535d6c
      theme.bg_focus = "#000000"
      theme.bg_urgent = "#ffffff"
      theme.bg_minimize = "#ff0000"

      --default fg_normal = #aaaaaa
      theme.fg_normal = "#ffffff"
      --default fg_focus = #ffffff
      theme.fg_focus = "#55B043"
      theme.fg_urgent = "#ffffff"
      theme.fg_minimize = "#ffffff"

      theme.border_width = "2"
      theme.border_normal = "#000000"
      --default border_focus = #535d6c
      theme.border_focus = "#55B043"
      theme.border_marked = "#91231c"

      -- There are other variable sets
      -- overriding the default one when
      -- defined, the sets are:
      -- [taglist|tasklist]_[bg|fg]_[focus|urgent]
      -- titlebar_[bg|fg]_[normal|focus]
      -- tooltip_[font|opacity|fg_color|bg_color|border_width|border_color]
      -- mouse_finder_[color|timeout|animate_timeout|radius|factor]
      -- Example:
      --theme.taglist_bg_focus = "#ff0000"
      theme.taglist_bg_normal = "#000000"
      theme.taglist_bg_focus = "#000000"
      theme.taglist_fg_normal = "#000000"
      theme.taglist_fg_focus = "#55B043"

      -- Display the taglist squares
      theme.taglist_squares_sel = "/usr/share/awesome/themes/default/taglist/squarefw.png"
      theme.taglist_squares_unsel = "/usr/share/awesome/themes/default/taglist/squarew.png"

      theme.tasklist_floating_icon = "/usr/share/awesome/themes/default/tasklist/floatingw.png"

      -- Variables set for theming the menu:
      -- menu_[bg|fg]_[normal|focus]
      -- menu_[border_color|border_width]
      theme.menu_bg_normal = "#000000"
      theme.menu_bg_focus = "#55B043"
      theme.menu_fg_normal = "#55B043"
      theme.menu_fg_focus = "#000000"
      theme.menu_border_color = "#000000"
      theme.menu_border_width = "5"
      -- theme.menu_submenu_icon = "/usr/share/awesome/themes/default/submenu.png"
      theme.menu_height = "15"
      -- default menu_width = 100
      theme.menu_width = "150"
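      As a possible alternative to three separate patterns, here is a sketch (not from the thread) that matches each theme.<name> = "<value>" line once and routes it to a table by the first word of <name>. It is deliberately looser than the original: keys such as theme.taglist_squares_sel or theme.tasklist_floating_icon are kept rather than skipped. The // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# A few lines copied from the theme file above (one commented out).
my @lines = (
    'theme.bg_normal = "#000000"',
    'theme.taglist_fg_focus = "#55B043"',
    'theme.menu_width = "150"',
    'theme.border_width = "2"',
    '-- theme.menu_submenu_icon = "/usr/share/awesome/themes/default/submenu.png"',
);

my %tables = (theme => {}, taglist => {}, menu => {});

for (@lines) {
    # One pattern for every  theme.<name> = "<value>"  assignment.
    next unless /^\s*(theme\.(\w+))\s*=\s*"([^"]*)"\s*$/;
    my ($key, $name, $value) = ($1, $2, $3);

    # Route by the first word of <name>; anything else lands in 'theme'.
    my ($prefix) = $name =~ /^(taglist|menu)_/;
    $tables{ $prefix // 'theme' }{$key} = $value;
}

for my $table (qw(theme taglist menu)) {
    print "$table:\n";
    print "  $_ => $tables{$table}{$_}\n" for sort keys %{ $tables{$table} };
}
```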

      Thank you, and again, I apologize if this response is in bad taste.

        Firstly, you've done nothing wrong in replying twice to the same node. You've done this in a perfectly acceptable fashion: asking different questions which stem from the same previous response. Here's an example of me doing exactly the same thing in the last 24 hours: Perl/TK borderwidth question - note the two Re^3: Perl/TK borderwidth question responses. Updating your node (and perhaps sending a /msg indicating such updating) is the more usual way; however, in this instance, you've done the right thing: thunderbolts from the gods may prove me wrong. *gulp* :-)

        My personal preference for links to books is that they target the publisher, not some arbitrary vendor. A vendor will not advertise books that they do not have in stock. Mastering Regular Expressions is published by O'Reilly, and this company, in particular Tim O'Reilly, has been a particularly good friend to Perl over the years - the company your link points to shows no such affiliation. The link I would have provided for this book is: Mastering Regular Expressions (i.e. actual markup: [|Mastering Regular Expressions]). [I do have a copy of that book myself and - yes - it's wonderful! :-)]

        OK - Off the soap box and back to the code.

        Wherever you have non-capturing parentheses containing alternation (e.g. (?:x|y|z)), it's usually better to avoid backtracking with the (?>...) construct (e.g. (?>x|y|z)). See perlre - Backtracking, perlre - Extended Patterns and pp. 102-107 in Friedl's book (check the index - page numbers may differ in your version).
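        One thing worth checking before converting: an atomic group never gives back a choice it has made, so it can change the result when one alternative is a prefix of another. A small sketch of both cases:

```perl
use strict;
use warnings;

my $text = 'abc';

# Ordinary group: 'a' is tried first; when 'c' then fails against 'b',
# the engine backtracks into the group and tries 'ab'.
my $plain  = $text =~ /^(?:a|ab)c$/ ? 1 : 0;

# Atomic group: once (?>a|ab) has matched 'a' the choice is final,
# so 'c' is compared with 'b' and the whole match fails.
my $atomic = $text =~ /^(?>a|ab)c$/ ? 1 : 0;

# With plain literal alternatives where none is a prefix of another
# (as with bg|fg|border|font), both forms accept the same strings.
my $plain2  = 'theme.border' =~ /^theme\.(?:bg|fg|border|font)$/ ? 1 : 0;
my $atomic2 = 'theme.border' =~ /^theme\.(?>bg|fg|border|font)$/ ? 1 : 0;

print "plain=$plain atomic=$atomic plain2=$plain2 atomic2=$atomic2\n";
# plain=1 atomic=0 plain2=1 atomic2=1
```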

        You say: "As a result, the regexps in this code are rather long, but I hope still readable.". There's absolutely no reason for length to make them unreadable: just use the x modifier, which lets you spread a pattern over several lines with whitespace and comments. Furthermore, any pattern that doesn't depend on a loop variable can be compiled once, outside the loop, with qr//. Here's an example:

        sub some_function {
            my $re = qr{
                \A              # start of string
                (?> x | y | z ) # match exactly one of x, y or z
                \Z              # end of string (ignore optional terminal newline)
            }x;

            while (<>) {
                if (/$re/) {
                    # do something based on successful match
                }
            }
        }
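        The end-of-string anchor matters in a loop like that, because each line read by while (<>) keeps its trailing newline: \Z allows that one final newline, while \z is the absolute end of the string. A quick sketch:

```perl
use strict;
use warnings;

# A line as delivered by while (<>): it still ends in "\n".
my $line = "y\n";

my $big_z   = qr{ \A (?> x | y | z ) \Z }x;  # end of string, or just before a final newline
my $small_z = qr{ \A (?> x | y | z ) \z }x;  # the absolute end of the string

my $matches_big_z   = $line =~ $big_z   ? 1 : 0;
my $matches_small_z = $line =~ $small_z ? 1 : 0;

# Removing the newline first lets \z match as well.
chomp(my $chomped = $line);
my $chomped_small_z = $chomped =~ $small_z ? 1 : 0;

print "\\Z: $matches_big_z  \\z: $matches_small_z  \\z after chomp: $chomped_small_z\n";
# \Z: 1  \z: 0  \z after chomp: 1
```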

        Finally, I am absolutely not going to tell you to adopt any particular coding style; however, I am going to urge you to adopt a coding style that's easy for you and others to read. Have a look around the Monastery, see how other Monks write their code, then pick something you're comfortable with. Beyond the indentation issues, the code you presented at the start of this thread was superior to what you now have. If you seriously don't understand what you read in perlstyle, then please ask for clarification.

        -- Ken

      Thank you for your detailed response.

      Read and bookmarked perlstyle.

      $line =~ /^\s*names\s*=\s*{\s*([^}]+)/;
      (my $names = $1) =~ y/" //d;
      my @tags = split /,/ => $names;

      ^ I figured that my original method could be reduced, but I couldn't seem to figure out how to go about it. This is definitely a much more elegant solution. Thank you for your reference to Quote-Like Operators. I do have a question, though, about:

      (my $names = $1) =~ y/" //d;
      I have seen this construct before on PerlMonks. I did a search and found Use of parentheses around a variable, but wasn't sure how it applied to regexps. Is there a specific name for this method of assignment that I can use to better my search results?

      Concepts are easily read, but I feel like I learn most by having my code reviewed by the talented and experienced monks. Thank you again for taking the time out to provide guidance in my learning.

        I don't think that construct has a specific name - it's just using parentheses to change precedence. Usage examples for s/// can be found in Regexp Quote-Like Operators; examples for y/// (although its synonym tr/// is used in these examples) can be found in Quote-Like Operators.
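        A small sketch of why the parentheses matter: =~ binds more tightly than =, so without them y/// runs on the original variable and the assignment receives y///'s return value (the number of characters deleted).

```perl
use strict;
use warnings;

my $source = '"Main", "WWW", 6';

# With parentheses: the assignment happens first, then y/// edits the copy.
(my $copy = $source) =~ y/" //d;

# Without parentheses: y/// edits $source2 in place and $count
# receives the number of characters it deleted (4 quotes + 2 spaces).
my $source2 = '"Main", "WWW", 6';
my $count = $source2 =~ y/" //d;

print "$copy\n";      # Main,WWW,6
print "$source\n";    # "Main", "WWW", 6   (unchanged)
print "$source2\n";   # Main,WWW,6         (modified in place)
print "$count\n";     # 6
```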

        In Perl 5.14.0, an r option was introduced (see perl5140delta under Core Enhancements - Regular Expressions - Non-destructive substitution). With it, each of the following pairs is equivalent:

        # For y///
        (my $x = $y) =~ y///;
        my $x = $y =~ y///r;

        # Ditto for its synonym tr/// (although not mentioned in perl5140delta)
        (my $x = $y) =~ tr///;
        my $x = $y =~ tr///r;

        # And for s///
        (my $x = $y) =~ s///;
        my $x = $y =~ s///r;
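        A concrete, runnable version of the /r forms, using a made-up config line; the original string is left untouched and the edited copy is returned:

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier on s/// and y/// needs Perl 5.14 or later

my $line = 'terminal = "urxvt"';

my $new_line = $line =~ s/"[^"]+"/"xterm"/r;   # substitution on a copy
my $bare     = $line =~ y/"//dr;               # delete quotes from a copy

print "$line\n";      # terminal = "urxvt"
print "$new_line\n";  # terminal = "xterm"
print "$bare\n";      # terminal = urxvt
```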

        The first two links above are for the current Perl version (5.16.0 at the time of writing), so they include examples of this too.

        -- Ken

Re: Regex help/ Lua parse
by grizzley (Chaplain) on Oct 26, 2012 at 07:47 UTC
    You could save yourself one match by replacing

    if ($line =~ m/^\s*terminal\s=\s"([^"]+)"\s*$/) { $line =~ s/$1/$change/; }

    with

    $line =~ s/^(\s*terminal\s=\s")([^"]+)("\s*)$/$1$change$3/;

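    Another option, not suggested in this thread, avoids captures altogether: \K (Perl 5.10+) excludes everything matched before it from the replaced text, and the lookahead (?=...) checks for the closing quote without consuming it. A sketch with a hypothetical set_terminal() helper:

```perl
use strict;
use warnings;

# Only the text after \K and before the lookahead is replaced.
sub set_terminal {
    my ($line, $change) = @_;
    $line =~ s/^\s*terminal\s*=\s*"\K[^"]+(?="\s*$)/$change/;
    return $line;
}

print set_terminal('terminal = "urxvt"', 'xterm'), "\n";   # terminal = "xterm"
print set_terminal('editor = "vim"', 'xterm'), "\n";       # editor = "vim"
```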
      Thank you. I modified all occurrences like this.

Approved by nemesdani