
Re: Regex help/ Lua parse

by kcott (Chancellor)
on Oct 26, 2012 at 11:30 UTC (#1001049=note)

in reply to Regex help/ Lua parse

G'day marquezc329,

In your config you have lines like key = value and you match the equals ( = ) part with \s=\s, which allows exactly one whitespace character on each side. Would key=value, key  =  value, key= value and similar variations be syntactically correct? If so, you may want to consider using \s*=\s* or \s+=\s+ as appropriate.
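A quick way to see the difference is to run the three patterns against the spacing variants. This is a minimal sketch: the @variants list and the \w+ keys and values are illustrative simplifications, not the real config syntax.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative spacing variants of the same key/value line
my @variants = ('key = value', 'key=value', 'key  =  value', 'key= value');

my %patterns = (
    'exactly one'  => qr/^\w+\s=\s\w+$/,    # \s=\s   : one whitespace char each side
    'zero or more' => qr/^\w+\s*=\s*\w+$/,  # \s*=\s* : any amount, including none
    'one or more'  => qr/^\w+\s+=\s+\w+$/,  # \s+=\s+ : at least one each side
);

# Record which variants each pattern accepts
my %matched;
for my $name (sort keys %patterns) {
    $matched{$name} = [ grep { $_ =~ $patterns{$name} } @variants ];
    printf "%-12s matches: %s\n", $name, join(' | ', @{ $matched{$name} });
}
```

Only \s*=\s* accepts all four forms; \s+=\s+ still rejects key=value and key= value.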

In a number of places you have code like $var =~ m/.../. While there's nothing actually wrong with this, you don't need the m when using forward slashes as the regexp delimiters, i.e. $var =~ /.../ is fine. Also, when matching $_ you can leave out the variable and the operator: $_ =~ m/.../ can simply be written as /.../. You don't have to use this format but you will see it in other people's code so you should at least be aware of it. All of this is explained in greater detail in Regexp Quote-Like Operators.
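The equivalent spellings can be checked side by side. A small sketch with a made-up $var; the last line also shows that the m is still required once you switch to a delimiter other than forward slashes.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $var = 'terminal = "urxvt"';    # illustrative config line

my $with_m    = $var =~ m/^\s*terminal\b/ ? 1 : 0;   # explicit m with slashes
my $without_m = $var =~ /^\s*terminal\b/  ? 1 : 0;   # m is optional with slashes

# When the target is $_, both the variable and =~ can be dropped
my $implicit;
for ($var) {                       # aliases $_ to $var for the block
    $implicit = /^\s*terminal\b/ ? 1 : 0;
}

# With other delimiters, such as braces, the m cannot be omitted
my $other_delim = $var =~ m{^\s*terminal\b} ? 1 : 0;

print "$with_m $without_m $implicit $other_delim\n";
```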

I see grizzley picked up on combining a match and substitution. You don't need to capture what you're replacing, so that could be further simplified to:

$line =~ s/^(\s*terminal\s=\s")[^"]+("\s*)$/$1$change$2/;
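Here is that substitution run on a sample line. The value in $change is hypothetical and stands in for whatever the user types.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line   = 'terminal = "urxvt"';
my $change = 'xterm';    # hypothetical new value

# $1 keeps everything up to and including the opening quote,
# $2 keeps the closing quote and any trailing whitespace;
# only the old value between them ([^"]+) is replaced.
$line =~ s/^(\s*terminal\s=\s")[^"]+("\s*)$/$1$change$2/;

print "$line\n";    # terminal = "xterm"
```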

In getTags() you have m/^\s*names\s=\s{(\s*(.+),?)+\s*/. Here you have nested captures; this is fine in itself, but as you don't subsequently use either piece of captured data, both captures are redundant.
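To see that the captures contribute nothing to the test itself, compare the pattern with and without them. A minimal sketch: the sample line comes from the config shown later in this thread, and the literal { is written as \{, which is always safe in a Perl regex.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line = 'names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },';

# Nested capturing groups, as in the original test
my $with_captures = $line =~ /^\s*names\s=\s\{(\s*(.+),?)+\s*/ ? 1 : 0;

# Same test with non-capturing (?:...) groups: the match result is identical,
# but no text is saved in $1 or $2
my $without_captures = $line =~ /^\s*names\s=\s\{(?:\s*(?:.+),?)+\s*/ ? 1 : 0;

print "with captures: $with_captures, without: $without_captures\n";
```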

Right after this, you have an exceptionally complicated method for stripping double-quotes (and other characters):

```perl
if ($line =~ m/^\s*names\s=\s{(\s*(.+),?)+\s*/) {
    @tags = split ",", $line;
    @tags = map {
        if ($_ =~ m/"([^"]+)"/) {$_ = $1}
        elsif ($_ =~ m/([^\s]+)/) {$_ = $1}
    } @tags;
}
```

It took me some time to work out what was happening here. I suspect this was where you should have been matching against the previously captured data. I reduced that down to:

```perl
$line =~ /^\s*names\s*=\s*{\s*([^}]+)/;
(my $names = $1) =~ y/" //d;
my @tags = split /,/ => $names;
```

[y/// and tr/// are synonymous - see Quote-Like Operators for more details.]
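Here is the reduced version run against the sample names line. One thing is added for this sketch that the version above doesn't have: an if guard, so that $1 is only used after a successful match. The last statement just confirms that tr/// behaves exactly like y///.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $line = 'names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },';

my @tags;
# Guard (added for this sketch): $1 is only meaningful if the match succeeded
if ($line =~ /^\s*names\s*=\s*\{\s*([^}]+)/) {
    (my $names = $1) =~ y/" //d;    # delete every double-quote and space
    @tags = split /,/ => $names;
}

# tr/// is the same operator as y///
(my $copy = '"Main", "WWW"') =~ tr/" //d;

print "TAGS: @tags\n";    # TAGS: Main WWW GIMP EMail 6 7
```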

That covers all the regexp parts of your code. You might also be interested in YAPE::Regex::Explain (do read the LIMITATIONS section) and Regexp::Debugger.

A consistent coding style, particularly with reference to indentations, would improve the readability of your code: perlstyle offers some advice in this area.

You also asked about "offerings of alternate solutions". I noticed that you're reading the entire config file twice: you indicated that this wasn't the entirety of the code, so maybe you're reading it more than this. You show a terminal setting but also make reference to an editor setting which you don't attempt to change. I've addressed these issues in addition to the regexp points I've already discussed. Here's the code:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

# Simulate Tie::File array
my @config_file = map { chomp; $_ } <DATA>;

print_config('Initial config');

for (@config_file) {
    $_ = /^\s*terminal\b/ ? change_value(terminal => $_)
       : /^\s*editor\b/   ? change_value(editor => $_)
       : /^\s*names\b/    ? get_tags($_)
       :                    $_;
}

print_config('Final config');

sub change_value {
    my ($key, $line) = @_;
    print "Current: $line\n";
    print 'Change (Enter to keep): ';
    chomp(my $new_value = <>);
    if ($new_value) {
        $line =~ s/^(\s*$key\s*=\s*")[^"]+("\s*)$/$1$new_value$2/;
    }
    return $line;
}

sub get_tags {
    my $line = shift;
    $line =~ /^\s*names\s*=\s*{\s*([^}]+)/;
    (my $names = $1) =~ y/" //d;
    my @tags = split /,/ => $names;
    print "TAGS: @tags\n";
    return $line;
}

sub print_config {
    my $heading = shift;
    print '=' x 64, "\n";
    print "$heading\n";
    print '-' x 64, "\n";
    print "$_\n" for @config_file;
    print '=' x 64, "\n";
    return;
}

__DATA__
-- This is used later as the default terminal and editor to run.
terminal = "urxvt"
editor = "vim"
-- Define a tag table which hold all screen tags.
tags = {
names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
```

Here are a couple of sample runs; config values are changed in the second one.

```
$
================================================================
Initial config
----------------------------------------------------------------
-- This is used later as the default terminal and editor to run.
terminal = "urxvt"
editor = "vim"
-- Define a tag table which hold all screen tags.
tags = {
names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
================================================================
Current: terminal = "urxvt"
Change (Enter to keep):
Current: editor = "vim"
Change (Enter to keep):
TAGS: Main WWW GIMP EMail 6 7
================================================================
Final config
----------------------------------------------------------------
-- This is used later as the default terminal and editor to run.
terminal = "urxvt"
editor = "vim"
-- Define a tag table which hold all screen tags.
tags = {
names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
================================================================
$
================================================================
Initial config
----------------------------------------------------------------
-- This is used later as the default terminal and editor to run.
terminal = "urxvt"
editor = "vim"
-- Define a tag table which hold all screen tags.
tags = {
names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
================================================================
Current: terminal = "urxvt"
Change (Enter to keep): xterm
Current: editor = "vim"
Change (Enter to keep): emacs
TAGS: Main WWW GIMP EMail 6 7
================================================================
Final config
----------------------------------------------------------------
-- This is used later as the default terminal and editor to run.
terminal = "xterm"
editor = "emacs"
-- Define a tag table which hold all screen tags.
tags = {
names = { "Main", "WWW", "GIMP", "EMail", 6, 7 },
================================================================
```
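The script above only simulates Tie::File by reading __DATA__ into an array. A minimal sketch of the real thing follows, assuming a hypothetical config file (created with File::Temp so the example runs on its own) and hard-coded replacement values in place of the interactive prompts. Tie::File presents the file as an array of lines, and changing an element rewrites that line on disk.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical config file, created here so the sketch is self-contained
my ($fh, $config_path) = tempfile(UNLINK => 1);
print {$fh} qq{terminal = "urxvt"\neditor = "vim"\n};
close $fh;

# Hypothetical replacement values; the real script prompts for these
my %new_value = (terminal => 'xterm', editor => 'emacs');

# Each element is one line of the file (newline removed by default);
# modifying an element writes the change back to the file.
tie my @config_file, 'Tie::File', $config_path
    or die "Cannot tie $config_path: $!";

for (@config_file) {
    for my $key (keys %new_value) {
        s/^(\s*$key\s*=\s*")[^"]+("\s*)$/$1$new_value{$key}$2/;
    }
}

untie @config_file;    # flush changes and release the file

# Re-read the file to confirm the edit
open my $in, '<', $config_path or die "Cannot read $config_path: $!";
chomp(my @lines = <$in>);
close $in;
print "$_\n" for @lines;
```

Because the whole file is tied once, every setting can be changed in a single pass, which fits the goal of not reading the config file more than once.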

-- Ken

Re^2: Regex help/ Lua parse
by marquezc329 (Scribe) on Oct 27, 2012 at 04:28 UTC

    I hope that it is not in bad taste to reply twice to your response. If so please forgive me.

    After reading your response I read all the linked material and reviewed the code that you provided. Once I was sure I understood your code, I modified my own. Then, taking your comments into consideration, I wrote a new (similar) piece to parse Awesome's theme configuration file. I tried to avoid capturing any data that wouldn't be subsequently referred to. I made sure to read the file only once, and instead assigned the info to related hashes of the form:

     variable_name => value

    I provided my code below, first to let you know that the time spent on your response was not in vain, and also in hopes that you would let me know of any complicated/unreadable methods, and/or style/convention infractions before I move on to adding some more interesting functionality. This is some of the most interesting and difficult code I've written thus far in my learning and I plan on expanding it, but I want to keep it clean.

    I've been nose deep in Friedl's Mastering Regular Expressions, and it's proven to be extremely useful. I've found regexps to be easier and more powerful, for my purposes, than the modules that I referred to above. As a result, the regexps in this code are rather long, but I hope still readable.

    I've provided my code followed by a larger chunk of the awesome configuration file below:

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my %theme_table;
my %taglist_table;
my %menu_table;

sub buildTables {
    while (<>) {
        if (/^\s*(theme\.(?:bg|fg|border|font)(?:\_(?:normal|focus|urgent|minimize|marked|width))?)\s*=\s*\"([^"]*)\"\s*$/) { $theme_table{$1} = $2 }
        if (/^\s*(theme\.taglist\_(?:bg|fg)\_(?:normal|focus|urgent))\s*=\s*\"([^"]*)\"\s*$/) { $taglist_table{$1} = $2 }
        if (/^\s*(theme\.menu\_(?:bg|fg|border|height|width)(?:\_(?:normal|focus|color|width))?)\s*=\s*\"([^"]*)\"\s*$/) { $menu_table{$1} = $2 }
    }
    #print Dumper \%theme_table;
    #print Dumper \%taglist_table;
    #print Dumper \%menu_table;
}

sub printSettings {
    print "=" x 32, "\n";
    print " AWESOME CONFIGURATION SETTINGS \n";
    print "=" x 32, "\n\n";
    foreach my $table (@_) {
        foreach my $key (sort keys %$table) { print " $key => $table->{$key}\n" }
        print "\n";
    }
}

buildTables;
printSettings(\%theme_table, \%taglist_table, \%menu_table);
```

```lua
---------------------------
-- Default awesome theme --
---------------------------

theme = {}

theme.font = "sans-serif 8: bold"

--default bg_normal = #222222
theme.bg_normal = "#000000"
--default bg_focus = #535d6c
theme.bg_focus = "#000000"
theme.bg_urgent = "#ffffff"
theme.bg_minimize = "#ff0000"

--default fg_normal = #aaaaaa
theme.fg_normal = "#ffffff"
--default fg_focus = #ffffff
theme.fg_focus = "#55B043"
theme.fg_urgent = "#ffffff"
theme.fg_minimize = "#ffffff"

theme.border_width = "2"
theme.border_normal = "#000000"
--default border_focus = #535d6c
theme.border_focus = "#55B043"
theme.border_marked = "#91231c"

-- There are other variable sets
-- overriding the default one when
-- defined, the sets are:
-- [taglist|tasklist]_[bg|fg]_[focus|urgent]
-- titlebar_[bg|fg]_[normal|focus]
-- tooltip_[font|opacity|fg_color|bg_color|border_width|border_color]
-- mouse_finder_[color|timeout|animate_timeout|radius|factor]
-- Example:
--theme.taglist_bg_focus = "#ff0000"
theme.taglist_bg_normal = "#000000"
theme.taglist_bg_focus = "#000000"
theme.taglist_fg_normal = "#000000"
theme.taglist_fg_focus = "#55B043"

-- Display the taglist squares
theme.taglist_squares_sel = "/usr/share/awesome/themes/default/taglist/squarefw.png"
theme.taglist_squares_unsel = "/usr/share/awesome/themes/default/taglist/squarew.png"

theme.tasklist_floating_icon = "/usr/share/awesome/themes/default/tasklist/floatingw.png"

-- Variables set for theming the menu:
-- menu_[bg|fg]_[normal|focus]
-- menu_[border_color|border_width]
theme.menu_bg_normal = "#000000"
theme.menu_bg_focus = "#55B043"
theme.menu_fg_normal = "#55B043"
theme.menu_fg_focus = "#000000"
theme.menu_border_color = "#000000"
theme.menu_border_width = "5"
-- theme.menu_submenu_icon = "/usr/share/awesome/themes/default/submenu.png"
theme.menu_height = "15"
-- default menu_width = 100
theme.menu_width = "150"
```

    Thank you, and again, I apologize if this response is in bad taste.

      Firstly, you've done nothing wrong in replying twice to the same node. You've done this in a perfectly acceptable fashion: asking different questions which stem from the same previous response. Here's an example of me doing exactly the same thing in the last 24 hours: Perl/TK borderwidth question - note the two Re^3: Perl/TK borderwidth question responses. Updating your node (and perhaps sending a /msg indicating such updating) is the more usual way; however, in this instance, you've done the right thing: thunderbolts from the gods may prove me wrong. *gulp* :-)

      My personal preference for links to books is that they target the publisher, not some arbitrary vendor: a vendor will not advertise books that it does not have in stock. Mastering Regular Expressions is published by O'Reilly, and this company, notably Tim O'Reilly, has been a particularly good friend to Perl over the years; the company your posted link refers to shows no such affiliation. The link I would have provided for this book is: Mastering Regular Expressions (i.e. actual markup: [|Mastering Regular Expressions]). [I do have a copy of that book myself and - yes - it's wonderful! :-)]

      OK - Off the soap box and back to the code.

      Wherever you have non-capturing parentheses containing alternation (e.g. (?:x|y|z)), it's usually better to avoid backtracking with the (?>...) construct (e.g. (?>x|y|z)), provided the rest of the pattern never needs the group to give up its match and try a different alternative. See perlre - Backtracking, perlre - Extended Patterns and pp. 102-107 in Friedl's book (check the index - page numbers may differ in your version).
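      A small demonstration of both the benefit and the limit of (?>...). The patterns are illustrative: for alternatives that never share a prefix, such as bg|fg|border|font, the atomic group gives the same result as (?:...); when one alternative is a prefix of another, the atomic group commits to the first alternative that matches and the overall match can fail.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $key = 'theme.bg_normal';

# Alternatives with no shared prefix: atomic and ordinary groups agree
my $plain_ok  = $key =~ /^theme\.(?:bg|fg|border|font)_normal$/ ? 1 : 0;
my $atomic_ok = $key =~ /^theme\.(?>bg|fg|border|font)_normal$/ ? 1 : 0;

# 'bg' is a prefix of 'bg_normal': once 'bg' has matched inside (?>...),
# the group will not go back and try 'bg_normal', so $ fails at '_'
my $prefix_plain  = 'bg_normal' =~ /^(?:bg|bg_normal)$/ ? 1 : 0;
my $prefix_atomic = 'bg_normal' =~ /^(?>bg|bg_normal)$/ ? 1 : 0;

print "$plain_ok $atomic_ok $prefix_plain $prefix_atomic\n";
```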

      You say: "As a result, the regexps in this code are rather long, but I hope still readable." There's absolutely no reason for them to be unreadable due to length: just use the x option. Furthermore, any pattern that doesn't depend on a loop variable can be compiled once, outside the loop, with qr//. Here's an example:

```perl
sub some_function {
    my $re = qr{
        \A              # start of string
        (?> x | y | z ) # match exactly one of x, y or z
        \Z              # end of string (ignore optional terminal newline)
    }x;

    while (<>) {
        if (/$re/) {
            # do something based on successful match
        }
    }
}
```
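      Applied to the first theme regex in buildTables() above, the same techniques might look like this. It's a sketch: the pattern is rewritten with /x, compiled once with qr// outside the loop, and its alternations are made atomic (safe here because no alternative is a prefix of another). The sample lines are hypothetical stand-ins for reading the theme file with <>. Note that variables are interpolated even inside /x comments, so the comments avoid $ and @.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Compiled once, outside the loop
my $theme_re = qr{
    \A \s*
    (                                  # capture 1: the variable name
        theme \.
        (?> bg | fg | border | font )
        (?: _ (?> normal | focus | urgent | minimize | marked | width ) )?
    )
    \s* = \s*
    " ( [^"]* ) "                      # capture 2: the quoted value
    \s* \z                             # trailing whitespace, including any newline
}x;

my %theme_table;
# Hypothetical sample lines standing in for the theme file
for my $line ('theme.bg_normal = "#000000"',
              'theme.font = "sans-serif 8: bold"',
              'theme.taglist_fg_focus = "#55B043"',
              '--default bg_focus = #535d6c') {
    $theme_table{$1} = $2 if $line =~ $theme_re;
}

print " $_ => $theme_table{$_}\n" for sort keys %theme_table;
```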

      Finally, I am absolutely not going to tell you to adopt any particular coding style; however, I am going to urge you to adopt a coding style that's easy for you and others to read. Have a look around the Monastery, see how other Monks write their code, then pick something you're comfortable with. Beyond the indentation issues, the code you presented at the start of this thread was superior to what you now have. If you seriously don't understand what you read in perlstyle, then please ask for clarification.

      -- Ken

        Thank you for your response. I've updated the reference to Mastering Regular Expressions above. Can you please explain how the previous code was superior? I see the indentation problems that you are referring to, and I tried to be sure to follow the guidelines specified in perlstyle, i.e. 4-space indentation, curly brackets on the same line or lined up vertically, and spaces around most operators. Should I have used:

        $theme_table{$1} = $2 if (/regex/);
        instead of
        if (/regex/) { $theme_table{$1} = $2 }


        foreach my $table (@_) { foreach my $key (sort keys %$table) { print "$key => $table->{$key}\n" } print "\n"; }
        instead of
        foreach my $table (@_) { foreach my $key (sort keys %$table) { print " $key => $table->{$key}\n" } print "\n"; }

        If you could clarify exactly where it is that I'm missing the mark stylistically it would be greatly appreciated. I think good style is the most difficult thing to learn out of a book. Thanks again for your help.

Re^2: Regex help/ Lua parse
by marquezc329 (Scribe) on Oct 26, 2012 at 21:24 UTC

    Thank you for your detailed response.

    Read and bookmarked perlstyle.

```perl
$line =~ /^\s*names\s*=\s*{\s*([^}]+)/;
(my $names = $1) =~ y/" //d;
my @tags = split /,/ => $names;
```

    ^ I figured that my original method could be reduced, but I couldn't work out how to go about it. This is definitely a much more elegant solution. Thank you for your reference to Quote-Like Operators. I do have a question, though, about:

    (my $names = $1) =~ y/" //d;
    I have seen this construct before on PerlMonks. I did a search and found Use of parentheses around a variable, but wasn't sure how it applied to regexps. Is there a specific name for this method of assignment that I can use to better my search results?

    Concepts are easily read, but I feel like I learn most by having my code reviewed by the talented and experienced monks. Thank you again for taking the time out to provide guidance in my learning.

      I don't think that construct has a specific name - it's just using parentheses to change precedence. Usage examples for s/// can be found in Regexp Quote-Like Operators; examples for y/// (although its synonym tr/// is used in these examples) can be found in Quote-Like Operators.
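      The effect of those parentheses is easy to see with an ordinary variable. The sample strings are illustrative. Because =~ binds more tightly than =, dropping the parentheses makes y/// operate on the source string and return the number of characters deleted. With $1 as the source, as in the original line, that unparenthesised form would try to modify $1 itself, which Perl does not allow.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $original = '"Main", "WWW"';

# With parentheses: the assignment happens first, then y/// edits the copy
(my $names = $original) =~ y/" //d;
# $names is now 'Main,WWW'; $original is unchanged

# Without parentheses: y/// edits $source in place and
# $count receives the number of characters deleted
my $source = '"Main", "WWW"';
my $count  = $source =~ y/" //d;
# $source is now 'Main,WWW'; $count is 5 (4 quotes + 1 space)

print "$names | $original | $source | $count\n";
```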

      In Perl 5.14.0, an r option was introduced (see perl5140delta under Core Enhancements - Regular Expressions - Non-destructive substitution). This gives the following equivalences:

```perl
# For y///
(my $x = $y) =~ y///;
my $x = $y =~ y///r;

# Ditto for its synonym tr/// (although not mentioned in perl5140delta)
(my $x = $y) =~ tr///;
my $x = $y =~ tr///r;

# And for s///
(my $x = $y) =~ s///;
my $x = $y =~ s///r;
```

      The first two links above are for the current Perl version (5.16.0 at the time of writing) so they have examples of this also.
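      A small check of those equivalences with concrete operators, assuming Perl 5.14 or later. The sample string is illustrative; each pair should produce identical results while leaving $y untouched.

```perl
#!/usr/bin/env perl
use v5.14;    # the r option needs Perl 5.14 or later (also enables strict)
use warnings;

my $y = '"Main", "WWW"';

# Older idiom: copy first, then modify the copy in place
(my $old_tr = $y) =~ tr/" //d;
(my $old_s  = $y) =~ s/"//g;

# With r: the operator returns the modified copy and leaves $y alone
my $new_tr = $y =~ tr/" //dr;
my $new_s  = $y =~ s/"//gr;

say "$old_tr | $new_tr | $old_s | $new_s | $y";
```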

      -- Ken
