Beefy Boxes and Bandwidth Generously Provided by pair Networks vroom
more useful options
 
PerlMonks  

Re: VarStructor 1.0

by tkil (Monk)
on May 04, 2004 at 04:34 UTC ( [id://350259]=note: print w/replies, xml ) Need Help??

This is an archived low-energy page for bots and other anonmyous visitors. Please sign up if you are a human and want to interact.


in reply to VarStructor 1.0

Ok. Here is what I have so far; this is the result of about six hours work on your utility, trying to keep track of why I changed things the way I did.

I still don't think that this particular library is all that useful, and I agree with many of the other monks that it violates good coding guidelines left and right. It works for you, and I have no problems with that; the question is, would it work for other people?

The guidelines we put in place are not there to be memorized and followed by rote; there are more than a few debates about the finer points. But the overall goal is to make programming easier, less haphazard, more fun, and more reliable. Many languages and years of experience have shown that encapsulation, decoupling, data hiding, orthogonality, commenting, code-reuse, and not trying to outwit the language are all likely to make your life as a programmer more rewarding and productive. And I know that you do not consider yourself a professional or career programmer -- isn't it all the more important that you don't waste time and effort going down the same dead ends and roundabouts that others documented years ago?

Here's my list of points you might want to think about when you write code:

  • Wrap all your lines to be less than about 72 characters. Your comments violate this pretty frequently.
  • Instead of continuously doing verbose regex testing, do the test once, and set a flag that you then access later.
  • What if my output file is on drive "E:" on a win32 platform? Your parameter handling is perhaps too permissive; consider just dying if required paramers are not given (or are given in the wrong format)
  • Don't use all upper-case variable names (except for globals). It looks like you're shouting all the time.
  • Always include the operation and filename when outputting errors. (In particular, if an open fails, always print out what you were trying to open!)
  • /s modifer affects exactly one thing: whether or not a dot matches a newline. It has zero effect on anything else.
  • ^ (and, for that matter, $) are not special inside of character classes. Update: as pointed out by demerphq, ^ can most certainly be special in a character class: if it is the first character, it negate the rest of the class. Now I can't even find the code that at I was objecting to.
  • You can't rely on \n always being only and exactly the line terminator. If you want to split stuff up by lines, either read them in as lines, or do something like this:
    my $eol = qr/\015|\012|\015\012/;
    (And this is done on purpose; on at least some platforms, the shorthand \n is set to local EOL regardless. It is not always \012.)
  • Be consistent. Why do you treat the variable display as different from the hash display, w.r.t. "show" v. "clear"? Why do hashes need special treatment, but not arrays? (Note that this is probably explained in the code, but i'm doing onion slices of replacements here.
  • Hash key expressions certainly can have newlines in them:
    $hash{ $something + $long / $and * $hairy } = $foo;
  • As a matter of style, always capture values out of subgroups immediately after doing the match. Using submatch expressions in further regexes might work, but it is awfully confusing. So, instead of doing:
    if ( /blah (foo) gibber/ ) { $var =~ s/\Q$1\E/bar/g; }
    I would personally prefer:
    if ( /blah (foo) gibber/ ) { my $thingy = $1; $var =~ s/\Q$thingy\E/bar/g; }
  • Pay attention to greediness. When you try to delete here-docs with this regex:
    $file =~ s/<<(\w+);.*\n\1\n//sg;
    Consider what happens when you have code that looks like this:
    print <<EOT; this is short. EOT $var = 'quux'; print <<EOT; this is also short; EOT
    Since you don't limit the greediness, you'll miss that assignment to $var
  • Perl is hard to parse. Even perl doesn't get it right all the time. When you do try to parse it, remember that you have to deal with all sorts of quoting issues. To wit, consider this regex:
    $file =~ s/[;\n]\s*#[^\n]*//sg; # Delete comments
    against this code:
    print "foo; #bar";
    Splat!
  • Here-docs don't have to be on the end of the line. This is perfectly valid:
    print <<HTML, $end_of_stream; ... ... HTML
    Further, you can have multiple here-docs on one line:
    print <<P1, <<P2; ... ... P1 ... ... P2
  • Trivial: when you capture something using (\w+), you are guaranteed that you don't need to quote it. So don't use \Q...\E on it.
  • Don't forget that subroutines can have prototypes (see perlsub):
    sub twiddle ( $ ) { my ( $arg ) = @_; ... }
    Your regex for matching subroutines does not take these into account.
  • Don't use substitution (s///) where a simple match (m//) would do.
  • Don't use $& (or its friends, $' and $`). They slow down the entire program if they are used anywhere, even once. Use capturing parens where (and only where) needed, even if it's the entire expression.
  • I think I mentioned something like this before, but: always check for errors, and report them in an adequate way. My current preference is to just die, and catch it in an eval if necessary.
  • Loop control operations can have braces in them. This breaks your regex for block detection. Consider:
    LOOP: for ( my $I = do { ... }; $i < 20; ++$i ) { ... }
    Also note different indent for loop label, and the potential for having brackets on the same line. Forcing people to code in a particular style is evil. (And while we have our guidelines, they are all "we've found this to be a good idea, why don't you try it?" rather than "if you don't do it this way, my utility won't work.")
  • Most metachars lose their special actions inside of character classes. Also, character classes are "alternation" sort of by definition. So when you write /[\$|\@|\%]/, you're really asking for dollar signs, at signs, percent signs, and pipes... as a character class. Prefer: /[\$\@\%]/

And here is my rework of the code so far. (This is version 15 in my local version control; I've been trying to keep different changes isolated. I can try to publish the entire sequence one way or another, if you're curious what sort of changes I've been making as "chunks").

#!/usr/bin/perl require 5.006; use strict; use warnings; # Copyright (c) 2004 Barry Kaminsky. All rights reserved. This program # is free software; you can redistribute it and/or modify it under the # same terms as Perl itself. # Last minute changes made May 2, 2004, 4:00 AM EST without much # testing # VarStructor is an alternative to Perl's reset function, which is # expected to be deprecated. It could also be used to print variables # and their values, including "my" variables. See comments at top of # sub VARSTRUCTOR for configuration information. ################################# # Test variables (could be deleted) %hash1=("key1"=>"value1","key2"=>"value2"); $hash2{"key1"}="value1"; $hash2{"key2"}="value2"; $Simple_Var = 'simple'; @Simple_Arr = ('simple1','simple2'); ################################# &VARSTRUCTOR('show', 'E: subs(VARSTRUCTOR)'); # Test parameters # my @unique = unique_elements @array; # # Return a sorted list of unique values out of all values passed in as # parameters. sub unique_elements { my %seen; return sort grep { ! $seen{$_}++ } @_; } sub VARSTRUCTOR { ########################################################## # 1st parameter: # Assign "show" to print variables and values or # "clear" to clear variables. For security reasons, the # default is clear. # set up default action my $action = 'clear'; # see if first arg is "show" or "clear" if ( $_[0] =~ /^\s*(show|clear)\s*$/i ) { shift; # ok, get rid of it $action = lc $1; } # 2nd parameter: # Comma-separated list of variables, subroutines, and # labels, whose variables will be included or excluded. # Labels must be of labeled blocks that are wrapped in # braces. This parameter must begin with "I:" or "E:" # (include or exclude). # # You can't include or exclude array elements, hash # keys or hash values. Legal variables for this # parameter begin with $, @, and %, followed by a # string of word characters not beginning with a digit. # # The label and subroutine name lists must be enclosed # in separate sets of parentheses, following the word # "labels" or "subs". A comma after the closing # parenthesis is necessary when another item in this # parameter follows. Commas also must separate the # labels and subroutine names within the parentheses. # Within the code to be parsed, there must be nothing # preceding the labels and the "sub"s on the # same line except for optional spaces, and the # subroutines and labeled blocks must end with the # next "}" that is at the same level of indentation as # the first character of the label or the "s" of "sub". # Within the parameters, the "&" is optional before # subroutine names and the ":" is optional after labels. my $include = 0; my $vars = ''; if ( $_[0] =~ /^\s([iIeE]):\s*(.*)$/) { shift; # get rid of it $include = 1 if lc $1 eq 'i'; $vars = $2; } # 3rd parameter: # Target file. Default is $0, indicating the file # VarStructor is being run from. # last arg is file name to read the source code from. defaults to # using the currently-executing script ($0). my $in_file_name = shift || $0; ########################################################## my $file = do { open my $fh, $in_file_name or die "opening '$in_file_name': $!"; local $/ = undef; # enter slurp mode <$fh>; }; # delete whole-line comments $file =~ s/^\s*#.*$//mg; ### Prevent parsing of some quoted strings by deleting here ### docs. Rarely, a single quoted string would be mistaken for a ### variable if not in a here doc. # Delete here docs with quoted identifiers $file =~ s/<<\s*('|"|`) ([^\n]*?([^\\]|[^\\]\\\\))\1 # Match here doc identifier, which # ends with an unescaped closing # quote. Limitation: an even number # of slashes greater than two at # the end of the identifier would # be wrongly interpreted as # escaping the quote and the here # doc value would probably not get # deleted. .*?\n\2\n//sgx; # Delete here docs with unquoted identifiers # (tkil) greediness will kill you here $file =~ s/<<(\w+);.*?\n\1\n//sg; # accumulators my $subs = ''; my $labels = ''; my $out_vars = ''; my @all_vars; my @hash_disp; # Isolate subroutines to search, according to $vars # look for subexpressions like: "subs ( my_sub_1, my_sub_2 )" # remove it from $vars, then add each subroutine to output. while ($vars =~ s/ subs \s* \( ( [^\)]+ ) \) \s* //x ) { my $subs = $1; foreach my $sub ( map quotemeta, split /,\s*/, $subs ) { # find the subroutine in the file, save the definition. $file =~ m! ( # entire expression in $1 ^ (\s*) # save indent for later ma +tching sub \s+ $sub \s*# actual decl start (?: \( [^\)]+ \) ) # skip prototypes \s* \{ # opening brace .*? # whatever (but note non-g +reedy) ^ \2 \} ) $ # look for brace at same i +ndent !xsm or die "couldn't find definition of sub '$su +b'"; # save the definition we just found $subs .= $1; } } # Isolate labeled blocks to search, according to $vars this time, # look for: labels( ... ) and remove every match from$vars as we # find them: while ($vars =~ s/ labels \s* \( ( [^\)]+ ) \) \s* //x ) { my $labels = $1; foreach my $label ( map quotemeta, split /,\s*/, $labels ) { $file =~ m! ( # whole expression ($1) ^ (\s*) # capture label indent ($2 +) $label: \s*? (^ \s*)? # capture loop indent, may +be ($3) (?: do | for(?:each)? \(.*?\) | while \(.*?\) | until \(.*?\) | ) \s*? \{ .*? # actual block contents ^ ( \2 | \3 ) \} # closing brace at either lab +el # or loop indent level !smx or die "couldn't find label '$label'" . " in '$in_file_name'"; # save it for later $labels .= $1; } # Delete or include individual variables, according to $vars while ($vars =~ s/([\$\@\%][^\d\W]\w*)//x) { my $this_var = $1; if ( $include ) { $out_vars .= "$this_var='';"; } else { $file =~ s/\Q$this_var\E//g; } } if ( $include ) { $file = "$out_vars $subs $labels"; } else { # (tkil) this looks completely bogus. WTH? $file =~ s/\Q($subs|$labels)\E//; } # Find arrays. If not an array used in push, require an equals sig +n # to avoid quoted email addresses that look like arrays. while (($file =~ s/([^\\]|[^\\]\\\\)(\@[^\d\W]\w*)\s*=//)|| # Find scalars/array elements after ++ or -- ($file =~ s/(?:[^\\]|[^\\]\\\\)(\+\+|--)\s*(\$[^\d\W]\w*(\[ +.*?\])?)//)|| # Find scalars/array elements before assignment operators, +"++", # "--", "." or ".=" ($file =~ s/([^\\]|[^\\]\\\\)(\$[^\d\W]\w*(\[.*?\])?)\s*(=| +\+=|-=|\*=|\/=|\%=|\*\*=|\+\+|--|\.)//)|| # Find arrays assigned to with push. ($file =~ s/push[^\w_]*([^\\]|[^\\]\\\\)(\@[^\d\W]\w*)//)) { my $this_var = $2; $this_var =~ s/^\$(.*)\[.*/\@$1/; # Convert element to its arr +ay my $eval_var; if ( $action eq 'clear' ) { $eval_var = $this_var =~ /^\$/ ? "$this_var='';" : "$this_ +var=();"; } else { # To do: print index numbers next to values $eval_var = "\\$this_var = $this_var\\n"; } push (@all_vars, $eval_var); } # Extract hashes while (($file =~ s/([^\\]|[^\\]\\\\)\%([^\d\W]\w*)\s*=//)|| ($file =~ s/([^\\]|[^\\]\\\\)\$([^\d\W]\w*)\{[^\n]*\}\s*=// +)) { my $this_hash = $2; if ( $action eq 'clear' ) { push @hash_disp, "\%$this_hash=();" } else { push @hash_disp, "print \"\\n\%$this_hash\\n\";" . "foreach \$key (sort(keys \%$this_hash))" . "{print \$key, '=', \$$this_hash\{\$key\}, \"\\n\";}" } } my $all_vars = join '', unique_elements @all_vars; if ( action eq 'show' ) { $all_vars =~ s/.*/print"$&";/; } eval $all_vars; my $hash_display = join '', unique_elements @hash_disp; eval $hash_display; }

Replies are listed 'Best First'.
Re: Re: VarStructor 1.0
by demerphq (Chancellor) on May 04, 2004 at 11:28 UTC

    I agree with all this pretty much. I have to admit I wonder at wassercrats thinking on occassion. He doesnt seem to learn when the mule kicks other people, and its not even clear if he learns when the mule kicks him. I dont get this at all. Most people I know learn from other peoples mistakes. *shrug* Its his life I guess.

    But I do have a nit: You said ^ (and, for that matter, $) are not special inside of character classes. which is wrong. ^ is special inside of char classes when the it is the first character of the char class. It cause the char class to be negated.


    ---
    demerphq

      First they ignore you, then they laugh at you, then they fight you, then you win.
      -- Gandhi


      ^ is special inside of char classes when the it is the first character of the char class. It cause the char class to be negated.

      D'oh. Perfectly true, of course. Where's my brown paper bag?

        I still haven't finished reviewing everything, but I wanted to thank you for reviewing my code in such depth. In Chatterbox I mentioned the things I'm going to look into, but I guess you already fixed those things in the code.

        I'll just make a few points here:

        I used to format my comments to a certain width, but that makes them more difficult to edit and artificially increases the line count, so I only format certain comment blocks that way, such as ones most likely of interest to others.

        I'm going to put back much of my own style for less practical reasons than the comments--just because it's my style. I won't be using all those x modifiers and extra spaces, I'll change the indentation, etc.

        You added variables, such as $out_vars and changed lots of other stuff without clearly indicating where the changes were made. I already wrote a mostly working script. I don't want to have to learn it all over again as if it's not mine, and I need to check the changes.

        I like my parameter handling. It makes it easier on the programmer because he could add just the parameters he wants without extra commas. I see you did something with grep and shift that I can't figure out off hand (forgot how they work), but all I'll be changing is the "e:" and maybe "i:" indicators, maybe to "include:" and "exclude:", to avoid a file path from being confused for the second parameter.

        In LinkStructor, I generally use upper case for variables that are not configurable by the user and This_Style for those derived from the interface, so shouting or not, I might continue doing that for consistency in my own little world.

        I'm not afraid of saying "if you don't do it this way, my utility won't work," in some cases. It makes label and sub detection easier, and my rules aren't so unreasonable. The ability to omit and include those blocks when clearing or listing variables isn't offered in any other product as far as I know, so limitations aren't so bad, though I see you replaced my sub detection routine with your own. I guess I'll take a look at it and maybe use it.

        You've come close to making the fears I expressed here come true: "One of the reasons I'd be reluctant to release a subroutine of my script as a Perl module is that it would probably be pounced upon and improved beyond recognition and beyond my ability to work with it any longer, and I wouldn't want to lose my first baby."

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://350259]
help
Sections?
Information?
Find Nodes?
Leftovers?
    Notices?
    hippoepoptai's answer Re: how do I set a cookie and redirect was blessed by hippo!
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.