in reply to An exercise in "That's not the way I'd do it" (TNTWIDI)
I have a few suggestions for improvement to your code. I'll outline the general ones, then get more specific.
These are based on my personal opinion, please feel free to review my review.
IMHO, a general rule is that the data type you use should closely represent the way you want to use the data in your code. In your code, I would "use vars" for variables that are global in nature, but that are still going to be modified by your code. Constants are things which should never change, and will stay the same throughout the execution of your program. I believe you should "use constant" for this type of data. For example, your defaults at the top of the script should be constants, not global variables.
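A minimal sketch of the distinction (the names here are illustrative, not taken from the original script):

```perl
use strict;
use warnings;

# Constants: fixed for the life of the program.
use constant DEF_DBTABLE  => 'reputation';
use constant DEF_FILENAME => 'replist.csv';

# A true global that the code will modify belongs in "use vars"
# (or, on newer perls, "our").
use vars qw($Verbose);
$Verbose = 0;
```

Trying to reassign a constant later is a compile-time error, which is exactly the protection the defaults deserve.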
The data source is the root of all data structures, yet your data structure uses different names than the data source. Why bother remembering that "last" maps to "LastReputation" in the database? If you change the names in your data structure to match your database column names, your code becomes simpler.
Calling the same thing by two different names is sure to cause confusion later on.
I noticed you asked for comments regarding readability. Often when I am not sure, I refer to perlstyle for style guidelines. It's not perfect, but it seems to cover the general consensus of the perl community.
Another tip I use when trying to name my object variables: I refer to the module's docs for their naming conventions. For example, say $sth to anyone who's worked with DBI and they'll instantly know what you are talking about. Say $r and anyone who's worked with mod_perl will tell you it's the Apache Request object. About the only place I break this is with CGI; I don't like using $q and prefer $cgi, which I think is more self-describing.
It's usually safer to do checks on hashes using exists and/or defined, rather than just doing the equivalent of if($hash{key}) {}. You'll eliminate some possibility of uninitialized value warnings.
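A small sketch of the difference between the three checks:

```perl
use strict;
use warnings;

my %hash = ( key => 0, other => undef );

# Plain truth test: false for 0, '', undef, and missing keys alike.
my $truthy  = $hash{key}          ? 1 : 0;  # 0 - the value 0 is false

# exists: was the key ever placed in the hash?
my $exists  = exists $hash{other} ? 1 : 0;  # 1 - even though the value is undef

# defined: does the key hold a defined value?
my $defined = defined $hash{key}  ? 1 : 0;  # 1 - 0 is defined, merely false

# A key never set fails all three, but exists says so without warnings.
my $missing = exists $hash{gone}  ? 1 : 0;  # 0 - key was never set
```

A truth test on a key holding 0 or '' quietly takes the wrong branch; exists and defined ask the question you usually mean.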
When defining a hash or array that you mean to be empty, it's redundant to do this: my @array = ();, when you can just do: my @array;. This is more personal preference, but I believe it's more readable without the redundancy.
I notice that you tend to use if(! $test) {} instead of unless ($test) {}. I usually try to stay away from negated conditions in general. They can get you into some nasty logic mind-benders, where you have code that's doing the equivalent of an English "double negative".
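The two forms side by side (with an invented flag):

```perl
use strict;
use warnings;

my $test       = 0;
my $ran_if_not = 0;
my $ran_unless = 0;

# Negated if: the reader has to invert the condition mentally.
if (!$test) {
    $ran_if_not = 1;
}

# unless reads closer to English for a simple "if not":
unless ($test) {
    $ran_unless = 1;
}

# But never stack negations: unless (!$test) { ... } is a double
# negative, and is much harder to read than the plain if ($test) { ... }.
```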
All through grade school my English teachers told me not to do this, but I never quite understood what they meant until I began coding.
Inside your code there is a temporary variable, prefixed $out, which is the output variable that gets emailed and/or printed to STDOUT. I think it might be a better idea to do your formatting all in a single place, rather than mixing your logic and presentation code. You could have a hash that represents state, place your data into it, then right before you send the mail or print, run it through a filter to format it. I've used HTML::Template in an almost identical situation, and it works wonderfully. This won't work very well if the amount of data is large, but I believe it should in your case.
This separation of logic and presentation allows you to output the data in any format you like, even HTML. The resulting code becomes much cleaner and easier to understand.
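A dependency-free sketch of the idea: accumulate state in a hash during the logic phase, then format it in exactly one place. (HTML::Template does this more flexibly; the names and values here are invented, not taken from the original script.)

```perl
use strict;
use warnings;

# Logic phase: only fills in %state, never formats anything.
my %state = (
    username => 'someuser',
    total    => 42,
);

# Presentation phase: the one and only place that knows the layout.
sub format_report {
    my ($state) = @_;
    return sprintf "Reputation report for %s\nTotal change: %d\n",
        $state->{username}, $state->{total};
}

my $report = format_report(\%state);
# The same %state could just as easily feed an HTML or mail formatter.
```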
- You should change the code so that it uses either a database or a flat file. Right now your code is always using a flat file, and sometimes using a MySQL database. Why not use one or the other? It would simplify your code somewhat, as you'd create a single $dbh database handle and pass it around to your read_file(), write_file(), and db_update() routines. Your -d flag should let people choose between CSV and MySQL, with the default being CSV of course.
Keep in mind the above suggestions are snippets from my own personal style. While it's not the one true way, I've found it usually does not conflict with the general consensus on perlmonks and CPAN. As well, I've never been burned by the suggestions above or below - that usually happens only when I ignore my own guidelines. =)
In the place where you check $arg{d}, you create a synthetic variable called @nodelist, then proceed to push the results of a loop onto it. You could have done a direct assignment, or better yet, not created the variable in the first place.
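The difference, sketched with invented data:

```perl
use strict;
use warnings;

my %replist = ( 1 => 'a', 2 => 'b', 3 => 'c' );

# Synthetic variable plus a push loop:
my @nodelist;
foreach my $id (keys %replist) {
    push @nodelist, $id;
}

# Direct assignment says the same thing in one step:
my @direct = keys %replist;

# Better still, skip the variable entirely and pass the list inline,
# as in: update_replist('I', \%replist, [ keys %replist ]);
```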
In the call to update_replist() I think it would be clearer if it returned a hash reference, equivalent to %dbreplist. IMHO, passing in a reference to something into a subroutine, only to have it be modified in the subroutine can have the same unintended side effects that people face when using global variables. In general, I think if you can help it, don't do it. In this case, there are a few places where it might be required, so place the hash reference as the final parameter to this subroutine, so it can be optionally passed in.
Here's the block using the suggestions above:
if(exists $arg{I}) {
    my $hreplist = initialize_rep_file($username, $password, $filename);

    db_update( update_replist('I', $hreplist, [keys %$hreplist]) )
        if exists $arg{d};

    exit;
}
read_file/write_file:
Both of these routines do things that are already in DBD::CSV. Granted, there is a little bit of increased overhead, but the code inside the module is documented and standardized.
Using DBD::CSV, read_file becomes the following. (Note: the name should probably be changed to read_source or something similar.)
sub read_file {
    @_ == 2 or croak 'Incorrect number of parameters';

    my $dbh    = shift;
    my $source = shift;

    my $statement = sprintf(q{
        SELECT NodeId
             , Title
             , Date
             , LastReputation
             , Reputation
          FROM %s
    }, $source);

    my $sth = $dbh->prepare($statement);
    $sth->execute;

    my %nodehash;
    while(my $row = $sth->fetchrow_hashref) {
        croak "Node ID $row->{NodeId} is duplicated!"
            if exists $nodehash{ $row->{NodeId} };
        $nodehash{ $row->{NodeId} } = $row;
    }

    return \%nodehash;
}
Now if it is changed to use DBD::CSV, it might even be possible to factor out write_file() and use db_update() (below) for both functions, but I will leave that for jcwren to decide.
db_update:
- The SQL query here should use placeholders and be prepared once, outside the loop.
- You can simulate the or croak behaviour using the RaiseError parameter in DBI.
Here's an example of the resulting code:
sub db_update {
    @_ == 2 or croak 'Incorrect number of parameters';

    my $dbh      = shift;
    my $hreplist = shift;

    my $statement = sprintf(q{
        INSERT
          INTO %s
               (%s)
        VALUES (%s)
    },
        DEF_DBTABLE,
        join(', ', @{ &DB_COLUMNS }),
        join(', ', ('?') x @{ &DB_COLUMNS }),
    );

    my $sth = $dbh->prepare($statement);

    foreach my $nodeid (sort keys %$hreplist) {
        $sth->execute( @{ $hreplist->{$nodeid} }{ @{ &DB_COLUMNS } } );
    }
}
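The RaiseError behaviour mentioned above is turned on at connect time. A sketch, with a placeholder DSN (and the require wrapped in eval so the snippet degrades gracefully where DBI isn't installed):

```perl
use strict;
use warnings;

# With RaiseError enabled, any DBI method that fails dies with the
# database's error message, so explicit "or croak ..." checks after
# connect/prepare/execute become unnecessary.  PrintError is disabled
# so each failure is reported once, not twice.
my %attr = ( RaiseError => 1, PrintError => 0 );

my $dbh = eval {
    require DBI;
    # Placeholder DSN; substitute dbi:mysql:... as appropriate.
    DBI->connect('dbi:CSV:f_dir=.', undef, undef, \%attr);
};
```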
Re: (dkubb) Re: (2) An exercise in "That's not the way I'd do it" (TNTWIDI)
by tenya (Beadle) on Mar 20, 2001 at 01:11 UTC
Your comments were quite educational for me. However, I did not fully understand your point quoted below. What is the danger of modifying the reference in a subroutine, and what should you do instead? Thanks. "IMHO, passing in a reference to something into a subroutine, only to have it be modified in the subroutine can have the same unintended side effects that people face when using global variables. In general, I think if you can help it, don't do it."
The claim that changing things by reference is similar to globals is one I understand but only halfway agree with. What is similar is action at a distance. However, while globals by definition create that, modifying by reference can either increase or decrease it.
Remember that the goal of programming well is to do the job, within your set constraints, while maximizing the maintainability of the code. From the point of view of getting something simple done, globals are always easiest. However they also open you to global side-effects, which increases the amount the programmer needs to keep in mind. In general if you have n sections of code, any of which can interact with any other, that is order n*n interactions to keep in mind. Which is bad.
However, many people hear this and say, "Oh, I won't use globals. But I will use a reference to something that I pass around everywhere and set and change as I need. Voila! No more globals!"
But while you have removed the label, you have not changed the problem very much. The data is still being shared everywhere, it is just being shared by a different mechanism. You still have action at a distance. You still have the maintainability issue.
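The problem in miniature (all names invented): two distant pieces of code share one reference, and a change made in one place silently alters what the other sees.

```perl
use strict;
use warnings;

my $config = { retries => 3 };

sub tweak {
    my ($cfg) = @_;
    $cfg->{retries} = 0;     # action at a distance...
}

sub worker {
    my ($cfg) = @_;
    return $cfg->{retries};  # ...felt here, far from the assignment
}

tweak($config);
my $seen = worker($config);  # 0, not the 3 we started with
```

Nothing in worker() hints that tweak() rewrote its input; reading either sub alone, the behaviour is invisible, which is exactly the global-variable maintenance problem wearing a different label.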
This is the sense in which I agree with dkubb's criticism.
However, I only halfway agree. As a counterexample, in object-oriented programming you pass around a reference to something everywhere, and you can modify it by reference anywhere. Yet good programmers who avoid globals are quite willing to use OO programming. What is different?
The difference is that OO puts the code that requests from the data structure together with the code that places it. Therefore all of the information about mechanics, internal counters, etc. is encapsulated in a relatively small section of code. While requests may come from many places for many reasons, the mechanics of how to do it no longer suffer as much from action at a distance.
Now in practice that might not be true. And certainly it takes experience to produce good OO designs. But that is the principle.
But I should qualify all of this with one observation. Perl as a language has a ton of global variables. Good programmers avoid them. However, despite having a ton of globals, Perl is a rather effective language. How does that work?
IMHO what is going on is simple. The maintenance problem that globals cause is due to the increase in how much you need to keep in mind. Globals are almost always useful in terms of getting things working, but they bite you down the road. Therefore there is a balance between the theoretical evil and the practical utility. Every global is another thing to be constantly kept in mind, but only while working on code that uses it. The cost of learning Perl's globals can therefore be amortized across many programs, while the cost of keeping a global in your script can only be amortized across that script. So the tradeoff comes at a different point.
Even so I find that for me, Perl has gone too far. Most of the special variables in perlvar I do not use. My brain is too small to keep them in mind. At least they have names that cannot be confused with regular variables, so if someone else uses one I can look it up. But I don't use them. I prefer to use that mental energy for general principles, known bugs, and remembering my grocery list. :-)