http://www.perlmonks.org?node_id=164033

ajt has asked for the wisdom of the Perl Monks concerning the following question:

More Learned Monks I seek wisdom!

I've recently been playing about with RSS News Feeds (How do I clean RSS feeds to make them usable?). Overall I've found that they are easy and fun to deal with, but I found that the code I was working with could be "modulised".

I've only ever designed one module before, Re: Re: Simple Configuration module - now much improved, and never a OO-module. I've got TheDamian's exellect ISBN 1884777791/Manning Link, which I've read several times.

When I read the book, and other OO Tutorials, it all seems to make perfect sense, and I have no problem with using OO-modules, I'm just not convinced I know enough to make one.

I've got all sorts of questions, and I'm sure that there is guidance that can help me resolve my problems. Here are some of the problems I see with my design:

As ever humble thanks in adavnce...

Here is my first stab. I have example files, and I'm working of test stuff, naturally. SCARY prototype code to follow:

# -------------------------------------------------- # # XML::RSS::Tools # Version 0.01 # April 2002 # Copyright iredale Consulting, all rights reserved # http://www.iredale.net/ # # -------------------------------------------------- # -------------------------------------------------- # Module starts here... # -------------------------------------------------- package XML::RSS::Tools; use 5.006; use strict; use warnings; use Diagnostics; use Carp; #use HTTP::GHTTP; # Delayed loading #use LWP::UserAgent; # In case you don't have GHTTP use XML::RSS; # Handle the RSS/RDF files use XML::LibXML; # Hand the XML file for XSL-T use XML::LibXSLT; # Hand the XSL file and do the XSL-T use HTML::Entities; # Try and fix entities require Exporter; our $VERSION = '0.01'; our @ISA = qw(Exporter); # Preloaded methods go here. #{ # Class Data #} # # Tools Constructor # sub new { my $self = shift; my %args = @_; bless { _rss_version => $args{version} || 0.91, _debug => $args{debug} || 4, _xml_string => "", _xsl_string => "", _output_string => "", _transformed => 0, _auto_wash => $args{auto_wash} || 1, }, $self; } # # Output what we have as a string # sub as_string { my $self = shift; my $mode = shift; if ($mode) { if ($mode =~ /rss/i) { croak "No RSS File to output" unless $self->{_rss_string}; return $self->{_rss_string}; } elsif ($mode =~ /xsl/i) { croak "No XSL Template to output" unless $self->{_xsl_stri +ng}; return $self->{_xsl_string}; } else { croak "Unknown mode: $mode"; } } else { croak "Nothing to output..." unless $self->{_transformed}; return $self->{_output_string}; } } # # Set/Read the debug level # sub debug { my $self = shift; my $debug = shift; $self->{_debug} = $debug if defined $debug; return $self->{_debug}; } # # Set/Read the auto_wash level # sub auto_wash { my $self = shift; my $wash = shift; $self->{_auto_wash} = $wash if defined $wash; return $self->{_auto_wash}; } # # Get the RSS Version # sub get_version { my $self = shift; return $self->{_rss_version}; } # # Set the RSS Version # sub set_version { my $self = shift; my $version = shift; $self->{_rss_version} = $version if defined $version; return $self->{_rss_version}; } # # Load an RSS file, and call RSS conversion to standard RSS format # sub rss_file { my $self = shift; my $file_name = shift; _check_file($file_name); open SOURCE_FILE, "<$file_name" or croak "Unable to open $file_nam +e for reading"; $self->{_rss_string} = _load_filehandle(\*SOURCE_FILE); close SOURCE_FILE; _parse_rss_string($self); $self->{_transformed} = 0; } # # Load an XSL file # sub xsl_file { my $self = shift; my $file_name = shift; _check_file($file_name); open SOURCE_FILE, "<$file_name" or croak "Unable to open $file_nam +e for reading"; $self->{_xsl_string} = _load_filehandle(\*SOURCE_FILE); close SOURCE_FILE; $self->{_transformed} = 0; } # # Load an RSS file via HTTP and call RSS conversion to standard RSS + format # sub rss_uri { my $self = shift; my $uri = shift; $self->{_rss_string} = _http_get($uri); _parse_rss_string($self); $self->{_transformed} = 0; } # # Load an XSL file via HTTP # sub xsl_uri { my $self = shift; my $uri = shift; $self->{_xsl_string} = _http_get($uri); $self->{_transformed} = 0; } # # Parse a string and convert to standard RSS # sub rss_string { my $self = shift; my $xml = shift; croak "Nothing to parse" unless $xml; $self->{_rss_string} = $xml; _parse_rss_string($self); $self->{_transformed} = 0; } # # Import an XSL from string # sub xsl_string { my $self = shift; my $xml = shift; croak "No XSL-T" unless $xml; _$self->{_xsl_string} = $xml; $self->{_transformed} = 0; } # # Do the transformatoin # sub transform { my $self = shift; croak "No XSL-T loaded" unless $self->{_xsl_string}; croak "No RSS loaded" unless $self->{_rss_string}; croak "Can't transform twice without a change" if $self->{_transfo +rmed}; my $xslt = XML::LibXSLT->new; my $xml_parser = XML::LibXML->new; $xml_parser->keep_blanks(0); my $source_xml = $xml_parser->parse_string($self->{_rss_string}); + # Parse the source XML my $style_xsl = $xml_parser->parse_string($self->{_xsl_string}); + # and Template XSL files my $stylesheet = $xslt->parse_stylesheet($style_xsl); + # Load the parsed XSL into XSLT my $result_xml = $stylesheet->transform($source_xml); + # Transform the source XML $self->{_output_string} = $stylesheet->output_string($result_xml); + # Store the result $self->{_transformed} = 1; } # --------------- # Private Methods # --------------- # # Parse the RSS string # sub _parse_rss_string { my $self = shift; my $xml = $self->{_rss_string}; $xml = _wash_xml($xml) if $self->{_auto_wash}; my $rss = XML::RSS->new; $rss->parse($xml); if ($rss->{version} != $self->{_rss_version}) { $rss->{output} = $self->{_rss_version}; $xml = $rss->as_string; } $self->{_rss_string} = $xml; } # # Load file from File Handle # sub _load_filehandle { my $handle = shift; my $content; while (<$handle>) { $content .= $_ } return $content; } # # Wash the XML File of known nasties # sub _wash_xml { my $xml = shift; my %entity = ( trade => "&#8482;", ); my $entities = join('|', keys %entity); decode_entities($xml); +# Try and deal with known entities $xml =~ s/&($entities);/$entity{$1}/g; +# Deal with odd entities $xml =~ s/&(?!(#[0-9]+|#x[0-9a-fA-F]+|\w+);)/&amp;/g; # + Matt's ampersand entity fixer $xml =~ s/\s+/ /gs; $xml =~ s/> />/g; return $xml } # # Check that the requested file is there and readable # sub _check_file { my $file_name = shift; croak "No file name supplied" unless $file_name; croak "Cannot find $file_name" unless -e $file_name; croak "$file_name isn't a real file" unless -f _; croak "Cannot read $file_name" unless -r _; croak "$file_name is zero bytes long" if -z _; } # # Grab something via HTTP # sub _http_get { my $uri = shift; my $xml; eval { # Try and use Gn +ome HTTP, it's faster require HTTP::GHTTP; }; if ($@) { # Otherwise use L +WP require LWP::UserAgent; my $ua = LWP::UserAgent->new; $ua->agent("iC-XML::RSS::Tools/$VERSION " . $ua->agent . " ($^ +O)"); my $response = $ua->request(HTTP::Request->new('GET', $uri)); croak "HTTP error: " . $response->status_line if $response->is +_error; $xml = $response->content(); } else { my $r = HTTP::GHTTP->new($uri); $r->process_request; $xml = $r->get_body; croak "HTTP error: Unable to connect to server.\n" unless $xml +; my ($status, $message) = $r->get_status; croak "HTTP error: $status, $message\n" unless $status == 200; } return $xml; } 1; __END__ =head1 NAME XML::RSS::Tools - Perl extension for very high level RSS Feed manipula +tion =head1 SYNOPSIS use XML::RSS::Tools; my $rss_feed = XML::RSS::Tools->new; $rss_feed->rss_uri('http:://foo/bar.rdf'); $rss_feed->xsl_file('/my/rss_transformation.xsl'); $rss_feed->transform; print $rss_feed->as_string; =head1 DESCRIPTION RSS/RDF feeds are commonly available ways of distributing the latest n +ews about a given web site for news syndication. This module provides + a VERY high level way of manipulating them. You can easily use LWP, +the XML::RSS and XML::LibXSLT do to this yourself. =head2 EXPORT None by default. =head1 AUTHOR Adam Trickett, E<lt>adam@iredale.net<gt> =head1 SEE ALSO C<perl>, C<XML::RSS>, C<XML::LibXSLT>, C<LWP> and C<HTTP::GHTTP>. =head1 COPYRIGHT XML::RSS::Tools, Copyright iredale Consulting 2002 This program is free software; you can redistribute it and/or modify i +t under the terms of the GNU General Public License as published by t +he Free Software Foundation; either version 2 of the License, or (at +your option) any later version. This program is distributed in the hope that it will be useful, but WI +THOUT ANY WARRANTY; without even the implied warranty of MERCHANTABIL +ITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public L +icense for more details. You should have received a copy of the GNU General Public License alon +g with this program; if not, write to the Free Software Foundation, I +nc., 59 Temple Place - Suite 330, Boston, MA 02111, USA. =cut

Replies are listed 'Best First'.
Re: Thoughts On Object-Oriented Module Design. Input Sought.
by Ovid (Cardinal) on May 04, 2002 at 19:11 UTC

    ajt, congrats on taking the dive into object-oriented programming. I wish I had thought to ask these questions when I first started to learn. Regarding your questions, I would assume that the programmer using your module is competent. Making too many assumptions that your programmer is a weak programmer can bloat your code. However, you still want to provide the target programmer with enough utility and information that should he or she make a mistake, it's still easy to figure out what's going on. A classic example is one of my first large OO projects. For non-recoverable errors, I had die statements. Naturally, they died when they were called incorrectly, which provided the programmer with virtually no information about the context in which the offending method died. croak would have been a much better tool as it dies from the perspective of the caller, not the callee.

    Here's my take on some of your questions.

    Should my module's methods croak, or return error codes when they fail?

    What happens when a method returns? Is it an error that the programmer can reasonably recover from? Failure to read a config file is typically fatal. Finding out that there are no customers with the first name of 'Frank' probably is recoverable. Croak or return an error code depending upon whether or not the program should terminate.

    How much internal error checking is sensible? Some external calls will fail anyway if I pass duff data.

    I like to use plenty of error checking. As with the answer to your first question, what happens if there is an error? Is it recoverable? You might want to check out Class::Contract. That allows you to use Design by Contract programming for easy testing of pre- and post- conditions (and other things) and has a Class::Contract::Production version that turns off this feature in production in order to enjoy better performance.

    Should I use generic methods that can both get and set, or is a single one each, e.g. $o->debug or $o->set_debug(1); print $o->get_debug?

    I no longer like generic methods. Here's why:

    $obj->foo( 'bar' ); $value1 = $obj->foo; $obj->baz( 'qux' ); $value2 = $obj->baz;

    Did you spot the error in the above code? Syntactically it's correct, but what if $obj->baz is only an accessor and not a mutator? There is no way to tell, from the code, that you can't use this method to set a property. If most of your methods are doing double duty, but a couple are just accessors, a programmer might easily get confused as to which are which. In my code, you'll often see something like this:

    $products = $obj->get_product; $product = $obj->get_product( $product_id );

    In that example, the first use returns all products and the second returns only the information for a given product. That, I think, is a clean way to overload methods. However, if you tend to use the same method to both get and set a property, the above code might confuse. That's why I explicitly put a get_... at the beginning of the method name so that no one can misunderstand.

    If I want to submit it to CPAN, how do I go about it? I know CPAN has instructions, but USENET seems a scary place, and I don't know if the code is worth anything to anyone else?

    Read tachyon's How to make a CPAN Module Distribution.

    What works best, lots of methods to set things, or via the constructor in one go?

    What's the common case? Are people likely to change the foo property? If not, don't use a mutator. I'd push it into the constructor. If it's an optional property, though, you'd want a mutator. However, that's too simple of an answer; it can vary from module to module.

    Code comments:

    sub new { my $self = shift; my %args = @_; bless { ... }, $self; }

    By convention, $self is used to refer to an object. Since you don't yet have an object in your constructor, I'd rename this to $class or $proto.

    sub as_string { my $self = shift; my $mode = shift; if ($mode) { if ($mode =~ /rss/i) { ... }

    Before the first if statement, I would use $mode ||= ''; or $mode = '' if ! defined $mode; (the second is used if a false value for $mode is legal and you don't want it overwritten). This will suppress warnings on your regular expressions if $mode is not defined.

    $self->{_xsl_string} = _http_get($uri);

    That's a tempting thing to do, but don't. If you have an OO system, you may as well take full advantage of it:

    $self->{_xsl_string} = $self->_http_get($uri);

    How does that help? Well, what if you have an error property in your object? If you just call the _http_get function, you can't conveniently set that error property, if needed. Further, what if you need to extend that subroutine to call helper methods. Since you no longer have $self, you can't call its methods and you've limited what you can do (some will quibble with this point as there are plenty of exceptions).

    Putting all of the above comments aside, I can only say that I wish my first OO module was half as well-done as yours :)

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Re: Thoughts On Object-Oriented Module Design. Input Sought.
by vladb (Vicar) on May 04, 2002 at 19:17 UTC
    Interesting post. Let me hereby attempt to tackle a few of your questions.
    1. Should my module's methods croak, or return error codes when they fail?
      This really depends. Generally It's ok for a module method to die if its life becomes unbearable due to a serious bug etc. To remain consistant, I suggest you stick with Croak.

    2. How much internal error checking is sensible? Some external calls will fail anyway if I pass duff data
      You should always do sanity checks. Check subroutine parameter count and type (scalar/hashref/array/...?) and die on invalid use. However, I remember coming across pieces of code that did too much of this kind of check. So, in the end, you have to keep your sanity checks to a level that is sane ;).

    3. Should I use generic methods that can both get and set, or is a single one each, e.g. $o->debug or $o->set_debug(1); print $o->get_debug?
      No, the way you do it now is great. That's what I'd do anyways ;). In the end, it's always easier to deal with just one method rather than two. Say, why should anyone keep two get_ and set_ methods when a context (lvalue etc.) in which a method is called is clear enough indication what the method should do (either of the two: get or set an object attribute).

    4. If I want to submit it to CPAN, how do I go about it? I know CPAN has instructions, but USENET seems a scary place, and I don't know if the code is worth anything to anyone else?
      I think by posting here you are already one step closer to getting your module on CPAN. Further, you may want to seek some advice on the name space you are using for your package. I believe there are special discussion groups/forums dedicated to just that ;).

    5. What works best, lots of methods to set things, or via the constructor in one go?
      In case of accessor methods, it's good to keep them regardless of whether your constructor is able to accept initial object settings. However, my suggestion would be to allow as much stuff initialized through your constructor method as possible. Otherwise, a user of your module would have to call 100 and 1 methods to properly initialize an instance of your RRS::Tools class before even being able to do any work on it. I suggest you don't make it a necessity by allowing as much (initialization) work to be done through the constructor as possible.

      Whew... well, that was my 3.2 cents there.

      "There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith

      1. Should my module's methods croak, or return error codes when they fail?

      I generally feel that it is unhelpful for an object to take down an application because it's unhappy about its own state.

      Why? If you view objects and being bits of data able to be acted upon, the absence or presence of that data is not necessarily an event worth 'dieing' over. Particularly in the case of objects (which are liable to be imported and used by others) you should leave the choice of what to do up to the application developer.

      This is what's beautiful about Java's try{} catch{} syntax -- errors can percolate up to whatever level you care to let them, so errors can be trapped at any point of execution or simply cause the application to fail depending on the choices made by the developer.

      If you feel uncomfortable doing this, then a good backup plan is to have a flag available to the developer -- say, isErrorFatal() -- that allows the developer to choose whether or not a failure within the object is fatal to the application.

      This is, of course, a generalization, but I'd go with returning undefined and setting an error flag for all but the most serious cases.

      2.Should I use generic methods that can both get and set, or is a single one each, e.g. $o->debug or $o->set_debug(1); print $o->get_debug?

      Here I have to respectfully disagree with vladb. Beyond the reasons outline by Ovid (what if your method doesn't actually *accept* input), are issues with flexibility and maintainability.

      What if I want to create a wrapper object that only exposes the getter features of the RSS class? This contract is a lot more clear when there are separate methods to perform each action.

      From a maintainability standpoint I also think that the clearer your method names the less likely you are to have problems down the line when you want to re-write the class or update a specific feature. Compare looking for what needs to change when you have a method called getDatabaseConnection() versus one just called connection().

      More clarity is almost always better.

Re: Thoughts On Object-Oriented Module Design. Input Sought.
by gloryhack (Deacon) on May 05, 2002 at 22:04 UTC
    I don't put things into CPAN, but I prefer the following:
    1. Never let a module kill the caller. Instead, return undef to indicate error, and, if it seems like it might be needed, provide another method that returns a descriptive error string. This adds to the bulk of the code, but I think it's worth it. I could be wrong.
    2. Offload input taint checking to the calling application; within the module's methods, sanity check only. Some methods will be more robust if they do slightly more extensive testing of input data; a flexible solution is to provide testing methods outside of your setter/getter methods. This allows the user to work it: "If the input is okay, then set it in the module." Later, if you just can't stand the risk that some user is going to hose the thing with unchecked data, you can add the call to the testing method to the setting method.
    3. I prefer combined setter/getter methods if both actions are simple. Otherwise, I prefer set_foo() and get_foo()
    4. Regarding constructor vs. "lots of methods", I prefer to let users of my modules work in whatever way suits them best, so I generally try to support sensible defaults, overridden by any arguments to the constructor, and setter methods accessible to the user. The defaults and overrides actually call the setter methods to ensure consistency and stick more closely to the OOP.

      I try to avoid forcing my will on users... TMTOWTDI and my way is usually just plain dumb.
Re: Thoughts On Object-Oriented Module Design. Input Sought.
by dsheroh (Monsignor) on May 06, 2002 at 15:58 UTC
    My thoughts...
    Should my module's methods croak, or return error codes when they fail?
    Error codes. Over the years, I've run into more libraries than I care to think about (mostly C, not perl) which die and take my app down with them when they think something went wrong. I don't care if your code absolutely, positively has to have a certain database available in order to function - return an error code so that the caller can then decide to use an alternate implementation, create the database you need and try again, or just die if nothing else works. Unless you know that there is no way the caller could possibly cure your 'fatal' condition, just tell it. Don't croak without warning.

    Side question: I'm currently debating with myself over when it's appropriate for a method to report an error to STDERR on its own instead of just passing back an error code to the caller and letting them decide whether to report it or not. Thoughts?

    How much internal error checking is sensible?
    Sanity checking is always good. Don't make assumptions in your code - if it's possible for you to be passed bad data, make a reasonable effort to ensure that you didn't.

    Should I use generic methods that can both get and set, or is a single one each
    I go with generics. In my experience, that seems to be the more common perl idiom.

    What works best, lots of methods to set things, or via the constructor in one go?
    Both!
      Side question: I'm currently debating with myself over when it's appropriate for a method to report an error to STDERR on its own instead of just passing back an error code to the caller and letting them decide whether to report it or not. Thoughts?

      I believe that most of the time, methods within a module should not screw around with their environment. However, if reporting to STDERR is seen as convenient, I prefer a module that will provide a switch to turn STDERR reporting on or off, defaulting to the off state.

      When I encounter a module that insists upon spewing at STDERR, I add to my own code a method for redirecting STDERR to /dev/null, and turn it on at the same point at which I comment-out the uses of the strict and diagnostics pragmas.

      I think that by providing the switch in your modules, you allow your user to operate more completely within TMTOWTDI space.