Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Note: I'm not saying "don't use if." This node is all about avoiding if to check the type of data you have in OO code, even though there are times that it's unavoidable.

if is a fascinating keyword. Essentially, it says "I don't know what I have, so I'm going to guess." This guessing leads to subtle bugs. Do you see any potential problem in the following?

if ($x and ! $y) { ... } elsif (!$x and $y) { ... } else { ... }

The above code might not have a bug, but it's interesting to note that it does not explicitly deal with the case of both variables being true or both variables being false. Both cases will receive the same treatment. This can raise a maintenance problem if another programmer has to distinguish between these two cases and isn't aware of this snippet.

Most programmers generally do not sit down and draw out truth tables every time they want to code a conditional, but I hardly feel this is a problem. Unexpected cases come up all the time in programming and if is hardly the only culprit. However, when dealing with object oriented code, "if" statements are frequently a maintenance nightmare that's symptomatic of bad OO design.

A classic example lies in older versions of HTML::TokeParser::Simple (I wrote it, so I can pick on it.) In this module, I needed to maintain "drop-in replacement" compatability with my base class, HTML::TokeParser. To do this, I blessed the tokens that the latter module returned into a new token class. Then I had a problem. The array references returned from get_tag() and get_token() methods in HTML::TokeParser had subtly different structures. My methods should not care which method generated the token, so I wrote the following ugly hack:

1 sub _synch_arrays { 2 # if the method is called from a token generated by the get_ta +g() method, 3 # the returned array reference will be identical to a start or + end tag 4 # token returned by get_token() *except* the first element in +the reference 5 # will not be an identifier like 'S' or 'E' 6 7 my $array_ref = shift; 8 my $tag_func = GET_TOKEN; 9 10 unless ( grep { $array_ref->[0] eq $_ } keys %token ) { 11 # created with get_tag() method, so we need 12 # to munge the array to match the get_token() array 13 # After this is called, and before the method returns, you + must 14 # use something like the following: 15 # shift @$self if $method == GET_TAG; 16 $tag_func = GET_TAG; 17 if ( '/' ne substr $array_ref->[0], 0, 1 ) { 18 unshift @$array_ref, 'S'; 19 } 20 else { 21 unshift @$array_ref, 'E'; 22 } 23 } 24 return ( $array_ref, $tag_func ); 25 }

This helper function would synchronize the returned array references to ensure that my methods could operate on one data structure rather than two. Unfortunately, I always had to remember to "unsynchronize" them if the calling function was get_tag(). If I didn't, users who tried to use my code as a drop-in replacement for the parent class would find the get_tag() method returning an array ref with an extra element at the front.

Line 10, a disguised if, is the first indicator that I have a major design flaw. This line checks to see if the first array element exists in my global %tokens hash. There are two problems here. The global variable isn't really that bad because it's effectively a private, read-only class variable (but it should have been uppercased). I should have wrapped that in an accessor, but I didn't. The major problem lies in the fact that I'm trying to determine what type of token I have. Lines 18 and 21 are where I unshift the type of token onto the front of the array. This made the rest of my code simpler, but synchronizing the arrays meant that I sometimes had to unsynchronize them and this made my code very difficult to maintain. I was constantly breaking it while adding new features.

A perfect example is this ugly method:

1 sub as_is { 2 my ( $self, $method ) = _synch_arrays( shift ); 3 my $type = $self->[0]; 4 my $text = $self->[ $token{ $type }{ text } ]; 5 shift @$self if $method == GET_TAG; 6 return $text; 7 }

The reason this works is because the array synchronization ensures that the "text" attribute is now always at the same index for a given type (line 4), but more than once I forgot to add line 5 which removed the extra element.

The latest version of this module no longer has this problem. Different token types? They're now subclasses of HTML::TokeParser::Simple::Token and I only test the type to bless the token into its appropriate class. Most of the new classes are very small and only override a tiny but critical portion of the base token class functionality. The data that I used to capture while synchronizing arrays is now captured when the token is instantiated and this has made the code much easier to work with. For example, the as_is() method is in my token subclass:

sub as_is { shift->[-1] }

However, "text" tokens have the raw text in a different spot, but the corresponding subclass overrides that method like this:

sub as_is { shift->[1] }

No longer do I need to synchronize arrays or do any fancy tricks. By allowing polymorphism to handle the logic for me, the code is much cleaner. Further, if you need special handling of particular HTML types, you can simply subclass the appropriate token class.

Interestingly, after I uploaded the latest version of this module to the CPAN, I grepped for if statements in the code and I saw lines like this:

lib/HTML/TokeParser/Simple/Token/Tag.pm: return [] if END_TAG eq $INSTANCE{$self}{type};

Instantly I realized the flaw. I only have one "tag" type. If a programmer subclasses this, she's going to override the behavior of both start tags and end tags. She'll need to test which type it is to prevent bugs and this is the sort of maintenance issue I wanted to avoid. Just seeing this keyword in my code instantly pointed out a design flaw. Now I just need to find the time to fix this :)

Cheers,
Ovid

New address of my CGI Course.


In reply to "if" Considered Harmful in OO programming by Ovid

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2024-03-29 09:27 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found