http://www.perlmonks.org?node_id=264916

Ovid has asked for the wisdom of the Perl Monks concerning the following question:

A little OO style question has been bugging me for a while. Consider HTML::TokeParser::Simple. Currently, if you want to alter attributes of start tags, you can do this:

$token ->delete_attr('foo') ->set_attr( bar => 'baz' ) if $token->is_start_tag;

Some appear to argue that the "is_start_tag" test is superfluous. Basically, mvc argued that having the test is irrelevent. If the object can do what I want, let it do it, otherwise it ignores the message. So how does that work out if I want to clean up all references to font tags in a HTML document?

while(my $token = $parser->get_token) { $token->del_attr('font'); print $token->as_is; }

Right now to do that, I have to do the following:

while(my $token = $parser->get_token) { $token->del_attr('font') if $token->is_start_tag; print $token->as_is; }

Currently, calling set_attr() or del_attr() on anything but a start tag will croak(). (That was a mistake that I will fix. It should carp and move on.) Other than that, how do you feel about the two styles above? My concern with the former style is the possibility a bug in the code leading someone to accidentally try to set attributes on something other than start tags and missing the start tags themselves. In other words, if the code doesn't alert you to the appropriateness of the message, how do you know if the right objects are getting the right messages?

My concern with the latter stems from adding extra tests in the code. The more code, the more bugs. Which style would you choose and why?

Cheers,
Ovid

New address of my CGI Course.
Silence is Evil (feel free to copy and distribute widely - note copyright text)

Replies are listed 'Best First'.
Re: OO style question: how much information to hide?
by BrowserUk (Patriarch) on Jun 11, 2003 at 01:07 UTC

    Taking the latter form to extremes, you could end up writing something like

    $token->del_attr('font') if $token->is_start_tag and $token->can_have_attr('font') and $token->has_attr('font');

    If the method can only be applied if certain condition(s) are true, better to have the test(s) once in the object and know that they will be performed, than to force every caller to perform the tests, and risk that they don't.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: OO style question: how much information to hide?
by perrin (Chancellor) on Jun 11, 2003 at 00:28 UTC
    There are exceptions to the rule, but in general objects should not silently fail. If you call $dog->quack() and it can't quack, it should die loudly to let you know you have made a horrible mistake. Warnings are for things like "that method call was superfluous" not "I am unable to do what you asked."

    UPDATE: I read the comments from mvc. He seems to have been simply saying that you should try for the highest possible abstraction of what you're trying to do. This is obviously good advice, but I don't think there was anything wrong with what you were doing in that example. This one is a harder call. In this case you're dealing with two instances of tokens with little reason to differentiate between them, and some start tags might not have any attributes either.

    I think you should recast your thinking like this: tokens can have attributes, deleting an attribute on a token that doesn't actually have that attribute is not considered an error, therefore deleting an attribute on an end tag is not considered an error.

(jeffa) Re: OO style question: how much information to hide?
by jeffa (Bishop) on Jun 11, 2003 at 03:37 UTC
    Seeing as how only start tags can have attributes [*], i think that the latter is more representative of the 'real world'. However, since this is a "Simple" module, then perhaps it would possible to check if $self really is a start tag inside of del_attr() and gracefully return if it is not? It could be a simple matter of performing that if statement for the client, which is nice as long as the client doesn't need to "grab a hook", which i don't think they will.

    In conclusion ... i say take care of it for the client, simply because the name is HTML::TokePaser::Simple.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    

      Hear, hear. Simple interfaces should ascribe to the Perl philosophy of DWIM. If a sensible user would say, "well, duh, of course I can't delete attrs from end tags, because they don't have any," then the module should just shield the user from this aggravation. Check for end-tags inside the appropriate *_attr calls and return undef.

      Since there's no data loss in deleting what's never there (and could never be there), just ignore it. I'd say the same thing of attaching data to things which can't accept attachments (attrs on end tags), though I might suggest that such a call return undef on failure and non-undef on success. The quietest possible error.

      If the user chooses a lint/debugging mode for the object or package, then you could carp a warning. But simple interfaces should say "no harm, no foul" to irrelevant calls.

      --
      [ e d @ h a l l e y . c c ]

        Amen++

        Though I'd like to here why you say "simple interfaces"?

        My first reaction is that all interfaces should be simple, but realistically that is always possible.

        Second thought was that in the interface is already complicated, then wheres the mileage in further complicating it by add extra methods that the caller has to use to test for stuff that can be tested internally just to avoid abending for a error that isn't?


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


Re: OO style question: how much information to hide?
by webfiend (Vicar) on Jun 11, 2003 at 00:54 UTC

    The explicit delete ... if ... seems more readable to me, in a narrative sense. Somebody else taking a casual glance at the other chunk might be concerned that time was being spent deleting attributes from end tags. It wouldn't take them long to discover the truth of the matter, but it's still something in the back of their head.

    On the other hand, perhaps you want a method besides as_is() for printing purposes? Maybe something like this:

    sub cleaned_up { my $self = shift; $self->del_attr('font') if $self->is_start_tag; return $self->as_is; } ... print $token->cleaned_up;

    Of course, this example could be modified for the realities of style and ... well ... reality :-)

Re: OO style question: how much information to hide?
by Aristotle (Chancellor) on Jun 11, 2003 at 06:41 UTC

    I think some perspective is required here. I agree that "can you do it? then do it" protocols are adding superfluous logic. However you do want to take different actions depending on what kind of thing you have. It doesn't make sense to have every object accept every possible message ever but ignore all but a small set of them (because it can't handle the rest) - does it? mvc was arguing to decrease coupling, but he also conceded that the ultimately independent object is one that has no methods and is ultimately useless. What you really want to do if you are concerned about purity and want to loosen coupling is having the conditional be implicit in the $token's class. See evolving an OO design for this principle. Loose coupling is achieved by removing the condition from the calling code altogether.

    In the case in question, the correct approach would be to have a different class for every type of token, and to define a ->clean_up method for each of them. The one for a start tag would delete font attributes, in others it would be a no-op. Then, you just write:

    while(my $token = $parser->get_token) { $token->clean_up; print $token->as_is; # possibly: # print $token->clean_up->as_is; }

    Is that worth the effort? It depends on the scale of the project.

    I would like to argue in favour of enforcing the difference.

    That would necessitate "can you do it? then do it" for small scripts (which I believe are more readable for it). In large projects, it would be very helpful to have the strictness enforced as an extra debugging tool. If I mix up the types of objects at some point in the code, letting all of them accept any kind of message is not going to help me track down the error.

    If inconvenience for small scripts is a concern (though that isn't what you were asking about), I would possibly add a strict => 0 option to the constructor.

    Makeshifts last the longest.

Re: OO style question: how much information to hide?
by dws (Chancellor) on Jun 11, 2003 at 01:22 UTC
    The more code, the more bugs.

    And more unit tests, fewer bugs.

    I favor the explicit style. set_attr() and del_attr() should only work on start tags. To allow otherwise is to risk hiding problems.

Re: OO style question: how much information to hide?
by PodMaster (Abbot) on Jun 11, 2003 at 00:50 UTC
    Come now Ovid, you're over thinking it. All tokens must not have attr methods, cause all tokens do not have attributes. Divide and conquer. I say you just go with Re: Re: XML::TokeParser::Simple - pretty much like HTML::TokeParser::Simple like you said you would. Your start tags can have those attr methods, but your text/comment/end tokens surely must not.

    Perl should definetly die with Can't locate object method "del_attr" via package ... if you try that on a text token.


    MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
    I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
    ** The Third rule of perl club is a statement of fact: pod is sexy.

      Perl should definetly die with Can't locate object method "del_attr" via package ...

      I'm fine with dying if throwing exceptions is the documented error handling methodology and is used consistently.

      Sort of¹. . .

      As far as the "definetly" [sic] part of that statement... well, why?

      Gracefully handling the error isn't such a bad thing is it? Why throw an exception and force² your user to handle it when you could move on silently and leave no harm done?

      I do think that the module's clients should have access to the error if desired. For example, an object can set an error condition and make details accessible via an error() method. Providing that, along with the is_start_tag() method, could give Ovid's module's clients the ability to ask "should I do this?" as well as "did I do what I expected?" And, really, that should be enough.

      I understand that from an OO purist perspective, what I'm suggesting is blasphemous. It's lackadaisical, slackish, and lacking the rigor that OO is supposed to enforce. But hey, if you really wanted that, you could shackle yourself to Java or C++ or what-have-you.

      I don't want to give the wrong impression. I'm all for rigor, particularly in large projects. But, for a general module which is likely to be used in far more small projects than large ones, I'm not convinced enforcing rigor is the right way to go.

      1. I really hate using eval and checking $@. I think it's ugly. But that's a personal issue.

      2. And this is the real reason I hate using die() for error handling. It requires that the module's clients use eval, even in cases where it doesn't matter. If you provide another way of handling those cases, then you are in the precarious position of maintaining two separate error handling methodologies, so it isn't a realistic option.

      -sauoq
      "My two cents aren't worth a dime.";
      
        ... I really hate using eval and checking $@. I think it's ugly. But that's a personal issue.
        The user doesn't have to deal with it -- that's why you have the is_ methods, so the user doesn't try foolish things. I'm not an OO purist or whatever, I just dislike daft interfaces.


        MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
        I run a Win32 PPM repository for perl 5.6x+5.8x. I take requests.
        ** The Third rule of perl club is a statement of fact: pod is sexy.

      So start tags and end tags are different things. Then why is there a $token that can be either? You should not get to that stage. From where do you get the tokens? Can you ask for start tags only? If so, you do not have to choose between keeping the condition, or silently doing a no-op.

      If you must visit hierarchies of objects of different types, use Visitor.

Re: OO style question: how much information to hide?
by sauoq (Abbot) on Jun 11, 2003 at 00:49 UTC

    I much prefer the explicit version.

    I would be inclined to code it like that even if the module didn't require me to do so. If nothing else, it would avoid useless method calls.

    I don't think it would bother me if the module didn't enforce that though. It seems to be in the spirit of giving enough rope... and I might be glad of it when needing a quickie throwaway script or a one-liner.

    -sauoq
    "My two cents aren't worth a dime.";
    
Re: OO style question: how much information to hide?
by ViceRaid (Chaplain) on Jun 11, 2003 at 10:01 UTC

    Thanks, I've enjoyed this thread, and also my first taste of HTML::TokeParser::Simple's interface.

    As a more general OO problem or pattern, I'd perhaps be interested in looking at the class which supplies the iterator method (H::TP::S), rather than the class of the objects over which it iterates (H::TP::S::Token). When I'm using a method that'll return some or all members of a larger collection, I'd like that method to guarantee what sort of objects will be returned. That way, I can safely call methods on the individual objects returned by the iterator which not all objects in the whole collection would have:

    package HTML::TokeParser::Simple; sub get_start_token { my $self = shift; my $token; do { $token = $self->get_token; } until ( ! $token || $token->is_start_tag ); return $token; }

    That said, this isn't that practically useful for a stream of objects, as in HTML::TP, where you want to act once and immediately and finally on every iterated object, rather than modifying a collection, then using a different iterator to do something else. The existing interface works like a treat.

    ViceRaid

Re: OO style question: how much information to hide?
by shotgunefx (Parson) on Jun 11, 2003 at 09:25 UTC
    In the HTML parsers I've written, I've always went loose for several reasons.
    1. I've seen a lot of malformed HTML.
    2. I see a lot of odd "HTMLish" tags embedded for processing / templating
    3. I've done 2. myself.
    Personally, I would just make the "strictness" a method you could call so you can have it both ways (carp or croak). The other thing I've done in the parsers I've written is to allow tags to be specified 'tag' and '/tag' as well so the problem can be circumvented all together. The later fits my thinking well.

    -Lee

    "To be civilized is to deny one's nature."
Re: OO style question: how much information to hide? (allow simple code)
by tye (Sage) on Jun 11, 2003 at 15:29 UTC

    I'd say that being able to write simpler code is a win. So:

    • Don't die nor warn (by default) on "deleting" a non-existant tag (even if there is some reason that it should be "impossible" for the tag to exist, such as the token being an end tag)
    • Have the "delete" method return what was deleted (just like delete does)
    • Have "add attribute" complain for end tags
    So I could certainly see value in having an optional complaint emitted if you try to delete attributes from end tags (for those who want to detect if they are being "sloppy" in this respect as a check on the validity of their code). But I don't think that should be the default (in a module with "Simple" in the name).

                    - tye
Re: OO style question: how much information to hide?
by jplindstrom (Monsignor) on Jun 11, 2003 at 17:40 UTC
    What is the semantics of del_attr('font') on a start tag when the attr is missing? Does it croak or "fail" silently?

    I.e. does del_attr() guarantee that an attr will be deleted (i.e. the tag modified), or does it guarantee that the attr 'font' will not be in the tag after being run?

    If the semantics is focused on the deletion, it's not appropriate to modify a closing tag if a closing tag can't be modified (by removing attributes).

    If the semantics is focused on the absence of the attribute after running the method, it's appropriate to leave it to the object to know that "hey, there can't be any attributes here anyway, so I'll let it slide".

    Personally, I'd go with the second alternative, since it lays down fewer rules about how to use the class (this method must only be called in this state!) while (meththinks) providing the same functionality.

    From the user's perspective, the important thing is probably that the attribute is gone, not that it was deleted.


    Now, I don't know anything about HTML::TokeParser::Simple, but is it really impossible for closing tags to have attributes? If passed invalid HTML? If so, is it the responsibility of the class to do anything about it?
Re: OO style question: how much information to hide?
by t'mo (Pilgrim) on Jun 11, 2003 at 23:08 UTC

    Concerning "tokens", is the start-tag-ed-ness of a Token inseparable from being a Token? Or is it, rather, an attribute imposed by having a Grammar? (Of course, that speculation may be beyond the scope of a ::Simple module.)

    About "deletion", I suggest contemplating this question: What response does the object implementing the collection of attributes within the Token give if it is asked to delete an attribute that it does not contain? :-)


    ...every application I have ever worked on is a glorified munger...