by PodMaster (Abbot)
on Jun 08, 2003 at 11:48 UTC

After all this time, and now that I'm getting close to releasing XML::TokeParser (a version with this functionality built in), I finally took another look at this thread and realized I need to do something like that too.


I mean, why would you call get_tag and then test whether the result is a tag or a processing instruction, when it can only be a tag?

I quickly fixed this and then got reminded again that an XML::TokeParser::Token doesn't have a constructor -- yuck.

Then I thought maybe I should force get_tag to return a proper token, but that would break backwards compatibility, and I sure don't wanna do that.

Then I thought to myself that I should forget all this nonsense and have

  • XML::TokeParser::Token::StartTag
  • XML::TokeParser::Token::EndTag
  • XML::TokeParser::Token::PI
  • XML::TokeParser::Token::Comment
  • XML::TokeParser::Token::Text
Might as well take full advantage of blessed references. Something like

    package XML::TokeParser::Token;
    sub is_text      { return 0; }
    sub is_comment   { return 0; }
    sub is_pi        { return 0; }
    sub is_tag       { return 0; }
    sub is_start_tag { return 0; }
    sub is_end_tag   { return 0; }
    sub raw          { return $_[0]->[-1]; }

    package XML::TokeParser::Token::Text;
    # use vars::i '@ISA' => 'XML::TokeParser::Token'; # i'll probably put vars::i on cpan also
    use vars '@ISA';
    @ISA = 'XML::TokeParser::Token';
    sub is_text { return 1; }
    sub text    { return $_[0]->[-2]; }

Thoughts/Comments? I think maybe that's what I'll do, because

    sub is_end_tag {
        if (   $_[0]->[0] eq 'E'
            or ( @{ $_[0] } == 2 && substr( $_[0]->[0], 0, 1 ) eq '/' ) )
        {
            if ( defined $_[1] ) {
                return 1 if $_[0]->[1] eq $_[1];
            }
            else {
                return 1;
            }
        }
        return 0;
    }

does not look so hot. *sigh*
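
For comparison, here is a rough sketch of how the end-tag check might look with one class per token type. It is not code from the module. It assumes an end-tag token is an array ref laid out as [ 'E', $name, $raw ], the tag accessor is a made-up helper, and the two-element form that the check above also handles is left out.

```perl
use strict;
use warnings;

package XML::TokeParser::Token;

# Base class: every predicate answers "no" unless a subclass overrides it.
sub is_end_tag { return 0; }
sub raw        { return $_[0]->[-1]; }

package XML::TokeParser::Token::EndTag;
use vars '@ISA';
@ISA = 'XML::TokeParser::Token';

# Assumed token layout: [ 'E', $name, $raw ]
sub tag { return $_[0]->[1]; }    # hypothetical accessor

# The class already tells us this is an end tag, so the only thing
# left to check is the optional tag name.
sub is_end_tag {
    my ( $self, $name ) = @_;
    return 1 unless defined $name;
    return $self->tag eq $name ? 1 : 0;
}

package main;

# Tokens blessed by hand here; in the module the parser would do this.
my $end   = bless [ 'E', 'title', '</title>' ], 'XML::TokeParser::Token::EndTag';
my $other = bless [ 'T', 'hello', 'hello' ],    'XML::TokeParser::Token';

print $end->is_end_tag,          "\n";    # 1
print $end->is_end_tag('title'), "\n";    # 1
print $end->is_end_tag('body'),  "\n";    # 0
print $other->is_end_tag,        "\n";    # 0
```

The type test moves into method dispatch, so the only branch left in is_end_tag is the name comparison.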

on Jun 10, 2003 at 04:58 UTC

    I've been thinking about that for a while. I was considering a few other options that I might want to toss into the code, but somehow I never quite got around to it. What you propose is a heck of a lot cleaner and will clear up some other issues. I guess I was the bad lazy. I hope you don't mind if I steal your code :)

    Incidentally, if you haven't seen it, HTML::TokeParser::Simple is now at version 2.1 and has three HTML munging methods added that cover some very common situations that people keep wanting to deal with.


