HTML::Parser is your friend. A simpler interface built on top of it, HTML::TokeParser, may be sufficient. You are correct, Adam, when you identify the source of the difficulty: there are a variety of ways to code those HTML tags. (As a side issue, XHTML, with its much more rigid syntax, will make this easier. It is sort of like "use strict;" for HTML.)
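To make that concrete, here is a minimal sketch of how HTML::TokeParser smooths over those variations. I don't know exactly which tags you are after, so the anchor tags, the sample markup, and the extract_links helper below are all my own assumptions for illustration. Adapt the tag and attribute names to your actual data.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical input: the same kind of tag coded three different ways
# (upper case, single quotes, no quotes, attributes in a different order).
my $html = <<'HTML';
<A HREF="one.html">First</A>
<a href='two.html' class="x">Second</a>
<a class=y
   href=three.html>Third</a>
HTML

# Hypothetical helper: return [href, link text] pairs found in a document.
sub extract_links {
    my ($doc) = @_;

    # A reference to a scalar is parsed as the literal document text.
    my $p = HTML::TokeParser->new(\$doc)
        or die "Cannot create parser";

    my @links;
    # get_tag skips ahead to the next <a> start tag. Tag and attribute
    # names come back lower-cased, whatever case the author used.
    while (my $tag = $p->get_tag('a')) {
        my $href = $tag->[1]{href};      # $tag->[1] is the attribute hash
        next unless defined $href;       # ignore anchors without an href
        my $text = $p->get_trimmed_text('/a');
        push @links, [$href, $text];
    }
    return @links;
}

my @links = extract_links($html);
print "$_->[0] => $_->[1]\n" for @links;
```

The point is that the code never has to care about case, quoting style, or attribute order. The parser normalizes all of that, which is exactly the part that is painful to get right with hand-rolled regular expressions.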
Now, if this is part of a class project, maybe they want to see how you would tackle the problem, as an exercise in analysis and program design. In that case, HTML::Parser should at least be a good source of inspiration.