I'll second this. Your code is still just code -- it only does what it's told. Write your tests to define what you want it to do -- and that includes handling unexpected stuff. The hard part is actually defining what you want it to do, not writing tests.
At the granular level, much of your code converts a string of input to a string of output. Set up a test loop that does that and define your inputs and outputs in a data structure. You don't need to worry about the web parts yet -- test at a simpler level.
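A minimal sketch of such a loop, using Test::More. The markup_to_html sub here is a hypothetical stand-in for whatever your code actually does (I'm assuming some kind of text-to-HTML conversion); the part that matters is the table of cases and the loop over it.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for the code under test: converts *bold* and
# _italic_ markers to HTML tags. Swap in your own function.
sub markup_to_html {
    my ($text) = @_;
    $text =~ s{\*([^*]+)\*}{<b>$1</b>}g;
    $text =~ s{_([^_]+)_}{<i>$1</i>}g;
    return $text;
}

# Each case: a label, an input string, and the expected output string.
my @cases = (
    { label => 'plain text is unchanged', input => 'hello',   expect => 'hello' },
    { label => 'bold',                    input => '*hello*', expect => '<b>hello</b>' },
    { label => 'italic',                  input => '_hello_', expect => '<i>hello</i>' },
    { label => 'empty string',            input => '',        expect => '' },
);

# One assertion per row; adding a test means adding a row.
for my $case (@cases) {
    is( markup_to_html( $case->{input} ), $case->{expect}, $case->{label} );
}

done_testing();
```

Keeping the cases in a plain data structure means a new test costs one line, which makes it much more likely you'll actually write it.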
Once you've got a framework like that in place, just keep adding tests for all the basic, most granular elements. Then I'd suggest considering more complicated cases in similar groups -- e.g. nesting of items inside each other, ordering of nested tags, missing whitespace, etc. (A good time for a separate test file.) Once you have a category, it should be easier to imagine lots of variations. The key is to start small -- focus on small, simple combinations before working up to larger units.
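One way to keep those categories apart is a hash of case lists with one subtest per category. This is only a sketch: markup_to_html is again a hypothetical converter, and the category names are just the ones mentioned above.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical converter, standing in for your real code.
sub markup_to_html {
    my ($text) = @_;
    $text =~ s{\*([^*]+)\*}{<b>$1</b>}g;
    $text =~ s{_([^_]+)_}{<i>$1</i>}g;
    return $text;
}

# Cases grouped by category: each entry is [ input, expected output ].
my %cases_for = (
    'nesting' => [
        [ '*_x_*', '<b><i>x</i></b>' ],
    ],
    'ordering of nested tags' => [
        [ '_*x*_', '<i><b>x</b></i>' ],
    ],
    'missing whitespace' => [
        [ 'a*b*c', 'a<b>b</b>c' ],
    ],
);

# One subtest per category, so failures are reported under their group.
for my $category ( sort keys %cases_for ) {
    subtest $category => sub {
        for my $case ( @{ $cases_for{$category} } ) {
            my ( $input, $expect ) = @$case;
            is( markup_to_html($input), $expect, "'$input'" );
        }
    };
}

done_testing();
```

Once a category exists, adding a variation is just another row in its list.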
You might also try thinking about the problem in terms of a state-tree. As your code receives each chunk of input, it enters certain states. Are you testing each of the states that different chunks of input lead it to?
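A rough illustration of the state idea, assuming a toy scanner with three made-up states (text, in_tag, in_attr). Your real code will have different states; the point is to check that your test inputs, taken together, reach every one of them.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical scanner: walks the input one character at a time and
# returns the list of states it passed through, starting with 'text'.
sub states_visited {
    my ($input) = @_;
    my $state = 'text';
    my @seen  = ($state);
    for my $char ( split //, $input ) {
        my $next =
            $state eq 'text'    && $char eq '<' ? 'in_tag'
          : $state eq 'in_tag'  && $char eq '"' ? 'in_attr'
          : $state eq 'in_attr' && $char eq '"' ? 'in_tag'
          : $state eq 'in_tag'  && $char eq '>' ? 'text'
          :                                       $state;
        push @seen, $next if $next ne $state;
        $state = $next;
    }
    return @seen;
}

# Collect every state that any test input reaches.
my @inputs = ( 'plain', '<b>', '<a href="x">' );
my %covered;
for my $input (@inputs) {
    $covered{$_}++ for states_visited($input);
}

# Fail if a known state is never reached by any input.
for my $state (qw(text in_tag in_attr)) {
    ok( $covered{$state}, "some test input reaches state '$state'" );
}

done_testing();
```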
Then get malicious. Intentionally try to break your code or make it give improper results -- which is when you may want to use Test::Exception to see how errors are handled, too. You know your code better than any random end-user, so you should be much more likely to generate malignant input than the "million monkeys" that might bang on your code later. (You'd be surprised how often ideas for this will occur to you during the simple tests of expected behavior.)
The point of all of this is to focus on exploring the desired behaviors of your code, both when input is as expected and when it's not. If you do that well at a granular level, then any sort of "real world" input is much more likely to fit some pattern you've already tested.
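For instance, something along these lines with Test::Exception (a CPAN module, not core). The strict_bold sub and its error messages are hypothetical; they just give lives_ok and throws_ok something to check.

```perl
use strict;
use warnings;
use Test::More;
use Test::Exception;

# Hypothetical function under test: refuses undefined input and
# unbalanced markers instead of guessing what was meant.
sub strict_bold {
    my ($text) = @_;
    die "input must be defined\n" unless defined $text;
    my $markers = () = $text =~ /\*/g;    # count the '*' characters
    die "unbalanced '*' marker\n" if $markers % 2;
    $text =~ s{\*([^*]*)\*}{<b>$1</b>}g;
    return $text;
}

# Expected behavior: well-formed input must not die.
lives_ok  { strict_bold('*ok*') } 'well-formed input does not die';

# Malicious input: check that it dies, and with the right message.
throws_ok { strict_bold(undef) }   qr/must be defined/, 'undef is rejected';
throws_ok { strict_bold('*oops') } qr/unbalanced/,      'unbalanced marker is rejected';

# Odd-but-legal input still needs a defined result.
is( strict_bold('**'), '<b></b>', 'empty bold span still converts' );

done_testing();
```

Matching on the error message, not just on the fact that the code died, confirms it died for the reason you intended.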
Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.