I would split on /\r\n?/ instead. That avoids removing blank lines.
But not on a Mac. On a classic Mac OS system (MacPerl), the meanings of "\n" and "\r" are reversed. "\n" is what you use as the native end-of-line character, remember? And on a Mac, that's chr(13).
Also, because people tend to forget to upload their HTML as text, you often get sequences of two CR characters followed by one LF. You want to deal with that, too. So here's my solution:
/\015\015?\012|\015|\012/
which you might want to replace with "\n" using s///g instead of
splitting on it, so you get one cleaned-up string to feed into
HTML::Parser or similar.
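
For what it's worth, here is a minimal sketch of that substitution. The sub name normalize_newlines and the sample string are made up for illustration. The pattern uses octal escapes, so it matches the same bytes on every platform, while the "\n" in the replacement is whatever the local Perl considers a newline.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the post above): collapse every
# line-ending variant into a single "\n".
sub normalize_newlines {
    my ($text) = @_;
    # CR CR LF or CR LF first, then a lone CR, then a lone LF.
    $text =~ s/\015\015?\012|\015|\012/\n/g;
    return $text;
}

# Made-up sample mixing CR CR LF, CR LF (twice, giving a blank line),
# a lone CR, and a lone LF.
my $raw   = "one\015\015\012two\015\012\015\012three\015four\012";
my $clean = normalize_newlines($raw);

# Splitting the cleaned string keeps the blank line as an empty element.
my @lines = split /\n/, $clean;
print join("|", @lines), "\n";
```

The order of the alternatives matters: the longest form has to be tried first, or CR CR LF would be turned into two or three newlines instead of one.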