in reply to Distinguish between HTML and Plain text
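Impossible. At best, you can take a guess. But you can guess fairly reliably, because an HTML document has an HTML element and nearly always an explicit <html> tag (strictly speaking, the tag itself may be omitted).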
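If you don't know whether it's text or HTML, then you're surely dealing with bytes, so besides ASCII-compatible encodings you need to handle UTF-16le, UTF-16be, UCS-2le, UCS-2be, UCS-4le, UCS-4be (matching case-insensitively, since most pages write the tag in lowercase):
/<HTML|<\0H\0T\0M\0L|<\0\0\0H\0\0\0T\0\0\0M\0\0\0L/i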
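As a rough sketch of how that byte-level guess might be used: the helper name looks_like_html is hypothetical, the /i flag is my assumption so that lowercase tags also match, and Encode is used here only to produce test input in each encoding. This is still just a guess, not a reliable detector.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: guess whether a raw byte string looks like HTML.
# The alternatives cover ASCII-compatible encodings, 2-byte encodings
# (UTF-16/UCS-2, either byte order) and 4-byte encodings (UCS-4, either
# byte order). The /i flag is an assumption added for lowercase tags.
sub looks_like_html {
    my ($bytes) = @_;
    return $bytes =~ /<HTML|<\0H\0T\0M\0L|<\0\0\0H\0\0\0T\0\0\0M\0\0\0L/i ? 1 : 0;
}

# Encode the same small document several ways and run the guess on the bytes.
my $doc = "<html><body>hi</body></html>";
for my $enc (qw(UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE)) {
    my $bytes = encode($enc, $doc);
    printf "%-9s %s\n", $enc, looks_like_html($bytes) ? "probably HTML" : "probably text";
}
```

The big-endian cases work without separate alternatives because the match simply starts at the "<" byte and ignores the NUL bytes in front of it.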
If you're somehow dealing with decoded text:
/<HTML/i
Update: No, that's still not good enough. A plain-text version of this very post would be misidentified as HTML, for example, because it contains "<HTML".
Re^2: Distinguish between HTML and Plain text
by vit (Friar) on Sep 26, 2011 at 23:26 UTC
by ikegami (Patriarch) on Sep 26, 2011 at 23:36 UTC