Decoding unicode entities with HTML::Parser
by Sixtease (Friar) on Apr 09, 2008 at 07:31 UTC
Sixtease has asked for the wisdom of the Perl Monks concerning the following question:
The HTML::Parser distribution provides the HTML::Entities::_decode_entities function, which is the lower-level counterpart of HTML::Entities::decode. I use it in my XML::Entities module to do the real work. However, it appears that older versions of HTML::Parser don't handle Unicode entities.
# outputs "ř" on 3.56
# outputs ř on 3.35
The changelog for HTML::Parser says that as of version 3.39_90, Unicode entities are always handled on perl 5.8+ and that this is "no longer a compile-time directive". However, I found no mention of such a directive in the earlier versions.
So, my question is: how can I make the older versions of HTML::Parser handle Unicode entities?
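For reference, here is a minimal sketch of a possible stopgap. It assumes that only numeric character references (such as &#345; or &#x159;) need decoding and that expanding them by hand is acceptable. The helper name decode_numeric_entities is hypothetical and not part of HTML::Parser or HTML::Entities; the sketch uses core Perl only and leaves named entities alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Parser): expand numeric character
# references using only core Perl. Named entities such as &amp; are left as-is.
sub decode_numeric_entities {
    my ($text) = @_;

    # $1 = whole reference, $2 = hex digits, $3 = decimal digits.
    # Digit counts are capped so hex() never sees an overflowing value.
    $text =~ s{(&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));)}{
        my $code = defined $2 ? hex($2) : $3;
        # Leave surrogates and out-of-range code points untouched.
        ($code > 0x10FFFF || ($code >= 0xD800 && $code <= 0xDFFF))
            ? $1
            : chr($code);
    }ge;

    return $text;
}

# Example: both references below denote U+0159 (r with caron).
binmode STDOUT, ':encoding(UTF-8)';
print decode_numeric_entities('&#345; &#x159; &amp;'), "\n";
```

This does not switch on Unicode support inside an old HTML::Parser; it only post-processes (or pre-processes) the string, so it is a workaround rather than an answer to the question above.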
use strict; use warnings; print "Just Another Perl Hacker\n";