PerlMonks  

Encode string to HTML

by gepebril69 (Beadle)
on Nov 01, 2013 at 14:40 UTC ( #1060768=perlquestion )
gepebril69 has asked for the wisdom of the Perl Monks concerning the following question:

Hi there

I'm trying to automate emails, using HTML templates into which I parse the dynamic data. This works well unless the data contains characters such as ï, which some web email programs then display incorrectly. It seems I have to convert these special characters to HTML entity names.

I found HTML::Entities, but it doesn't seem to give the result I expect. I only want to convert the special characters, not HTML markup such as <font>. When I run

    my $TestStr = 'ï';
    print encode_entities($TestStr);

it returns &Atilde;&macr; instead of &iuml;

Is this a bug in the module, or am I misunderstanding how to use it?

yours sincerely,

Re: Encode string to HTML
by Corion (Pope) on Nov 01, 2013 at 14:43 UTC

    You need to (find out and) tell Perl what encoding your i-with-double-dots letter is in. Then you need to Encode::decode it and then pass it to HTML::Entities for output, or as an alternative, tell the mail client in the headers what output encoding your mail uses.

    Likely of help is perlunitut.

Re: Encode string to HTML
by hippo (Curate) on Nov 01, 2013 at 14:46 UTC

    You aren't decoding your source string first.

    use strict;
    use warnings;
    use HTML::Entities;
    use Encode;

    my $TestStr = 'ï';
    print encode_entities($TestStr) . "\n";                   # undecoded string
    print encode_entities(decode('utf-8', $TestStr)) . "\n";  # decoded first

    Have a read of perlunitut if you haven't already. It'll explain the basics.

      Decoding from UTF-8 only helps if the source code is actually encoded as UTF-8. This may or may not be the case.

      At least according to Wikipedia, other likely encodings are ISO 8859-3, ISO 8859-9 or Windows-1254, if we guess that &iuml; is supposed to depict a Turkish letter.

        Indeed so - it is nigh on impossible to determine the encoding of a document from a single character, so the actual encoding of the source will only be known by gepebril69. UTF-8 seemed a reasonable first guess in this instance and it does produce the desired output for that one character.
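        To make the point about guessing concrete, here is a minimal sketch that decodes the same two bytes under two different assumed encodings before entity-encoding them. The helper name entities_from_bytes is hypothetical and only used for illustration; the byte string is the UTF-8 encoding of ï.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# The two bytes 0xC3 0xAF are the UTF-8 encoding of U+00EF (i with diaeresis).
my $bytes = "\xC3\xAF";

# Hypothetical helper: decode the octets using the given encoding,
# then turn the resulting characters into HTML entities.
sub entities_from_bytes {
    my ($encoding, $octets) = @_;
    return encode_entities(decode($encoding, $octets));
}

print entities_from_bytes('UTF-8',      $bytes), "\n";   # &iuml;
print entities_from_bytes('ISO-8859-1', $bytes), "\n";   # &Atilde;&macr;
```

        The second line of output is the same as the unexpected result in the question, which is what you would expect if the UTF-8 bytes are treated as two separate Latin-1 characters.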

        I've checked the template file and it is

        text/html; charset=utf-8

        Now I understand why I had a similar problem in the past when parsing files. Perl doesn't seem to auto-detect the encoding. I guess there is a logical reason for that.
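        Since Perl does not detect the encoding by itself, one option is to state it when opening the file. The following is a minimal sketch, assuming the template really is UTF-8; the helper name slurp_utf8 and the temporary stand-in file are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Entities qw(encode_entities);

# Hypothetical helper: read a whole file, decoding it as UTF-8 on the way in.
sub slurp_utf8 {
    my ($path) = @_;
    open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    local $/;    # slurp mode: read everything in one go
    my $text = <$in>;
    close $in;
    return $text;
}

# Write a small stand-in template as raw UTF-8 bytes.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "na\xC3\xAFve\n";
close $fh;

# The text is already decoded, so no separate decode() call is needed.
print encode_entities(slurp_utf8($path));    # na&iuml;ve
```

        With the :encoding(UTF-8) layer on the filehandle, the string that reaches encode_entities already consists of characters rather than bytes.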

      Thanks hippo

      That explains a lot. So in my case, when I want to define the unsafe characters myself, I have to use a similar method:

      my $UnsafeChar = '';
      print encode_entities(decode('utf-8', $TestStr), decode('utf-8', $UnsafeChar)) . "\n";
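      The second argument of encode_entities is written like the inside of a regular expression character class, so a negated range can be used instead of listing every special character. Here is a minimal sketch, assuming the goal is to encode only non-ASCII characters and leave the HTML markup untouched; the helper name encode_non_ascii is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Hypothetical helper: entity-encode only characters outside ASCII,
# so that markup characters such as < and > pass through unchanged.
sub encode_non_ascii {
    my ($text) = @_;
    return encode_entities($text, '^\x00-\x7F');
}

# Decode the UTF-8 bytes first, then encode the resulting characters.
my $html = decode('UTF-8', "<font>na\xC3\xAFve</font>");
print encode_non_ascii($html), "\n";    # <font>na&iuml;ve</font>
```

      Note that this also leaves & and quotes alone, so it only suits text that is already valid HTML apart from the non-ASCII characters.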
Re: Encode string to HTML
by sundialsvc4 (Abbot) on Nov 01, 2013 at 15:03 UTC

      Thanks!

      I had to change this line of code to make my mail look OK in the web browsers I've tested:

      $Mail{'content-type'} = 'text/html; charset="iso-8859-1"'; # Old, not correct
      $Mail{'content-type'} = 'text/html; charset="utf-8"';

      Thanks all for pointing me in the right direction.

Re: Encode string to HTML
by Your Mother (Canon) on Nov 01, 2013 at 16:31 UTC

    Also, note that your code needs to know its own encoding.

    use strict;
    use HTML::Entities;

    {
        my $TestStr = 'ï';
        print encode_entities($TestStr), $/;
    }
    {
        use utf8;
        my $TestStr = 'ï';
        print encode_entities($TestStr), $/;
    }

    __END__
    &Atilde;&macr;
    &iuml;
