Encode string to HTML

by gepebril69 (Scribe)
on Nov 01, 2013 at 14:40 UTC
gepebril69 has asked for the wisdom of the Perl Monks concerning the following question:

Hi there

I'm trying to automate emails, using HTML templates for that and parsing the dynamic data into them. This goes well unless the data contains characters like ï, which go wrong in some web email programs. It seems I have to convert these special characters to HTML entity names.

I found HTML::Entities, but it doesn't seem to give the result I expect. I only want to convert the special characters, not HTML markup like <font>. When I run

my $TestStr = 'ï';
print encode_entities($TestStr);

it returns &Atilde;&macr; instead of &iuml;

Is this a bug in the module, or am I not understanding the module?

yours sincerely,

Replies are listed 'Best First'.
Re: Encode string to HTML
by Corion (Pope) on Nov 01, 2013 at 14:43 UTC

    You need to (find out and) tell Perl what encoding your i-with-double-dots letter is in. Then you need to Encode::decode it and then pass it to HTML::Entities for output, or as an alternative, tell the mail client in the headers what output encoding your mail uses.

    Likely of help is perlunitut.

Re: Encode string to HTML
by hippo (Monsignor) on Nov 01, 2013 at 14:46 UTC

    You aren't decoding your source string first.

    use strict;
    use warnings;
    use HTML::Entities;
    use Encode;

    my $TestStr = 'ï';
    print encode_entities($TestStr) . "\n";
    print encode_entities(decode ('utf-8', $TestStr)) . "\n";

    Have a read of perlunitut if you haven't already. It'll explain the basics.

      Decoding from utf-8 only helps if the source code is actually encoded as UTF-8. This may or may not be the case.

      At least according to Wikipedia, likely encodings are also ISO 8859-3, ISO 8859-9 or Windows-1254, if guessing that &iuml; is supposed to depict a Turkish letter.

        Indeed so - it is nigh on impossible to determine the encoding of a document from a single character, so the actual encoding of the source will only be known by gepebril69. UTF-8 seemed a reasonable first guess in this instance and it does produce the desired output for that one character.
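
To illustrate why the assumed encoding matters, here is a minimal sketch that decodes the same two bytes under two different encodings before passing them to encode_entities. The byte values are an assumption inferred from the reported output; the real source file may differ.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# The two bytes that UTF-8 uses for U+00EF (i with diaeresis).
# Assumed here; written as escapes so the file's own encoding is irrelevant.
my $bytes = "\xC3\xAF";

# Read as ISO-8859-1, each byte becomes its own character.
my $as_latin1 = encode_entities(decode('ISO-8859-1', $bytes));

# Read as UTF-8, the two bytes collapse into a single character.
my $as_utf8 = encode_entities(decode('UTF-8', $bytes));

print "ISO-8859-1: $as_latin1\n";   # &Atilde;&macr;
print "UTF-8:      $as_utf8\n";     # &iuml;
```

The first result matches what the original post reported, which suggests the string was UTF-8 bytes being treated one byte per character.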

        I've checked the template file and it is

        text/html; charset=utf-8

        Now I understand why I had a similar problem in the past with parsing files. Perl doesn't seem to auto-detect this encoding. There will be a logical reason for that, I guess.
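
Since Perl does not guess the encoding of a file, one option is to name it when opening the file. The sketch below assumes a UTF-8 template; the temporary file and its content are hypothetical stand-ins for the real template.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Entities qw(encode_entities);

# Hypothetical stand-in for the template: a temp file holding UTF-8 bytes.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "na\xC3\xAFve\n";
close $out or die "Cannot close $path: $!";

# The :encoding layer decodes bytes into characters while reading.
open my $in, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# The line is now 5 characters long, not 6 bytes.
my $html = encode_entities($line);
print length($line), " characters: $html\n";
```

With the layer in place there is no separate decode step, because the string is already made of characters when it reaches encode_entities.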

      Thanks hippo

      That explains a lot. So in my case, when I want to define unsafe characters, I have to use a similar method.

      my $UnsafeChar = '';
      print encode_entities(decode ('utf-8', $TestStr), decode ('utf-8', $UnsafeChar)) . "\n";
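
The second argument to encode_entities is written like the inside of a regex character class, so a negated range can be used instead of listing each character. This is a sketch only, assuming the goal is to leave markup such as <font> alone and encode everything outside ASCII; the sample string is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Made-up sample: UTF-8 bytes for markup plus one non-ASCII letter.
my $bytes = "<font>na\xC3\xAFve</font>";
my $text  = decode('UTF-8', $bytes);

# Default unsafe set: also encodes <, > and &, so the markup is escaped.
my $default = encode_entities($text);

# Custom unsafe set: everything that is NOT in the ASCII range.
my $non_ascii_only = encode_entities($text, '^\x00-\x7F');

print "$default\n";          # &lt;font&gt;na&iuml;ve&lt;/font&gt;
print "$non_ascii_only\n";   # <font>na&iuml;ve</font>
```

Note that with this unsafe set a literal & or < in the dynamic data is passed through untouched, so such data would need separate escaping before it is merged into the template.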
Re: Encode string to HTML
by Your Mother (Chancellor) on Nov 01, 2013 at 16:31 UTC

    Also, note that your code needs to know its own encoding.

    use strict;
    use HTML::Entities;

    {
        my $TestStr = 'ï';
        print encode_entities($TestStr), $/;
    }
    {
        use utf8;
        my $TestStr = 'ï';
        print encode_entities($TestStr), $/;
    }
    __END__
    &Atilde;&macr;
    &iuml;
Re: Encode string to HTML
by sundialsvc4 (Abbot) on Nov 01, 2013 at 15:03 UTC


      I had to change this line of code to make my mail look OK in the web browsers I've tested:

      $Mail{'content-type'} = 'text/html; charset="iso-8859-1"'; # Old, not correct
      $Mail{'content-type'} = 'text/html; charset="utf-8"';

      Thanks all for pointing me in the right direction.
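
The charset in the header only helps if the body bytes really are in that encoding. Below is a minimal sketch of keeping the two in step; the 'message' key and the sample body are hypothetical, and nothing is actually sent.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One place to name the charset, used for both the header and the body.
my $charset = 'UTF-8';

# Character string with one non-ASCII letter (U+00EF), written as an
# escape so the example does not depend on this file's own encoding.
my $body = "<p>na\x{EF}ve</p>";

# Hypothetical mail hash: 'message' is an assumed key name.
my %Mail = (
    'content-type' => qq{text/html; charset="$charset"},
    'message'      => encode($charset, $body),
);

printf "%s (%d bytes)\n", $Mail{'content-type'}, length $Mail{'message'};
```

If the body is entity-encoded down to plain ASCII first, the declared charset matters much less, since ASCII bytes are the same in UTF-8 and ISO-8859-1.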
