PerlMonks  

replacement for deprecated encoding pragma

by igoryonya (Pilgrim)
on Jul 19, 2021 at 03:12 UTC ( [id://11135151]=perlquestion )

igoryonya has asked for the wisdom of the Perl Monks concerning the following question:

I used this in my programs:
use utf8;
use encoding 'utf8';
my $out = &refPrint(\%rows);
Earlier, when encoding was working, Unicode text from $out was displayed correctly in the terminal and on the web.
Now it dies with an error saying that encoding is deprecated.
I tried to use just:
use utf8;
my $out = &refPrint(\%rows);
I have some Cyrillic text generated in the subroutine &refPrint, which returns the result as HTML, like this:
<table border='1'><tr><td style='vertical-align: top; '>6</td><td><table border='1'><tr><td style='vertical-align: top; '>table_name</td><td>reshenia</td></tr>
<tr><td style='vertical-align: top; '>description</td><td>Решения</td></tr>
</table></td></tr>
<tr><td style='vertical-align: top; '>1</td><td><table border='1'><tr><td style='vertical-align: top; '>description</td><td>Структура таблиц базы данных</td></tr>
<tr><td style='vertical-align: top; '>table_name</td><td>db_structure_tables</td></tr>
</table></td></tr>
<tr><td style='vertical-align: top; '>3</td><td><table border='1'><tr><td style='vertical-align: top; '>table_name</td><td>sotrudniki</td></tr>
<tr><td style='vertical-align: top; '>description</td><td>Сотрудники</td></tr>
</table></td></tr>
<tr><td style='vertical-align: top; '>2</td><td><table border='1'><tr><td style='vertical-align: top; '>description</td><td>Структура полей в таблицах базы данных</td></tr>
<tr><td style='vertical-align: top; '>table_name</td><td>db_structure_fields</td></tr>
</table></td></tr>
<tr><td style='vertical-align: top; '>5</td><td><table border='1'><tr><td style='vertical-align: top; '>description</td><td>Районы</td></tr>
<tr><td style='vertical-align: top; '>table_name</td><td>rayony</td></tr>
</table></td></tr>
<tr><td style='vertical-align: top; '>4</td><td><table border='1'><tr><td style='vertical-align: top; '>description</td><td>Запросы</td></tr>
<tr><td style='vertical-align: top; '>table_name</td><td>zaprosy</td></tr>
</table>
If I print it to the terminal, it displays correctly, as in the example above. But if I pass it to a Mojolicious template as a placeholder variable, it gets scrambled, as if it were not UTF-8 but just plain bytes, even when I put a <meta charset='utf-8' /> tag in the HTML header:
<html><head>
<meta charset="utf-8">
</head>
<body>
Открывается База.
<table border="1"><tbody><tr><td style="vertical-align: top; ">2</td><td><table border="1"><tbody><tr><td style="vertical-align: top; ">table_name</td><td>db_structure_fields</td></tr>
<tr><td style="vertical-align: top; ">description</td><td>С&#130;&#128;&#131;к&#130;&#131;&#128;а полей в &#130;абли&#134;а&#133; баз&#139; данн&#139;&#133;</td></tr>
</tbody></table></td></tr>
<tr><td style="vertical-align: top; ">5</td><td><table border="1"><tbody><tr><td style="vertical-align: top; ">table_name</td><td>rayony</td></tr>
<tr><td style="vertical-align: top; ">description</td><td>&nbsp;айн&#139;</td></tr>
</tbody></table></td></tr>
<tr><td style="vertical-align: top; ">4</td><td><table border="1"><tbody><tr><td style="vertical-align: top; ">table_name</td><td>zaprosy</td></tr>
<tr><td style="vertical-align: top; ">description</td><td>&#151;ап&#128;ос&#139;</td></tr>
</tbody></table></td></tr>
<tr><td style="vertical-align: top; ">1</td><td><table border="1"><tbody><tr><td style="vertical-align: top; ">table_name</td><td>db_structure_tables</td></tr>
<tr><td style="vertical-align: top; ">description</td><td>С&#130;&#128;&#131;к&#130;&#131;&#128;а &#130;абли&#134; баз&#139; данн&#139;&#133;</td></tr>
</tbody></table></td></tr>
<tr><td style="vertical-align: top; ">3</td><td><table border="1"><tbody><tr><td style="vertical-align: top; ">description</td><td>Со&#130;&#128;&#131;дники</td></tr>
<tr><td style="vertical-align: top; ">table_name</td><td>sotrudniki</td></tr>
</tbody></table></td></tr>
<tr><td style="vertical-align: top; ">6</td><td><table border="1"><tbody><tr><td style="vertical-align: top; ">description</td><td>&nbsp;&#136;ения</td></tr>
<tr><td style="vertical-align: top; ">table_name</td><td>reshenia</td></tr>
</tbody></table></td></tr>
</tbody></table>
</body></html>
When I was using the encoding pragma, it worked correctly. I tried copying the output from the terminal, assigning it as a string to a variable, and sending that variable to the Mojolicious template, and it displayed correctly. That made me think that, after the text is generated in the subroutine, for some reason it is not treated as UTF-8. So I tried this:
use utf8;
use Encode;
my $out = decode('UTF-8', &refPrint(\%rows));
This way Mojolicious shows correctly encoded text. But how can I fix this so that I don't have to decode the strings explicitly, as in the example with the decode function above? How can I make it automatic, as it was with the encoding pragma?

Replies are listed 'Best First'.
Re: replacement for deprecated encoding pragma
by LanX (Saint) on Jul 19, 2021 at 13:44 UTC
    please edit your post and put your sample HTML code inside <code> tags, not <pre> tags.

    Otherwise it's destroying the whole formatting of this thread.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I did it in pre because it's not part of the source code, but the generated result, as it should look.
      I tried changing it to code and it became gibberish, not what I intended.
      So I changed it back to pre, since the purpose is for it to be displayed as it looks in the resulting output, not to show its internal code.
        Well, obviously your table was not finished, and you have some illegal tags too.

        It's pretty hard to read and destroys the formatting.

        Deleting your post is another option to consider...

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        PS: The less readable a post is, the fewer answers it'll get

      PerlMonks is assumed by the browsers to be using the Latin-1 encoding. Anything entered into text entry fields but not fitting into Latin-1 is encoded into HTML entities. They work in the <pre> tags, but <code> tags are processed specially not to let anything HTML-looking be interpreted as HTML, so they are double-encoded on output. The result is that
      привет
      works, but &#1087;&#1088;&#1080;&#1074;&#1077;&#1090; doesn't. So it goes.
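
The entity encoding described above can be sketched in a few lines of Perl. This is only an illustration of the idea, not PerlMonks' actual code: the helper name to_entities is made up here, and the second escaping pass is a guess at what "double-encoded" amounts to.

```perl
use strict;
use warnings;
use utf8;   # assumption: this source file is saved as UTF-8

# Hypothetical helper: replace every character outside Latin-1
# with a decimal HTML numeric character reference.
sub to_entities {
    my ($text) = @_;
    $text =~ s/([^\x{00}-\x{FF}])/sprintf('&#%d;', ord $1)/ge;
    return $text;
}

my $once = to_entities('привет');

# Escaping '&' a second time (roughly what happens to HTML-looking
# text in a code section) makes the references show up literally.
(my $twice = $once) =~ s/&/&amp;/g;

print "$once\n$twice\n";
```

A browser renders the first line as Cyrillic letters, while the second line is displayed as the raw &#...; sequences.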
Re: replacement for deprecated encoding pragma
by ikegami (Patriarch) on Jul 22, 2021 at 04:13 UTC

    How to make it automatic, as it was with the encoding pragma?

    Uh, use encoding didn't do anything to the result of function calls.

    Anyway, refPrint returns strings encoded as UTF-8, which it probably shouldn't. Either fix it, or wrap it. And that's assuming the problem is in refPrint and not in the data you pass to it. You haven't told us anything!

    Seeking work! You can reach me at ikegami@adaelis.com

Re: replacement for deprecated encoding pragma
by Anonymous Monk on Jul 19, 2021 at 06:11 UTC
    Sorry, encoding of a string is a thing that has to be made explicit, and refPrint returns byte strings with no encoding information instead of Unicode strings containing wide characters that Mojolicious seems to be expecting. Are you allowed to change refPrint to return wide strings? Write a wrapper like sub refPrintW { decode utf8 => refPrint(@_) }?
      I suspect that you have it backwards. The function refPrint returns a string of HTML encoded as UTF-8. The template expects a normal Perl character string. (The function decode decodes the UTF-8 into the required Perl character string.) I do not understand why, but it seems that STDOUT must print the UTF-8 characters to the terminal without any further encoding.

      If you really can salvage your legacy software this easily, why not do it? Be sure to document the need for it and how you found it.

      Bill
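
A minimal sketch of the wrapper suggested above. The real refPrint was never posted, so the refPrint below is only a stand-in that returns UTF-8 encoded bytes, which is the assumption made in this thread; the sample %rows data is also made up.

```perl
use strict;
use warnings;
use utf8;                          # assumption: source file saved as UTF-8
use Encode qw(decode encode);

# Stand-in for the legacy refPrint (hypothetical): builds some HTML
# and returns it as UTF-8 *bytes*, not as a Perl character string.
sub refPrint {
    my ($rows) = @_;
    my $html = join '', map { "<td>$rows->{$_}</td>" } sort keys %$rows;
    return encode('UTF-8', $html);
}

# Wrapper: decode once, at the boundary, so callers such as a template
# engine receive a character string.
sub refPrintW {
    return decode('UTF-8', refPrint(@_));
}

my %rows  = (1 => 'Районы', 2 => 'Запросы');
my $bytes = refPrint(\%rows);
my $chars = refPrintW(\%rows);

printf "bytes: %d, characters: %d\n", length $bytes, length $chars;
```

The two lengths differ because each Cyrillic letter takes two bytes in UTF-8 but counts as one character after decoding.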

        I am confused about this encode/decode business. I thought that decode converts regular characters to UTF-8, because I assumed that, for some reason, &refPrint doesn't return the string in UTF-8 (although I thought it did). But if I understand you correctly, &refPrint returns UTF-8, decode converts UTF-8 to a Perl character string, and Mojolicious expects a Perl character string, not UTF-8. Is that why the terminal shows it correctly, but the Mojolicious result is scrambled unless it is decoded?

        In other words, decode converts the specified character encoding to a Perl character string?
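
The direction of decode and encode is easy to check with the core Encode module. A small sketch, using a hand-written byte sequence for the word "Привет":

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Привет" written out as raw UTF-8 bytes (two bytes per letter).
my $bytes = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

my $chars = decode('UTF-8', $bytes);   # bytes      -> characters
my $again = encode('UTF-8', $chars);   # characters -> bytes

printf "bytes: %d, characters: %d\n", length $bytes, length $chars;
printf "first code point: U+%04X\n", ord $chars;
print  "round trip ok\n" if $again eq $bytes;
```

So decode goes from an encoding such as UTF-8 to Perl's internal characters, and encode goes the other way.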
      So, what you are saying is that there is no replacement whatsoever for an automatic "use encoding..." pragma.
      The Perl developers deprecated it and replaced it with nothing?

        It has not been replaced by nothing. What has been taken away, without replacement, is the ability to write use encoding 'ISO-8859-5'; and then use Cyrillic characters in that encoding in character literals of your source code. But you didn't do that: you declared UTF-8, and there is a replacement for that, namely use utf8;, which is the equivalent of use encoding 'utf8';

        Note that (to be precise, since Perl 5.8.2) neither of those affects how your program reads and writes text: they declare that your source code is encoded as UTF-8. So they mostly affect string literals in your code, but not templates or anything else your program reads or prints to.
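
A small sketch of what use utf8 changes for a string literal, assuming the script itself is saved as UTF-8. The variable names are illustrative only.

```perl
use strict;
use warnings;
use utf8;                        # literals below are read as characters
use Encode qw(encode);

my $word      = 'привет';
my $len_chars = length $word;                    # counted in characters
my $len_bytes = length encode('UTF-8', $word);   # counted in UTF-8 bytes

# The same literal under "no utf8" is read as raw bytes.
my $len_raw = do { no utf8; length 'привет' };

# Output still needs its own layer; "use utf8" does not add one.
binmode STDOUT, ':encoding(UTF-8)';
print "$word: $len_chars characters, $len_bytes bytes, $len_raw without utf8\n";
```

The pragma only tells perl how to read the source file; the binmode call is what makes the print emit proper UTF-8.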

        If you want to set a default encoding, have a look at open. If you write use open ':encoding(UTF-8)'; then every call to open within the lexical scope of the open pragma will use the UTF-8 encoding.
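
A minimal sketch of the open pragma in use. The temporary file and the sample text are made up for the demonstration, and ':std' is added here so that the standard handles get the same layer.

```perl
use strict;
use warnings;
use utf8;                                # assumption: source saved as UTF-8
use open ':std', ':encoding(UTF-8)';     # default layers for open() and STDIN/STDOUT/STDERR
use File::Temp qw(tempfile);

# Reserve a scratch file name, then reopen it ourselves so the
# pragma's default layers apply to our own open() calls.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "Районы\n";                 # encoded to UTF-8 on the way out
close $out;

open my $in, '<', $path or die "Cannot read $path: $!";
chomp(my $line = <$in>);                 # decoded back to characters
close $in;

my $size = -s $path;
printf "read back %d characters from a file of %d bytes\n", length $line, $size;
```

No explicit encode or decode call appears anywhere: the layers set by the pragma do the conversion at the file boundary.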

        Not at all. There's surely a nice way of fixing the bug that you fixed by relying on another bug (i.e. by using use encoding). But you didn't give us any information about the problem. So refPrint doesn't return the right output. And?

        Seeking work! You can reach me at ikegami@adaelis.com
