http://www.perlmonks.org?node_id=1076650


in reply to Re: write html code with perl
in thread write html code with perl

Hi Ken. Regarding your closing remark about character entities for the non-ASCII characters, I think one rule will suffice: use HTML::Entities. Its encode_entities function converts characters in a given set (by default: control characters, non-ASCII characters, and <, &, >, ' and ") to their named ("symbolic") entity references where one exists, and to numeric references otherwise:
perl -MHTML::Entities=encode_entities -e 'print encode_entities("\x{00bf}Que?")'
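A slightly longer sketch of the same call, showing the default behaviour (a named entity where HTML::Entities knows one, a numeric reference otherwise) and the optional second argument, which is a character class of the characters to treat as unsafe. The sample strings are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Default set: control chars, high-bit chars and < & > ' " are encoded.
my $named   = encode_entities("\x{00BF}Qu\x{00E9}?");   # inverted ? and e-acute have names
my $numeric = encode_entities("\x{2603}");              # SNOWMAN has no named entity

# Second argument: only these characters are encoded; non-ASCII text is kept as-is.
my $markup_only = encode_entities("<b>\x{00BF}Qu\x{00E9}?</b>", '<>&"');

binmode STDOUT, ':encoding(UTF-8)';   # $markup_only still contains non-ASCII characters
print "$named\n$numeric\n$markup_only\n";
```

Called in scalar or list context, encode_entities returns an encoded copy; in void context it modifies its argument in place.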

(Update, to clarify: if the files to be listed already have quaint names containing non-ASCII UTF-8 characters, then running those names through HTML::Entities::encode_entities will make sure they display correctly regardless of a given browser's default character set. If, however, the descriptions to be displayed differ from the actual file names, some sort of lookup table would be needed.)
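One caveat when the names come straight from the filesystem: readdir() returns bytes, not characters. The sketch below assumes the names are stored as UTF-8 (the file name itself is hypothetical) and shows why they should be decoded with Encode before encode_entities sees them; otherwise each byte of a multi-byte character is encoded as a separate Latin-1 character.

```perl
use strict;
use warnings;
use Encode qw(decode);
use HTML::Entities qw(encode_entities);

# Hypothetical file name as readdir() might return it: raw UTF-8 bytes.
my $raw = "qu\xC3\xA9_est\xC3\xA1_pasando.png";

# Encoding the raw bytes: each byte is treated as its own character,
# so the two bytes of e-acute become two unrelated entities.
my $wrong = encode_entities($raw);

# Decoding first yields one Perl character per code point.
my $right = encode_entities(decode('UTF-8', $raw));

print "$wrong\n$right\n";
```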

Re^3: write html code with perl
by kcott (Archbishop) on Mar 01, 2014 at 18:32 UTC
    "Regarding your closing remark about character entities for the non-ASCII characters, I think one rule will suffice: use HTML::Entities. ..."

    Using HTML::Entities was my first thought too; however, the one line that didn't follow the general pattern seemed so different that I decided to suggest special rules or manual editing.

    For all lines except the que_esta_pasando.png one, the "my @name = ..." line seemed to capture the general pattern, as far as I could infer it from the code the OP provided: trapped.png becomes Trapped, the_look.png becomes The Look, and so on.
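    The OP's "my @name = ..." line isn't reproduced here, so the following is only a guess at that general rule: drop the extension, split on underscores and capitalise each word. The subroutine name title_from_filename is hypothetical.

```perl
use strict;
use warnings;

# Guessed general rule (not the OP's actual code):
# strip the extension, split on underscores, capitalise each word.
sub title_from_filename {
    my ($file) = @_;
    (my $base = $file) =~ s/\.[^.]+\z//;    # drop ".png" or similar
    return join ' ', map { ucfirst(lc($_)) } split /_/, $base;
}

print title_from_filename($_), "\n" for qw(trapped.png the_look.png que_esta_pasando.png);
```

    Applied to que_esta_pasando.png, this rule gives "Que Esta Pasando", which is exactly the mismatch discussed next.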

    For the que_esta_pasando.png line, the general pattern is not applied beyond splitting on underscores. I think your "Update" alludes to this, but I thought I'd point out the differences, as I see them, in this specific case.

    • Split on underscores (as for other lines): "que_esta_pasando.png" becomes "que esta pasando"
    • Know this is a foreign language. [I've assumed Spanish; happy to be corrected; it doesn't change the point I'm making.]
    • Apply the appropriate diacritical marks for this language: "que esta pasando" becomes "qué está pasando"
    • Treat as a sentence, not as a title, so only the first word is capitalised: "qué está pasando" becomes "Qué está pasando"
    • Know enough of this language to understand that this sentence is a question, not a statement. Apply the appropriate punctuation for a question in this language: "Qué está pasando" becomes "¿Qué está pasando?"
    • Not shown, but adding a lang="es" attribute would probably also be appropriate.
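    The end result of those steps, including the optional lang attribute, might be emitted as in this sketch. The helper link_item and the choice of putting lang on the <a> element are assumptions for illustration; the display text is supplied by hand rather than derived.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical helper: build one <li><a> entry; $lang is optional.
sub link_item {
    my ($file, $text, $lang) = @_;
    my $href  = encode_entities($file, '<>&"');   # only markup-significant chars
    my $label = encode_entities($text);           # default set, incl. non-ASCII
    my $attr  = defined $lang ? qq{ lang="$lang"} : '';
    return qq{<li><a href="$href"$attr>$label</a></li>};
}

print link_item('trapped.png', 'Trapped'), "\n";
print link_item('que_esta_pasando.png', "\x{00BF}Qu\x{00E9} est\x{00E1} pasando?", 'es'), "\n";
```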

    All of this could be coded and, for some massive international image library with ongoing maintenance requirements, may well be the way to go. However, the OP states "I'm working in a basic html page, actually it is just a list of links to some pictures.": in this instance, rules (such as the lookup table you've suggested) or manual editing seems more sensible and a lot less work.
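    A minimal sketch of that lookup-table approach: exceptions are listed by hand, and everything else falls back to the general rule. The hash %override, the subroutine caption_for and the fallback rule itself are hypothetical, not taken from the OP's code.

```perl
use strict;
use warnings;

# Hand-maintained exceptions: file name => [display text, language tag].
my %override = (
    'que_esta_pasando.png' => [ "\x{00BF}Qu\x{00E9} est\x{00E1} pasando?", 'es' ],
);

# Returns (text, lang); lang is undef when the general rule is used.
sub caption_for {
    my ($file) = @_;
    return @{ $override{$file} } if exists $override{$file};
    (my $base = $file) =~ s/\.[^.]+\z//;    # guessed general rule
    return ( join(' ', map { ucfirst(lc($_)) } split /_/, $base), undef );
}

binmode STDOUT, ':encoding(UTF-8)';
for my $file (qw(trapped.png the_look.png que_esta_pasando.png)) {
    my ($text, $lang) = caption_for($file);
    printf "%-22s %s%s\n", $file, $text, defined $lang ? " [$lang]" : '';
}
```

    Adding another odd case is then a one-line change to %override rather than a new rule.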

    -- Ken