PerlMonks  

HTML::Entities not encoding @ or .

by punch_card_don (Curate)
on Feb 12, 2008 at 13:56 UTC ( [id://667561]=perlquestion )

punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Meaty Monks,

Follow-up to this question - my Perl script uses HTML::Entities to encode form input for SQL sanitization on fields like fname, lname, email, address1, etc.

I asked the question above after noticing that the at-sign (@) and the period (.) were not being encoded. I thought maybe these were not in the default list %char2entity.

So I did this (only the pertinent lines shown):

use HTML::Entities;
use HTML::Entities qw( %char2entity %entity2char ); #thanks ikegami

foreach $val (keys %char2entity) {
    print "<br>$val => $char2entity{$val}\n";
}

$string = "this is an @ AT";
$string_2 = "é â ä à å ç ê ë è ï î ì Ä å É æ Æ ô ö ò û ù ÿ Ö Ü £ ¥ P ƒ á í ó ú ñ Ñ ª º ¿ ¬ ¬ ½ ¼ ¡ @ . , < > [ ] { } - _ ; :";

print "<p>encoded @ => ".encode_entities('@').
      ", <br>and the string has become => ".encode_entities($string).
      " <br>and the string_2 has become => ".encode_entities($string_2)."\n";

This outputs (looking at the source of the html page returned):

...
<br>@ => &#64;
...
<br>. => &#46;
...
encoded @ => @, <br>and the string has become => this is an @ AT <br>and the string_2 has become => &eacute; &acirc; &auml; &agrave; &aring; &ccedil; &ecirc; &euml; &egrave; &iuml; &icirc; &igrave; &Auml; &aring; &Eacute; &aelig; &AElig; &ocirc; &ouml; &ograve; &ucirc; &ugrave; &yuml; &Ouml; &Uuml; &pound; &yen; P &#131; &aacute; &iacute; &oacute; &uacute; &ntilde; &Ntilde; &ordf; &ordm; &iquest; &not; &not; &frac12; &frac14; &iexcl; @ . , &lt; &gt; [ ] { } - _ ; :

And I get the same result running this by telnet to ensure I'm not looking at interpreted output.

The @ and the . are right there in the hash of characters to encode. But they are not encoded. Note that several other characters that I also found in the hash are not encoded, such as [ and ].

What the heck?

Thanks.




Forget that fear of gravity,
Get a little savagery in your life.

Replies are listed 'Best First'.
Re: HTML::Entities not encoding @ or .
by moritz (Cardinal) on Feb 12, 2008 at 14:04 UTC
    @ and . have no special meaning in HTML, and are not escaped by default:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::Entities qw(encode_entities);

    my $str = ".@\n";
    print encode_entities($str);
    print encode_entities($str, '<>&".@');
    __END__
    .@
    &#46;&#64;

    As the example shows, you can force HTML::Entities to encode them, if you wish.

    Note that the only chars that need escaping in HTML are <>&, and " in attributes.

Re: HTML::Entities not encoding @ or .
by Joost (Canon) on Feb 12, 2008 at 14:25 UTC
      Yes, I took that advice to heart - using placeholders also. Is there any harm in encoding entities as well?
        Hmm.. if you're also using placeholders or quote, it probably won't matter as far as security goes, but it does make it harder to search the database or interact with the DB using anything but your code (I tend to do quite a lot of inspecting using hand-written SQL during development).

        Oh and it'll take more space to encode everything (which may make certain columns unexpectedly too small if someone enters a character you're escaping).

        So while it probably won't cause serious harm, it does IMO make it harder to develop and test. I wouldn't do it.

        Yes, if the output from your database ever needs to be anything other than HTML, you'll need to remember to decode it explicitly at that time. The best rule to follow, I've found, is to keep the raw text in the DB, then encode it appropriately at time of output, for the relevant output format in question.
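
To make the two points above concrete (encoded text takes more space, and raw text is easier to reuse), here is a minimal sketch. It leaves the database out entirely; the sample value and the variable names are made up for illustration, and only encode_entities and decode_entities come from HTML::Entities.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Hypothetical form value; the name and address are made up for illustration.
my $raw = 'Tom & Jerry <tj@example.com>';

# What would land in the column if the text were entity-encoded before storage.
my $stored_encoded = encode_entities($raw);

# Encoded text is longer, so a column sized for the raw value may be too small.
printf "raw: %d chars, encoded: %d chars\n",
    length($raw), length($stored_encoded);

# Any non-HTML consumer of the encoded column has to remember this extra step.
my $decoded_again = decode_entities($stored_encoded);

# Keeping the raw text and encoding only when writing HTML avoids that step
# for other output formats (plain-text mail, CSV, and so on).
my $html_cell  = '<td>' . encode_entities($raw) . '</td>';
my $plain_line = "Name: $raw\n";

print $html_cell, "\n", $plain_line;
```

The SQL side is still handled by placeholders, as discussed above; the sketch only covers where the HTML encoding step might sit.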
Re: HTML::Entities not encoding @ or .
by Anonymous Monk on Feb 12, 2008 at 14:05 UTC
    The default set of characters to encode are control chars, high-bit chars, and the <, &, >, and " characters.
    @ and . are not on the list. Try
    encode_entities($a, "\000-\377");
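
One way to see the default set for yourself, rather than take it on faith, is to ask the module which printable ASCII characters it rewrites when no second argument is given. This is only a sketch; the exact list may differ between HTML::Entities versions (the apostrophe in particular), so no output is shown.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Collect every printable ASCII character that a default call rewrites.
# encode_entities returns a copy here, so $_ itself is left untouched.
my @changed = grep { encode_entities($_) ne $_ }
              map  { chr($_) } 0x20 .. 0x7E;

print "changed by default: @changed\n";
```

On the installations discussed in this thread, < > & and " should show up in that list, while @ . [ and ] should not.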
Re: HTML::Entities not encoding @ or .
by punch_card_don (Curate) on Feb 12, 2008 at 14:21 UTC
    ...not escaped by default...

    ...not on the list...

    Hmmm - so why is it that when I do:

    foreach $val (keys %char2entity) {
        print "<br>$val => $char2entity{$val}\n";
    }
    This outputs:

    ...
    <br>@ => &#64;
    ...
    <br>. => &#46;
    Is %char2entity not the list of characters that are encoded by default?


    Update:

    Answer to my own question - no, I don't think %char2entity is the list of characters to encode by default.

    If I take Anonymous Monk's suggestion:

    encode_entities($string, "\000-\377")
    then every single character in my string gets encoded - without supplying any further information about what the codes for these characters are. In other words, it looks like %char2entity is just the list of all char-to-entity relations for reference, NOT the list of chars that should be encoded by default.
      Where do you get your information from?
        <humour>A guy in the back alley. I give him the password "monk" and $20 and he spills the beans....</humour>

        No - um, nowhere except in the documentation and the trials described above. The documentation says

        The module can also export the %char2entity and the %entity2char hashes, which contain the mapping from all characters to the corresponding entities (and vice versa, respectively).

        Which I took to mean "all characters that will be encoded by default". Then observed that encode_entities('@') does not encode @. So I wondered if that was because @ was not in the %char2entity hash, working on the assumption that %char2entity is the list of chars to encode by default. Using help from this board, I exported %char2entity and printed it out

        use HTML::Entities;
        use HTML::Entities qw( %char2entity %entity2char ); #thanks ikegami

        foreach $val (keys %char2entity) {
            print "<br>$val => $char2entity{$val}\n";
        }
        and found that @ IS in the %char2entity hash. Then trying your suggestion (assuming this is the same Anonymous Monk) of
        encode_entities($a, "\000-\377");
        found that simply telling the module which characters to encode results in them being encoded, even though that command does not supply any new information about code-character mapping. The module, therefore, must already have that information, and it occurs to me that maybe that's the reason not all chars are encoded by default even though %char2entity contains a full set of char-entity relations - because %char2entity is just a reference hash, NOT the list of chars to be encoded by default.
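
The conclusion above can be checked in a few lines. This is a small sketch, not taken from any post in the thread: it compares a lookup in %char2entity with a default call and with a call that names the characters explicitly (the '.@' set is borrowed from moritz's example). The variable names are illustrative.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities %char2entity);

# The lookup table knows how to write @ as an entity...
my $in_table   = exists $char2entity{'@'};

# ...but a default call does not treat @ as unsafe, so it passes through.
my $by_default = encode_entities('@');

# Naming the character in the second argument makes the module encode it,
# using the mapping it already has in %char2entity.
my $forced     = encode_entities('@', '.@');

printf "in table: %s, default: %s, forced: %s\n",
    ($in_table ? 'yes' : 'no'), $by_default, $forced;
```

So the hash answers "how would this character be written as an entity?", while the default set (or the second argument) answers "which characters get rewritten at all?".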
