Re: How do I know what encoding was used for form input?

Re^2: How do I know what encoding was used for form input? by jhourcle (Prior) on Aug 11, 2005 at 13:11 UTC
Although I'll second the suggestion of using the 'Accept-charset' header, I'm not so sure about user agents responding in the same encoding as the page. From RFC 2616 (HTTP/1.1): [excerpt not preserved]

I'm still not sure how to handle form data in the QUERY_STRING. From section 2.1 of RFC 2396 (URI Syntax): [excerpt not preserved]

(If anyone knows of a follow-up RFC, I'd love to know what the number is.)

And for the original poster: although Joel's article is a good start, it's intended as a quick overview. I'd also suggest you take a look at A tutorial on character code issues.
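To illustrate the QUERY_STRING problem, here is a small Python sketch (not from the original posts; the sample value `café` is arbitrary). Percent-encoding preserves octets but records nothing about which character encoding produced them, so the server is left to guess:

```python
from urllib.parse import quote, unquote_to_bytes

text = "café"

# The same string percent-encoded from two different character encodings.
# Nothing in the resulting query string says which encoding was used.
as_utf8 = quote(text, encoding="utf-8")      # 'caf%C3%A9'
as_latin1 = quote(text, encoding="latin-1")  # 'caf%E9'

# Percent-decoding only recovers raw octets; a charset still has to be chosen.
raw = unquote_to_bytes(as_utf8)              # b'caf\xc3\xa9'

print(raw.decode("utf-8"))    # café   (correct guess)
print(raw.decode("latin-1"))  # cafÃ©  (wrong guess)
```

Note that the wrong guess raises no error: Latin-1 maps every byte to some character, so misinterpreted input silently turns into mojibake.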
Re^3: How do I know what encoding was used for form input? by itub (Priest) on Aug 11, 2005 at 14:57 UTC
That's right: there's no real standard way of telling a client how the URI for a GET should be encoded (and even where there is one, for POST, it seems most clients don't comply). However, practical experience with mainstream browsers led to this conclusion (http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html):

"By now (2005) the robust way to deal with this issue is to send out forms pages encoded in utf-8, expecting the forms input to be submitted back using that encoding. This has been in practical use for a couple of years now (e.g. at Google) and can be expected to work with any current HTML4-compatible browser. However, there are other browsers still in use which don't fit this description, so it still seems relevant to look at the theory and compare it with observations."

I've used this approach for several websites, and it works with all the (reasonably recent) browsers I've tested.

"In theory, theory and practice are the same, but in practice, they never are."
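A minimal sketch of the receiving side of the UTF-8 approach, written in Python rather than Perl and assuming the form page was served as UTF-8 (the function name `decode_utf8_form` is just for illustration). Decoding strictly makes a non-conforming client visible instead of silently mangling its input:

```python
from urllib.parse import parse_qs


def decode_utf8_form(query_string):
    """Parse form data on the assumption that the page was served as UTF-8.

    Returns the parsed fields, or None if the octets are not valid UTF-8,
    i.e. the client evidently ignored the page encoding.
    """
    try:
        # errors="strict" turns invalid UTF-8 into an exception instead of
        # parse_qs's default behaviour of inserting U+FFFD replacement characters.
        return parse_qs(query_string, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return None


print(decode_utf8_form("name=caf%C3%A9"))  # {'name': ['café']}
print(decode_utf8_form("name=caf%E9"))     # None (a lone Latin-1 byte is not valid UTF-8)
```

This is a sanity check, not proof: some Latin-1 byte sequences happen to be valid UTF-8 (the two characters `Ã©` encoded in Latin-1 are exactly the UTF-8 bytes for `é`), so a submission that decodes cleanly can still be wrong.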
Re^4: How do I know what encoding was used for form input? by jhourcle (Prior) on Aug 11, 2005 at 16:01 UTC
Thanks for the reference. I know sgifford had given it as well, but he seemed to just be quoting it rather than mentioning the information it contained. I hadn't seen the 'buzzword' concept presented before, but it seems like a simple hack to validate what's being sent back to you.
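The 'buzzword' idea amounts to including a hidden form field whose non-ASCII value is known in advance; the octets that come back for that field indicate which encoding the browser used for the whole submission. A rough Python sketch under that reading; the field name `charset_probe`, its value, the candidate encodings, and the UTF-8 fallback are all hypothetical choices, not part of any standard:

```python
from urllib.parse import parse_qs, quote, unquote_to_bytes

# Hypothetical hidden field the form would carry, e.g.
#   <input type="hidden" name="charset_probe" value="é€">
# "é" separates UTF-8 from single-byte encodings; "€" separates
# windows-1252 (byte 0x80) from ISO-8859-15 (byte 0xA4).
PROBE_NAME = "charset_probe"
PROBE_VALUE = "é€"
CANDIDATES = ("utf-8", "windows-1252", "iso-8859-15")


def detect_form_encoding(query_string):
    """Return the candidate encoding whose bytes match the probe, or None."""
    for pair in query_string.split("&"):
        name, _, value = pair.partition("=")
        if name != PROBE_NAME:
            continue
        raw = unquote_to_bytes(value.replace("+", " "))
        for encoding in CANDIDATES:
            if raw == PROBE_VALUE.encode(encoding):
                return encoding
        return None  # probe present but matches no candidate
    return None  # no probe field submitted


def decode_form(query_string, default="utf-8"):
    """Decode all fields using the encoding revealed by the probe."""
    encoding = detect_form_encoding(query_string) or default  # hypothetical fallback
    return encoding, parse_qs(query_string, encoding=encoding)


# Simulate a browser that submitted the form in windows-1252.
submitted = PROBE_NAME + "=" + quote(PROBE_VALUE, encoding="windows-1252") + "&name=caf%E9"
print(decode_form(submitted))
# ('windows-1252', {'charset_probe': ['é€'], 'name': ['café']})
```

Any other reply for the probe, such as one from a browser that could not represent `€` at all, falls through to the default, so what to do in that case remains a policy decision.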


	PerlMonks