Re: Perl5 Internal Representation of string variable
by repellent (Priest) on Oct 03, 2010 at 19:58 UTC
quotemeta has little to do with how Perl keeps variables internally. You need to separate those concerns to avoid confusion.
quotemeta is used to insert backslashes preceding non-word characters in a string for the purpose of avoiding regex meta-ness if that string were to be used in a regular expression.
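To make that concrete, here is a small sketch of what quotemeta does to a string and how that changes a match. The sample strings are made up for illustration.

```perl
use strict;
use warnings;

my $literal = 'a.b*c';              # contains the regex metacharacters . and *
my $quoted  = quotemeta($literal);  # backslashes inserted before . and *

print "$quoted\n";

# Unquoted, '.' matches any character and 'b*' matches a run of b's,
# so a quite different string still matches.
my $unquoted_hit = ('aXbbbc' =~ /^$literal$/) ? 1 : 0;

# Quoted, only the literal text itself matches.
my $quoted_hit     = ('aXbbbc' =~ /^$quoted$/) ? 1 : 0;
my $quoted_literal = ('a.b*c'  =~ /^$quoted$/) ? 1 : 0;

print "unquoted pattern matches 'aXbbbc': $unquoted_hit\n";
print "quoted pattern matches 'aXbbbc':   $quoted_hit\n";
print "quoted pattern matches 'a.b*c':    $quoted_literal\n";
```

Note that nothing here says anything about how $literal is stored; quotemeta only produces a new string with extra backslashes in it.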
In addition, you need to separate the concerns of how Perl keeps variables internally from how you perceive Perl strings.
Treat a Perl string as a string of characters (in the abstract sense, not in the char C sense). How each character is stored internally is a separate issue. How each character is represented (as bytes) when you print them out can be decided based on how they are encoded.
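A rough sketch of that last point, assuming the core Encode module: the same single character comes out as a different number of bytes depending on the encoding you choose at output time. The character picked here is just an example.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $char = "\x{E9}";    # one character: LATIN SMALL LETTER E WITH ACUTE

# The same character, represented as bytes under two encodings.
my $latin1 = encode('ISO-8859-1', $char);   # byte 0xE9
my $utf8   = encode('UTF-8',      $char);   # bytes 0xC3 0xA9

printf "characters:       %d\n", length $char;
printf "ISO-8859-1 bytes: %d\n", length $latin1;
printf "UTF-8 bytes:      %d\n", length $utf8;
```

The string itself is one character long throughout; only the encoded byte strings differ.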
This separation of concerns allows the wonderful use of Unicode. We can rest easy knowing that each character is not limited to 256 or 65536 (or whatever) different types. We treat characters as characters - today, Perl operations like regexp matching work on characters, so do length, substr, etc.
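As an illustration (the sample string is my own choice), length, substr and a regex all count in characters, even when every character is beyond the 256 range:

```perl
use strict;
use warnings;

# Three characters, all above U+00FF:
# WHITE SMILING FACE, GREEK SMALL LETTER ALPHA, GREEK SMALL LETTER BETA
my $str = "\x{263A}\x{3B1}\x{3B2}";

my $len    = length $str;          # counts characters, not bytes
my $second = substr($str, 1, 1);   # the second character (alpha)
my @chars  = $str =~ /(.)/g;       # '.' matches one character at a time

printf "length: %d\n", $len;
printf "second character: U+%04X\n", ord $second;
printf "regex saw %d characters\n", scalar @chars;
```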
Miss the old-think where strings were composed of just bytes? Then map the new-think of character strings back to where each character can have 256 different types (1 byte per character) and you'll have things back to the old way. Caveat: if you're taking this approach, you won't be able to represent any character above U+00FF, i.e. Unicode characters from Latin Extended-A onwards.
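One possible way to do that mapping is to encode to ISO-8859-1, which is exactly one byte per character. This is only a sketch: to_single_bytes is a hypothetical helper name, and using ISO-8859-1 is my assumption about what "the old way" means here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the one-byte-per-character form of $str,
# or undef if any character does not fit in ISO-8859-1 (code point > 0xFF).
sub to_single_bytes {
    my ($str) = @_;
    my $bytes = eval {
        # FB_CROAK dies on unmappable characters; LEAVE_SRC keeps $str intact.
        encode('ISO-8859-1', $str, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $bytes;
}

my $fits     = to_single_bytes("caf\x{E9}");   # every character is <= U+00FF
my $too_wide = to_single_bytes("\x{101}");     # U+0101 is in Latin Extended-A

printf "fits: %s\n",     defined $fits     ? length($fits) . " bytes" : "no";
printf "too wide: %s\n", defined $too_wide ? "encoded" : "not representable";
```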
Perl has no concept of NUL-terminated strings. When we store a NUL byte as a Perl string, the string is interpreted as having a single NUL character:
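A minimal sketch of this behaviour (not the code originally posted here; the variable names are illustrative):

```perl
use strict;
use warnings;

my $nul = "\0";                     # a string holding one NUL character

my $nul_len  = length $nul;         # the NUL counts as a character
my $nul_code = ord $nul;            # its code point is 0

# A NUL in the middle of a string does not end the string.
my $embedded     = "ab\0cd";
my $embedded_len = length $embedded;
my $nul_pos      = index($embedded, "\0");

printf "length of NUL string: %d\n", $nul_len;
printf "code point: %d\n", $nul_code;
printf "length with embedded NUL: %d\n", $embedded_len;
printf "NUL found at offset: %d\n", $nul_pos;
```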
This may be of interest: Why Not Translate Perl to C?