> when I go to get the first character (...) I suddenly need to specify the encoding
UTF-8 is a multi-byte encoding. It means that some characters, Æ being one of them, are encoded by more than one byte (in this case, two bytes: 0xC3 0x86). If a string starts with such a character, but Perl doesn't know the encoding, it assumes Latin-1, which is a single byte encoding. First character then corresponds to the first byte only, which is 0xC3. It doesn't have any meaning in UTF-8, so it's transformed into �, the replacement character.