Realistically, how could that be “a bug report”? Think about it: every multi-byte character encoding scheme that has ever been invented (or that could be) relies on significant lead bytes that precede the data they modify. If you are reading the file from stern to stem, well, “either you read them or you didn’t.”
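To make this concrete, here is a small Python illustration using Shift_JIS. I chose that encoding for the example; the question does not name one. The character 表 is stored as a lead byte 0x95 followed by a trail byte 0x5C, and 0x5C on its own is an ASCII backslash. Start reading one byte late, as you easily might when seeking in from the tail, and the decoder returns the wrong character without raising any error.

```python
# Shift_JIS is a classic MBCS: a lead byte tells the decoder how to
# interpret the byte that follows it.
text = "表"                      # one double-byte character in Shift_JIS
data = text.encode("shift_jis")  # lead byte 0x95, trail byte 0x5C
print(data)                      # b'\x95\\'

# Start reading one byte "late", skipping the lead byte.
tail = data[1:]

# Even strict decoding succeeds here: the orphaned trail byte is a
# perfectly valid ASCII backslash, so the result is silently wrong.
print(tail.decode("shift_jis", errors="strict"))  # prints a lone backslash
```

Nothing in the orphaned byte itself tells you it was part of something bigger. Only the lead byte you skipped carried that information.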
It stands to reason, therefore, that you must be the one to read “a few more bytes than you need” and, having read those bytes, figure out whether (unlucky you ...) you started reading smack-dab in the middle of a multi-byte (MBCS) sequence. There is no bright-line rule for this. The only reliable strategy I can think of is to rely on some contextual knowledge about the data stream itself. Find some (non-MBCS) byte sequence that you know will occur somewhere within the last n characters of the data. Then read n + x bytes (for some safety margin x) from the tail of the file, use a regex to search that data for the known sequence, and advance suspiciously forward from there.
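Here is a minimal Python sketch of that strategy. The function name `read_tail_from_anchor` and its parameters are hypothetical, and it treats n as a byte count. It assumes the anchor is known to appear near the end of the file and that it begins with a byte that can never be a trail byte in the chosen encoding (a newline works for Shift_JIS, whose trail bytes are all 0x40 or higher). It reads n + x bytes from the tail, finds the anchor with a regex, and decodes strictly from there, raising an error instead of guessing when the anchor is missing.

```python
import os
import re


def read_tail_from_anchor(path, n, x, anchor, encoding):
    """Hypothetical helper: decode the tail of `path`, starting at the first
    occurrence of `anchor` within the last n + x bytes.

    Assumption (not from the original answer): `anchor` is a bytes literal
    whose first byte can never be a trail byte in `encoding`, so a match
    always lands on a character boundary.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)            # jump to the end of the file
        size = f.tell()
        start = max(0, size - (n + x))    # read a few more bytes than needed
        f.seek(start)
        chunk = f.read()                  # may begin mid-character

    match = re.search(re.escape(anchor), chunk)
    if match is None:
        # Fail loudly rather than guess where a character boundary is.
        raise ValueError("anchor not found in tail; refusing to guess")

    # Decode strictly from the anchor onward, so a bad byte raises
    # UnicodeDecodeError instead of quietly producing garbage.
    return chunk[match.start():].decode(encoding, errors="strict")
```

Starting from the first match rather than the last keeps as much of the tail as possible. Because decoding is strict, a wrong assumption about the anchor tends to surface as an exception rather than as plausible-looking text, although (as the Shift_JIS backslash case shows) strictness alone cannot catch every misalignment.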
Bear in mind that the onus is on your application not merely to come up with the right answers when it can, but to reliably fail when it cannot. Your application is the only player capable of doing this. The fact that the algorithm produces an answer at all must, in itself, be a positive indication that the answer is worthy of trust.