There are three approaches the authors could have taken.
-
The module could accept strings of bytes. If the string has three characters and each is a byte, it will be stored over three bytes. If the string has non-byte characters, garbage will be produced. ("Garbage" happens to be something similar to the UTF-8 encoding of the string due to internal Perl details that have nothing to do with GDBM.) Any GDBM file can be read.
-
The module could accept strings of text, and store them as UTF-8. If the string has three characters, it will be stored over three to six bytes. If the string has non-byte characters, they will be stored and extracted properly. Only GDBM files whose text fields contain UTF-8 can be read.
-
The module could accept strings of text, but use one of two storage formats depending on the contents of the string. Strings would be stored as efficiently as in the first option when possible (except for one extra byte per string), and arbitrary text could be stored. Only GDBM files whose text fields contain strings in this format can be read.
The implementers went with the first. It's the only one that allows the module to read any GDBM file, and that's extremely important. That leaves it up to the user to serialise strings of text into strings of bytes by encoding them.
I suppose the module's constructor could accept an argument specifying an encoding, allowing the user to choose between the first and second behaviour. I guess the authors didn't consider that, but that's excusable because the module predates Perl's support for strings of non-ASCII text.
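To illustrate what "serialise strings of text into strings of bytes" means in practice, here is a minimal sketch using the core Encode module. The helpers `store_text` and `fetch_text` are hypothetical names of mine, not part of GDBM_File, and the plain hash `%db` is only a stand-in for a hash tied to a GDBM file; I'm assuming you want UTF-8 as the encoding.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: encode key and value to UTF-8 bytes before storing.
# $db is any hash reference, e.g. a reference to a hash tied to GDBM_File.
sub store_text {
    my ($db, $key, $value) = @_;
    $db->{ encode('UTF-8', $key) } = encode('UTF-8', $value);
    return;
}

# Hypothetical helper: fetch the bytes and decode them back into text.
sub fetch_text {
    my ($db, $key) = @_;
    my $bytes = $db->{ encode('UTF-8', $key) };
    return defined $bytes ? decode('UTF-8', $bytes) : undef;
}

my %db;    # stand-in for a tied GDBM hash (assumption for this sketch)
store_text(\%db, 'greeting', "today\x{2019}s");
my $text = fetch_text(\%db, 'greeting');
printf "%d characters read back\n", length $text;
```

With this approach the module only ever sees bytes, so no warning is issued and what comes back is the original text.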
Your error is that your string contains the characters
74 6F 64 61 79 2019 73
so one of the characters isn't a byte, yet the module expects bytes.
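If you want to check this yourself, something like the following should do. It is only a sketch: `char_dump` and `is_byte_string` are names I made up for illustration, and the input string is my reconstruction of yours from the character list above.

```perl
use strict;
use warnings;

# Hypothetical helper: list each character of a string as a hex number.
sub char_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

# Hypothetical helper: true if every character fits in a byte (0x00..0xFF).
sub is_byte_string {
    my ($s) = @_;
    return $s !~ /[^\x00-\xFF]/ ? 1 : 0;
}

my $input = "today\x{2019}s";
print char_dump($input), "\n";    # 74 6F 64 61 79 2019 73
print is_byte_string($input) ? "bytes\n" : "not bytes\n";
```

Any character above FF in the dump means the string is text that still needs encoding before it goes to the module.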
Note that UTF-8 isn't always produced, only when garbage (something that isn't a string of bytes) is given.
Literal: "\xC9\x72\x69\x63"
String: C9 72 69 63
Stored: C9 72 69 63
Literal: "\N{LATIN CAPITAL LETTER E WITH ACUTE}ric"
String: C9 72 69 63
Stored: C9 72 69 63
Literal: "\N{LATIN CAPITAL LETTER E WITH ACUTE}ric\N{RIGHT SINGLE QUOTATION MARK}s"
String: C9 72 69 63 2019 73
Stored: C3 89 72 69 63 E2 80 99 73 (with warning)
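The three cases above can be modelled with a few lines of Perl. To be clear, this is a sketch of the behaviour shown in the table, not the module's actual code: `stored_bytes_model` and `hex_dump` are hypothetical names, and the assumption is that a string is stored as-is when it can be downgraded to bytes and as its UTF-8 encoding otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: hex dump of the characters (here: bytes) of a string.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $s;
}

# Model (an assumption based on the table above, not GDBM_File's code):
# byte strings pass through unchanged; anything else becomes UTF-8.
sub stored_bytes_model {
    my ($s) = @_;
    my $copy = $s;
    # utf8::downgrade with a true second argument returns false instead of
    # dying when the string contains characters above 0xFF.
    utf8::encode($copy) unless utf8::downgrade($copy, 1);
    return $copy;
}

for my $literal ("\xC9ric", "\N{U+C9}ric", "\N{U+C9}ric\N{U+2019}s") {
    print hex_dump(stored_bytes_model($literal)), "\n";
}
```

The first two strings come out as C9 72 69 63, and only the third, which contains a character that isn't a byte, comes out as UTF-8.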