
There are three approaches the authors could have taken.

  • The module could accept strings of bytes. If the string has three characters that are all bytes, it will be stored over three bytes. If the string has non-byte characters, garbage will be produced. ("Garbage" happens to be something similar to the UTF-8 encoding of the string due to internal Perl details that have nothing to do with GDBM.) Any GDBM file can be read.

  • The module could accept strings of text and store them as UTF-8. If the string has three characters that are all bytes, it will be stored over three to six bytes. If the string has non-byte characters, they will be stored and extracted properly. Only GDBM files whose text fields contain UTF-8 can be read.

  • The module could accept strings of text, but use one of two storage formats depending on the contents of the string. Strings would be stored as efficiently as in the first option when possible (except for one extra byte per string), and arbitrary text could be stored. Only GDBM files whose text fields contain strings in this format can be read.
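To make the third option concrete, here is a minimal sketch of such a two-format scheme. The one-byte flag values and the names pack_text and unpack_text are invented for illustration; nothing like this exists in GDBM_File.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical format: one flag byte, then the payload.
#   "\x00" = every character fits in a byte, payload stored as-is
#   "\x01" = payload is the UTF-8 encoding of the text
sub pack_text {
    my ($text) = @_;
    return "\x00" . $text if $text !~ /[^\x00-\xFF]/;
    return "\x01" . encode('UTF-8', $text);
}

sub unpack_text {
    my ($bytes) = @_;
    my $flag = substr($bytes, 0, 1);
    my $body = substr($bytes, 1);
    return $flag eq "\x01" ? decode('UTF-8', $body) : $body;
}
```

A string such as "\xC9ric" costs five bytes here instead of four, which is the "one extra byte per string" mentioned above.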

The implementers went with the first. It's the only one that allows the module to read any GDBM file, and that's extremely important. That leaves it up to the user to serialise strings of text into strings of bytes by encoding them.
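In practice, that means encoding on the way in and decoding on the way out. A minimal sketch, assuming UTF-8 is the encoding you want; store_text and fetch_text are illustrative helper names, and a plain hash stands in for the tied one so the snippet runs without a GDBM file:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Turn text into bytes before it reaches the byte-oriented hash.
sub store_text {
    my ($db, $key, $text) = @_;
    $db->{ encode('UTF-8', $key) } = encode('UTF-8', $text);
    return;
}

# Turn the stored bytes back into text after reading them.
sub fetch_text {
    my ($db, $key) = @_;
    my $bytes = $db->{ encode('UTF-8', $key) };
    return defined $bytes ? decode('UTF-8', $bytes) : undef;
}

# A reference to a hash tied to GDBM_File would be passed the same way.
my %db;
store_text(\%db, 'greeting', "today\x{2019}s");
```

The hash itself only ever sees bytes, so the "Wide character" warning should not appear.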

I suppose its constructor could accept an argument specifying an encoding, allowing the user to choose whether he wants the first or second behaviour. I guess the authors didn't consider that, but that's excusable because the module predates Perl's support for strings of non-ASCII text.
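Something along these lines is what I have in mind. This is only a sketch of a wrapper, not anything GDBM_File provides; the TextStore class, its encoding option, and its store/fetch methods are all hypothetical. Without an encoding it passes bytes through untouched (first behaviour), and with one it encodes and decodes (second behaviour).

```perl
package TextStore;    # hypothetical wrapper, not part of GDBM_File
use strict;
use warnings;
use Encode qw(encode decode find_encoding);

sub new {
    my ($class, $db, %opt) = @_;
    my $encoding = $opt{encoding};
    die "Unknown encoding: $encoding\n"
        if defined $encoding && !find_encoding($encoding);
    return bless { db => $db, encoding => $encoding }, $class;
}

sub store {
    my ($self, $key, $value) = @_;
    if (defined(my $e = $self->{encoding})) {
        $key   = encode($e, $key);
        $value = encode($e, $value);
    }
    $self->{db}{$key} = $value;
    return;
}

sub fetch {
    my ($self, $key) = @_;
    my $e = $self->{encoding};
    $key = encode($e, $key) if defined $e;
    my $value = $self->{db}{$key};
    return $value unless defined $e && defined $value;
    return decode($e, $value);
}

package main;

my %db;    # stands in for a hash tied to GDBM_File
my $text = TextStore->new(\%db, encoding => 'UTF-8');
$text->store(title => "\x{C9}ric\x{2019}s");
```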


Your error is that your string contains the characters

74 6F 64 61 79 2019 73

so one of the characters isn't a byte, yet the module expects bytes.


Note that UTF-8 isn't always produced, only when garbage (something that isn't a string of bytes) is given.

Literal: "\xC9\x72\x69\x63"
String:  C9 72 69 63
Stored:  C9 72 69 63

Literal: "\N{LATIN CAPITAL LETTER E WITH ACUTE}ric"
String:  C9 72 69 63
Stored:  C9 72 69 63

Literal: "\N{LATIN CAPITAL LETTER E WITH ACUTE}ric\N{RIGHT SINGLE QUOTATION MARK}s"
String:  C9 72 69 63 2019 73
Stored:  C3 89 72 69 63 E2 80 99 73 (with warning)
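The three cases above can be reproduced without GDBM at all. The snippet below is a model of the behaviour described in this post, not GDBM_File's actual code: stored_bytes assumes that a byte-oriented interface gets the downgraded string when every character fits in a byte, and the UTF-8 encoding otherwise. hex_chars and stored_bytes are names made up for the demonstration.

```perl
use strict;
use warnings;
use charnames ':full';

# Hex dump of a string, one element per character.
sub hex_chars {
    my ($s) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $s;
}

# Assumed model: bytes if the string can be downgraded, UTF-8 otherwise.
sub stored_bytes {
    my ($s) = @_;
    my $copy = $s;
    utf8::encode($copy) unless utf8::downgrade($copy, 1);
    return $copy;
}

for my $s (
    "\xC9\x72\x69\x63",
    "\N{LATIN CAPITAL LETTER E WITH ACUTE}ric",
    "\N{LATIN CAPITAL LETTER E WITH ACUTE}ric\N{RIGHT SINGLE QUOTATION MARK}s",
) {
    printf "String: %s\nStored: %s\n\n",
        hex_chars($s), hex_chars(stored_bytes($s));
}
```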

In reply to Re: hash tied to GDBM_FILE causes Wide character in null operation by ikegami
in thread hash tied to GDBM_FILE causes Wide character in null operation by Anonymous Monk
