
Re^2: How to call Encode::decode from Perl XS

by mje (Curate)
on May 09, 2011 at 09:12 UTC ( #903737=note )

in reply to Re: How to call Encode::decode from Perl XS
in thread How to call Encode::decode from Perl XS

The rt reporter was using MS SQL Server. See automatic character encoding handling in Perl DBI/DBD::ODBC. However, he is also using DBD::ODBC. I've maintained DBD::ODBC for many years and no one has ever reported this to me, but I already do some UTF-8 decoding and thought it would be easy to add. There are other issues in moving to DBD::Sybase, but I don't want to start an argument here.
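
For context on the thread title, here is a minimal sketch of how XS code such as DBD::ODBC's might hand the driver's octets to Encode::decode via the perlcall API. It is a sketch only, not DBD::ODBC's actual code: it assumes the Perl core headers and an interpreter context, it assumes Encode has already been loaded, and the helper name and the hard-coded "UTF-8" are illustrative.

```c
/* Sketch only: needs the Perl core headers and a live interpreter, so it
 * will not compile standalone. Assumes Encode is already loaded (e.g. via
 * load_module or a "use Encode" in the calling script). The helper name
 * is made up for illustration. */
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"

static SV *decode_utf8_sv(pTHX_ SV *bytes)
{
    dSP;
    SV *decoded;
    int count;

    ENTER;
    SAVETMPS;
    PUSHMARK(SP);
    XPUSHs(sv_2mortal(newSVpvs("UTF-8")));  /* encoding name          */
    XPUSHs(bytes);                          /* octets from the driver */
    PUTBACK;

    count = call_pv("Encode::decode", G_SCALAR);

    SPAGAIN;
    if (count != 1)
        croak("Encode::decode returned %d values", count);
    decoded = newSVsv(POPs);                /* copy before FREETMPS   */
    PUTBACK;
    FREETMPS;
    LEAVE;
    return sv_2mortal(decoded);
}
```

The newSVsv copy before FREETMPS is the standard perlcall idiom for keeping a returned mortal value alive past the temporaries cleanup.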


Replies are listed 'Best First'.
Re^3: How to call Encode::decode from Perl XS
by SimonClinch (Deacon) on May 09, 2011 at 13:52 UTC
    Well, I am not intending any module bias here. What occurred to me, in fact, was that Sybase Open Client reads locales.dat and uses it to pass the client character set to SQL Server at login time. I was wondering whether ODBC, being a database-independent architecture with a standardised driver interface, was bypassing that Sybase-specific step.

    One world, one people

      It is more difficult to explain this in ODBC. The rt poster was running on Windows (I mention that because things differ slightly on UNIX, e.g. locales). ODBC has an ANSI API and a Wide API - the latter supports UCS-2 - and these affect calls like SQLPrepare, where you can pass SQL encoded in UCS-2. Then, when binding a column to fetch data, you must name the column's type at bind time, and in his case, as the column was varchar, it was bound as a SQL_CHAR (one byte = one character). There is no way in ODBC to say "I'm binding this as SQL_CHAR, but can you return it to me in some particular encoding or character set" (unless perhaps you change your SQL, and that is DB-dependent). However, had his column been nvarchar, DBD::ODBC would have bound it as SQL_WCHAR, which is a wide UCS-2-encoded character, and all would be well.
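
The varchar/nvarchar binding distinction above can be sketched in ODBC C terms. This fragment is illustrative only: it assumes an executed statement handle with a single character column, the function name is made up, and it will not run without an ODBC driver manager.

```c
/* Sketch only: needs an ODBC driver manager (sql.h/sqlext.h) and a live
 * statement handle, so it is not runnable standalone. */
#include <sql.h>
#include <sqlext.h>

static void bind_first_column(SQLHSTMT hstmt, int column_is_nvarchar)
{
    static SQLCHAR  narrow[256]; /* varchar  -> SQL_C_CHAR: raw bytes in an
                                    unknown client character set            */
    static SQLWCHAR wide[128];   /* nvarchar -> SQL_C_WCHAR: UCS-2 code
                                    units, which DBD::ODBC can decode       */
    static SQLLEN   ind;

    if (column_is_nvarchar)
        SQLBindCol(hstmt, 1, SQL_C_WCHAR, wide, sizeof(wide), &ind);
    else
        SQLBindCol(hstmt, 1, SQL_C_CHAR, narrow, sizeof(narrow), &ind);
}
```

The point is that the choice of target type is fixed at bind time; nothing in the SQL_C_CHAR path tells the driver what character set to deliver.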

      Each database client lib, ODBC driver etc. has its own way of defining the local character set, and there is nothing in ODBC to say what it is. In addition, the ODBC specification does not work properly with variable-length character encodings such as UTF-8: some of the APIs (e.g. SQLGetData) use the filling of the provided buffer to signal that more data remains, which can mean chopping a UTF-8 encoded character in half at the buffer boundary.

      Even if you could request the bound column data in a particular character set or encoding (and in some cases, as you say, you can), what is DBD::ODBC to do with it? It has no idea what that character set or encoding is, unless it performs DB-specific SQL on every query to discover the (possibly per-column) character set.

      If the bound data is not returned as UCS-2-encoded Unicode characters, then DBD::ODBC cannot guess anything and it is up to the Perl script. However, DBD::ODBC already has a flag saying the data returned from the db is UTF-8 encoded (added for some derivative of Postgres), and it decodes it - so I thought I could combine this with the rt and allow the script to specify an encoding.

      Add to that the fact that the Unicode additions to ODBC are not even in the spec as passed to X/Open - they are a Microsoft thing added afterwards.

        Seems to me that in that case, to use ODBC to access SQL Server, the client user has to set its locale (e.g. the Windows locale) to match that of the SQL Server.

        One world, one people
