 
PerlMonks  

length() returns wrong result - suspicious magic

by mje (Curate)
on Sep 15, 2010 at 12:22 UTC (node #860211)
mje has asked for the wisdom of the Perl Monks concerning the following question:

I have some DBI/DBD::ODBC code which pulls Unicode data from a table, but length() seems to return the wrong result when called on the bound scalar for the second and subsequent rows.

```perl
use strict;
use DBI;
use Devel::Peek;
use bytes; no bytes;   # loads bytes.pm (for bytes::length) without leaving the pragma on

my ($txt_de, $txt_ru);
{
    use utf8;
    $txt_de = 'Käse';
    $txt_ru = 'Москва';
}
binmode STDOUT, ':utf8';

my @dsn = qw/DBI:ODBC:xxx xx xx/;
my %opt = (PrintError => 0, RaiseError => 1, AutoCommit => 1, ChopBlanks => 1);
my $dbh = DBI->connect( @dsn, \%opt );
$dbh->{LongReadLen} = 4000;
$dbh->{LongTruncOk} = 1;

my $sth_ins = $dbh->prepare(
    'INSERT INTO T2 (a, u, x) VALUES (?, ?, CAST( ? AS XML) )' );
foreach my $row ([$txt_de, $txt_de, "<d>$txt_de</d>"],
                 [$txt_ru, $txt_ru, "<r>$txt_ru</r>"]) {
    $sth_ins->bind_param(1, $row->[0]);
    $sth_ins->bind_param(2, $row->[1]);
    $sth_ins->bind_param(3, $row->[2], {TYPE => -8});
    $sth_ins->execute;
}
#$sth_ins->execute( $txt_de, $txt_de, "<d>$txt_de</d>" );
#$sth_ins->execute( $txt_ru, $txt_ru, "<r>$txt_ru</r>" );

my $sth_sel = $dbh->prepare( 'SELECT u, x FROM T2' );
$sth_sel->execute;
$sth_sel->bind_col(1, \my $txt, {TYPE => -8});
$sth_sel->bind_col(2, \my $xml, {TYPE => -8});
#$sth_sel->bind_columns( \my( $txt, $xml ) );

my $i = 0;
while ( $sth_sel->fetch ) {
    # row number, length in characters, length in bytes, UTF8 flag, both columns
    printf "%3u %3u %3u %s [%s] [%s]\n",
        ++$i, length($txt), bytes::length($txt),
        (utf8::is_utf8($txt) ? ' utf8' : '!utf8'),
        $txt, $xml;
    # NOTE, if I don't reset $txt each iteration the length() call returns
    # the wrong answer.
    #$txt = ''; this line fixes it
    # http://code.activestate.com/lists/perl5-porters/153703/
    Dump($txt);
}
$dbh->disconnect;
```

which outputs

```
perl -Iblib/lib -Iblib/arch examples/xml2.pl
  1   4   5  utf8 [Käse] [<d>Käse</d>]
# ^ this 4 and 5 are correct
SV = PVMG(0x853b6fc) at 0x856d570
  REFCNT = 2
  FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x8674698 "K\303\244se"\0 [UTF8 "K\x{e4}se"]
  CUR = 5
  LEN = 8
  MAGIC = 0x86746c0
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 4
  2   4  12  utf8 [Москва] [<r>Москва</r>]
# ^ this 4 is wrong but the 12 is probably right
SV = PVMG(0x853b6fc) at 0x856d570
  REFCNT = 2
  FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0x8671098 "\320\234\320\276\321\201\320\272\320\262\320\260"\0 [UTF8 "\x{41c}\x{43e}\x{441}\x{43a}\x{432}\x{430}"]
  CUR = 12
  LEN = 16
  MAGIC = 0x86746c0
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = 4
```

I don't know much about that magic, but I'm suspicious of the MG_LEN = 4.
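For comparison, the values the second row should print can be checked in plain Perl, with no database involved. This is only a sketch: it assumes the script is saved as UTF-8 (hence use utf8), loads bytes.pm with an empty import list so bytes::length() exists without switching the scope to byte semantics, and calls utf8::upgrade so each string is stored as UTF-8 internally, as the fetched values are (UTF8 flag in the Dump).

```perl
use strict;
use warnings;
use utf8;          # the literals below are written in UTF-8
use bytes ();      # load bytes.pm so bytes::length() exists, without enabling the pragma

binmode STDOUT, ':encoding(UTF-8)';

my %expected;
for my $word ('Käse', 'Москва') {
    my $s = $word;
    utf8::upgrade($s);    # force the internal UTF-8 representation, like the fetched column
    # length() counts characters; bytes::length() counts bytes in the internal buffer.
    $expected{$s} = [ length($s), bytes::length($s) ];
    printf "%-8s chars=%d bytes=%d\n", $s, @{ $expected{$s} };
}
```

It reports 4 characters / 5 bytes for Käse and 6 characters / 12 bytes for Москва, so the 12 in the second row is right and the 4 should have been 6.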

Any ideas?

UPDATE: perl 5.10.1 and 5.12.2 on Linux/Ubuntu
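The reset reported above as a fix ($txt = '' after the bound value has been used) can be placed in a loop that runs without an ODBC data source. This sketch assumes DBI is installed and uses DBD::Sponge, the in-memory driver bundled with DBI, as a stand-in for DBD::ODBC. Whether Sponge reproduces the stale length at all depends on how the installed DBI writes fetched values into bound scalars, so the sketch only shows where the reset goes, not the bug itself.

```perl
use strict;
use warnings;
use utf8;
use DBI;

# DBD::Sponge serves rows from an in-memory array; here it stands in for
# DBD::ODBC (an assumption made so the loop can run anywhere).
my $dbh = DBI->connect('dbi:Sponge:', '', '', { RaiseError => 1, PrintError => 0 });
my $sth = $dbh->prepare('SELECT u FROM T2', {
    NAME => ['u'],
    rows => [ ['Käse'], ['Москва'] ],
});
$sth->execute;
$sth->bind_col(1, \my $txt);

my @lengths;
while ( $sth->fetch ) {
    push @lengths, length $txt;   # use the bound value first...
    $txt = '';                    # ...then reset it, the workaround from the question
}
$dbh->disconnect;
print "@lengths\n";
```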

Re: length() returns wrong result - suspicious magic
by ikegami (Pope) on Sep 15, 2010 at 14:41 UTC
    Looks like there's a call to SvSETMAGIC missing in DBI or DBD::ODBC. Just to confirm, does the following cause length() to return the correct length?
    $txt=$txt;

    By the way, using the constant SQL_WCHAR would be clearer than the literal -8.
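DBI exports the ODBC type codes through its :sql_types tag, so the -8 in the question's bind_param/bind_col calls can be spelled SQL_WCHAR. A minimal sketch, assuming only that DBI is installed:

```perl
use strict;
use warnings;
# The :sql_types export tag provides the ODBC SQL type codes as constants.
use DBI qw(:sql_types);

printf "SQL_WCHAR = %d\n", SQL_WCHAR;

# In the question's code the literal -8 could then be written as, e.g.:
#   $sth_sel->bind_col(1, \my $txt, { TYPE => SQL_WCHAR });
#   $sth_ins->bind_param(3, $row->[2], { TYPE => SQL_WCHAR });
```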

      I thought DBI was missing a definition for SQL_WCHAR, but it turns out it is there. The code started out as something someone submitted to me while I was looking into another issue, so I've only modified parts of it.

      Setting $txt = $txt at the end of the loop makes no difference, but setting $txt = '' fixes it. If this confirms your suspicion, could you explain why DBD::ODBC (which wrote to the scalar) might need to call SvSETMAGIC?


        "why DBD::ODBC (which wrote to the scalar) might need to call SvSETMAGIC?"

        To assign a value to a scalar, there are a few steps to follow.

        • Make sure the scalar has the slot (IV, PV, etc) you need by upgrading the structure if necessary.
        • Place the value in the appropriate slot.
        • Set the flag indicating there's a usable value in the slot you populated.
        • Call SvSETMAGIC to let magic respond to the new value if appropriate (if SMG=1). This will end up calling STORE for tied variables, for example. In this case, I expect that it will clear the precomputed length of the string.
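The precomputed length is the PERL_MAGIC_utf8 entry shown in the question's Dump output (MG_LEN). The sketch below reads that field through the core B module: length() stores the character count there, and an ordinary Perl assignment, which runs set magic, invalidates it. The helper utf8_len_cache is made up for this illustration, and these internals are not a stable interface, so details may differ between perl versions.

```perl
use strict;
use warnings;
use utf8;      # the literals below are written in UTF-8
use B ();      # core module for inspecting perl's internal structures

# Hypothetical helper: return the character count cached in a scalar's
# PERL_MAGIC_utf8 ('w') magic, or undef if the scalar has no such magic.
sub utf8_len_cache {
    my ($ref) = @_;
    my $obj = B::svref_2object($ref);
    return undef unless $obj->can('MAGIC');   # not a PVMG, so no magic at all
    for my $mg ($obj->MAGIC) {
        return $mg->LENGTH if $mg->TYPE eq 'w';
    }
    return undef;
}

my $s = 'Москва';                        # 6 characters, 12 bytes of UTF-8
my $before = utf8_len_cache(\$s);        # nothing cached yet
my $len    = length $s;                  # counts the characters and caches 6
my $cached = utf8_len_cache(\$s);        # the same field Dump shows as MG_LEN

$s = 'Käse';                             # a Perl-level assignment runs set magic,
my $after_assign = utf8_len_cache(\$s);  # which marks the cache as unknown (-1)
my $len2 = length $s;                    # so the length is recomputed: 4

printf "before=%s len=%d cached=%s after_assign=%s len2=%d\n",
    map { defined $_ ? $_ : 'undef' } $before, $len, $cached, $after_assign, $len2;
```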

        To obtain a value from a scalar, the same is done in reverse.

        • Call SvGETMAGIC to populate the scalar with a value. (This will end up calling FETCH for tied variables, for example.)
        • Coerce the scalar into the requested type if necessary. (This may require upgrading the scalar.)
        • Return the value in the appropriate slot.

        Some macros and functions do more than one of these steps for you.
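Tied scalars make the two sequences visible from Perl code, since their set magic calls STORE and their get magic calls FETCH. A small sketch; the class name LoggingScalar is invented for the example:

```perl
use strict;
use warnings;

# A minimal tied scalar that records each magic callback Perl makes on it.
package LoggingScalar;
sub TIESCALAR { my ($class, $log) = @_; return bless { value => undef, log => $log }, $class }
sub FETCH     { my $self = shift; push @{ $self->{log} }, 'FETCH'; return $self->{value} }
sub STORE     { my ($self, $v) = @_; push @{ $self->{log} }, "STORE($v)"; $self->{value} = $v; return }

package main;

my @log;
tie my $x, 'LoggingScalar', \@log;

$x = 'abc';            # assignment: set magic -> STORE
my $len = length $x;   # read: get magic -> FETCH
print join(', ', @log), "\n";
```

Assigning should log STORE(abc), and reading the value for length() should then log FETCH.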

        "Setting $txt = $txt at the end of the loop makes no difference"

        No, at the start of the loop. After the fetch, but before you use it.

        There's no get magic on the scalar (GMG=0), so the fact that set magic wasn't called earlier won't matter. But when the value is assigned back to the scalar, the assignment will properly handle the set magic.

        If I'm right, I can provide a cheaper workaround than copying the string (which defeats the purpose of binding).

Node Type: perlquestion [id://860211]
Approved by moritz
Front-paged by MidLifeXis