
I have some DBI/DBD::ODBC code which pulls Unicode data from a table, but length() seems to return the wrong result when called on the bound scalar on the second and subsequent rows.

    use strict;
    use DBI;
    use Devel::Peek;
    use bytes;
    no bytes;

    my ($txt_de, $txt_ru);
    {
        use utf8;
        $txt_de = 'Käse';
        $txt_ru = 'Москва';
    }

    binmode STDOUT, ':utf8';

    my @dsn = qw/DBI:ODBC:xxx xx xx/;
    my %opt = (PrintError => 0, RaiseError => 1, AutoCommit => 1, ChopBlanks => 1);
    my $dbh = DBI->connect( @dsn, \%opt );
    $dbh->{LongReadLen} = 4000;
    $dbh->{LongTruncOk} = 1;

    my $sth_ins = $dbh->prepare(
        'INSERT INTO T2 (a, u, x) VALUES (?, ?, CAST( ? AS XML) )'
    );

    foreach my $row ([$txt_de, $txt_de, "<d>$txt_de</d>"],
                     [$txt_ru, $txt_ru, "<r>$txt_ru</r>"]) {
        $sth_ins->bind_param(1, $row->[0]);
        $sth_ins->bind_param(2, $row->[1]);
        $sth_ins->bind_param(3, $row->[2], {TYPE => -8});
        $sth_ins->execute;
    }
    #$sth_ins->execute( $txt_de, $txt_de, "<d>$txt_de</d>" );
    #$sth_ins->execute( $txt_ru, $txt_ru, "<r>$txt_ru</r>" );

    my $sth_sel = $dbh->prepare( 'SELECT u, x FROM T2' );
    $sth_sel->execute;
    $sth_sel->bind_col(1, \my $txt, {TYPE => -8});
    $sth_sel->bind_col(2, \my $xml, {TYPE => -8});
    #$sth_sel->bind_columns( \my( $txt, $xml ) );

    my $i = 0;
    while ( $sth_sel->fetch ) {
        printf "%3u %3u %3u %s [%s] [%s]\n",
            ++$i,
            length($txt),
            bytes::length($txt),
            (utf8::is_utf8($txt) ? ' utf8' : '!utf8'),
            $txt,
            $xml;
        # NOTE, if I don't reset $txt each iteration the length() call returns
        # the wrong answer.
        #$txt = ''; this line fixes it
        # http://code.activestate.com/lists/perl5-porters/153703/
        Dump($txt);
    }

    $dbh->disconnect;

which outputs

    perl -Iblib/lib -Iblib/arch examples/xml2.pl
      1   4   5  utf8 [Käse] [<d>Käse</d>]
    #     ^ this 4 and 5 are correct
    SV = PVMG(0x853b6fc) at 0x856d570
      REFCNT = 2
      FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x8674698 "K\303\244se"\0 [UTF8 "K\x{e4}se"]
      CUR = 5
      LEN = 8
      MAGIC = 0x86746c0
        MG_VIRTUAL = &PL_vtbl_utf8
        MG_TYPE = PERL_MAGIC_utf8(w)
        MG_LEN = 4
      2   4  12  utf8 [Москва] [<r>Москва</r>]
    #     ^ this 4 is wrong but the 12 is probably right
    SV = PVMG(0x853b6fc) at 0x856d570
      REFCNT = 2
      FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
      IV = 0
      NV = 0
      PV = 0x8671098 "\320\234\320\276\321\201\320\272\320\262\320\260"\0 [UTF8 "\x{41c}\x{43e}\x{441}\x{43a}\x{432}\x{430}"]
      CUR = 12
      LEN = 16
      MAGIC = 0x86746c0
        MG_VIRTUAL = &PL_vtbl_utf8
        MG_TYPE = PERL_MAGIC_utf8(w)
        MG_LEN = 4

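I don't know much about all that magic, but I'm suspicious of the MG_LEN = 4 in the second dump: it is the same value as for the first row, even though the second string is 6 characters long.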
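For reference, the values I would expect can be checked without any database involved. This is only a minimal sketch: char_and_byte_len is a hypothetical helper written for this illustration (it is not part of DBI or DBD::ODBC), and it says nothing about where the stale value comes from. It just shows that length() on the second string should be 6, with 12 UTF-8 bytes.

```perl
use strict;
use warnings;
use utf8;    # the string literals below are UTF-8 source text

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper (not from the script above): returns the number of
# characters and the number of UTF-8 encoded bytes in a string.
sub char_and_byte_len {
    my ($s) = @_;              # $s is a copy, so the caller's string is untouched
    my $chars = length $s;     # length in characters (code points)
    utf8::encode($s);          # convert the copy in place to UTF-8 octets
    return ($chars, length $s);
}

for my $word ('Käse', 'Москва') {
    my ($chars, $bytes) = char_and_byte_len($word);
    printf "%s chars=%d bytes=%d\n", $word, $chars, $bytes;
}
```

So the byte counts in my output (5 and 12) match, and only the character count on the second row is off.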

Any ideas?

UPDATE: perl 5.10.1 and 5.12.2 on Linux/Ubuntu

In reply to length() returns wrong result - suspicious magic by mje
