Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask

Unpack mysql binary column data

by prashanthch (Novice)
on Apr 22, 2013 at 22:46 UTC ( #1029992=perlquestion: print w/replies, xml ) Need Help??
prashanthch has asked for the wisdom of the Perl Monks concerning the following question:


I am trying to read data stored in a MySQL table, the column is defined as binary (8). Data can be double or uint64. To convert this data to double, I am using unpack ("d", $data) and for uint64, I am using unpack ("Q", $data).

With uint64, I appear to be getting right results for smaller numbers (3, 40, etc) but not for larger numbers. The other problem is that with double, the results do not appear to match.

Am I doing this right or should I be looking at some other way to convert other than unpack?

Thanks for your help.

Replies are listed 'Best First'.
Re: Unpack mysql binary column data
by graff (Chancellor) on Apr 23, 2013 at 01:41 UTC
    How much do you know about (and do you have access to) the process that puts these kinds of values into the database?

    Also, I'm curious: are you saying that there is a single "binary(8)" column that is being used to store doubles in some rows and uint64 in other rows? If so, is there some other column in each row that tells you which data type is being stored there?

    Since you seem to have some other way of knowing what "unpacked" value you should get for a given row, maybe it would help to look at the "raw" bits for that known field value (e.g. as a 16-digit hex numeric string). For uint64 values, that should clear up any problem involving endianness. (Working out the bit fields for doubles will be a little trickier.)

      Thanks for your reply

      Unfortunately, I do not have access to the underlying routines that insert data into the database, I tried to get access but was denied, I am still trying

      yes, there is a single binary(8) column that is used to store doubles for some rows and uint64 for others, another table in the database lists the type of data, I look up the type and run unpack appropriately

Re: Unpack mysql binary column data
by igelkott (Priest) on Apr 22, 2013 at 23:18 UTC
    right results for smaller numbers

    Getting right results till about 255? If so, check the discussion on "endian" in pack. Briefly, look at the < or > modifiers. Integers should be easy to deal with but floats can be problematic.

Re: Unpack mysql binary column data
by choroba (Bishop) on Apr 22, 2013 at 22:51 UTC
    Can you give some examples of the data, ideally both the expected and wrong values?
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks for responding

      Expected value for example is 369790944 but unpack returns 46223868

      Expected value for double 518819.748032 but unpack returns 64852.468503656

      I compared endianness between the system that is storing the values and the system where I read the data using this perl -V:byteorder and it matches

      if it helps, the definition of the column in MySQL is binary(8)

        Expected value for example is 369790944 but unpack returns 46223868

        Taking a different look at those values might be instructive, given that they should have some sort of binary relation when treated as uint64 values. Here's a one-liner that will read the integer values typed (actually, pasted) into STDIN, and spit out the corresponding bit sequence for the given value:

        perl -lpe '$b=sprintf("%b",$_); $n=length($b); $_="$b ($n bits)"' 46223868 10110000010101000111111100 (26 bits) 369790944 10110000010101000111111100000 (29 bits) ^C
        Who could have guessed that the value actually returned by your unpack usage was identical to the value expected, except that it's missing the 3 lowest bits! What could be going on in your script (or in its interaction with the database) that might be causing this?

        The double values involve some extra work (and I'm not sure I'm doing this the "right" way to suit your purposes...)

        perl -lne '$d=pack("d",$_);$b=unpack("b*",$d);printf("%s (%d bits)\n", +$b,length($b))' 518819.748032 1000001110011000001111111011111101110001010101011111100010000010 (64 b +its) 64852.468503656 0100100010000110110111111011111101110001010101011111011100000010 (64 b +its) ^C
        Hmm... let's try that another way...
        perl -e '@ins=(518819.748032,64852.468503656); for (0,1) { push @outs,unpack("b*",pack("d",$ins[$_])) } $_ = ($outs[0] ^ $outs[1])|"0"x64; print join("\n","",@outs,$_,"")' 1000001110011000001111111011111101110001010101011111100010000010 0100100010000110110111111011111101110001010101011111011100000010 1100101100011110111000000000000000000000000000000000111110000000
        The first line of 64 bits is the "expected" value, the second is the value returned by your unpack, and the third has a "1" wherever the previous two don't match. I'm not sure how informative that is, actually, but it's curious to see the amount (and position) of agreement between the expected and returned values.

        Still, as I said, there's a serious chance that this particular pursuit of the bit patterns in doubles is misguided - e.g. I may be using a different notion of "double" from the one being used to populate those binary fields.

        UPDATE: Regarding the doubles, please note that adding an extra decimal place of precision to either (or both) the expected and returned values that you quoted will affect the position and amount of (dis)agreement between the two corresponding bit strings. You should look at the actual bit string as stored in the database, and see how many significant digits it takes to render it accurately as a decimal number (assuming you know how to interpret the binary(8) string correctly, of course).

        ANOTHER UPDATE: For that matter, removing significant digits has an effect as well. Just for grins, I rounded off the "returned" value to 64852.468504 (removing the 3 least significant digits), and with that, the three binary strings (expected, returned, diff) came out like this:

        1000001110011000001111111011111101110001010101011111100010000010 1000001110011000001111111011111101110001010101011111011100000010 0000000000000000000000000000000000000000000000000000111110000000
        Now, you want to see something really curious? Recall that your two uint64 values would have matched perfectly if the "returned" value had simply been shifted up 3 bits -- i.e. multiply by 8. Well:
        perl -e '@ins=( 518819.748032, 64852.468504*8 ); # less "precise" ret +urn value * 8! for(0,1){ push @outs,unpack("b*",pack("d",$ins[$_])) } $_ = ($outs[0] ^ $outs[1])|"0"x64; print join("\n","",@outs,$_,"")' 1000001110011000001111111011111101110001010101011111100010000010 1000001110011000001111111011111101110001010101011111100010000010 0000000000000000000000000000000000000000000000000000000000000000
        If you ever succeed in coming up with an explanation for this, you will have surpassed me. Have fun with that.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1029992]
Approved by igelkott
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (4)
As of 2018-06-19 05:54 GMT
Find Nodes?
    Voting Booth?
    Should cpanminus be part of the standard Perl release?

    Results (111 votes). Check out past polls.