Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Re^7: Geo Package files

by soonix (Canon)
on Mar 05, 2022 at 18:07 UTC ( [id://11141858]=note: print w/replies, xml ) Need Help??


in reply to Re^6: Geo Package files
in thread Geo Package files

The WKB spec says it's an "8-byte IEEE double". I assumed that would imply "f" format specifier, but probably I was mistaken, and it should be "d", as this older post suggests. Data::IEEE754::Tools mentions "d>"…

The "Floating point numbers" section in perlpacktut says "There is no such thing as a network representation for reals", which might mean that the "little-endian"-flag bit could be ineffective for these doubles.

Replies are listed 'Best First'.
Re^8: Geo Package files
by Bod (Parson) on Mar 06, 2022 at 13:59 UTC
    I assumed that would imply "f" format specifier, but probably I was mistaken, and it should be "d"

    Unfortunately, it doesn't seem to be as simple as that. Either that or I am totally misunderstanding the results I am getting

    Using this code:

    use DBD::SQLite; use Data::Dumper; use strict; use warnings; my $dbh = DBI->connect("dbi:SQLite:uri=file:osopenusrn_202203.gpkg?mod +e=rwc"); my $tab = $dbh->prepare("SELECT * FROM openUSRN"); $tab->execute; my $data = $tab->fetchrow_hashref; my @test = unpack "(H2)2 CCVd6CVV", $data->{'geometry'}; foreach my $t(@test) { print "$t\n"; } print "\n-----\n\n"; printf ('%f ', $test[4]); printf ('%f', $test[6]);
    I get this output
    47 | - G 50 | - P 0 - version 5 - flags (Little Endian & envelope type 2) 27700 - SRS (OSGB 1936) 637590.385 | - min_x -| 642426.601 | - max_x | 309577.58 | - min_y |- envelope 310361.391000001 | - max_y | 0 | - min_z | 0 | - max_z -| 1 - StandardGeoPackageBinary encoding (Little En +dian) 1005 - StandardGeoPackageBinary type (MultiLineStr +ing Z) 34 - Number of LineStrings ----- 27700.000000 642426.601000

    The StandardGeoPackageBinary Spec (section 8.2.2) tell us that there are only three data types. A single byte plus the following:

    An Unsigned Integer is a 32-bit (4-byte) data type that encodes a nonnegative integer in the range (0, 4,294,967,295).
    A Double is a 64-bit (8-byte) double precision datatype that encodes a double precision number using the IEEE 754 double precision format.

    Applying a simple sanity check to the results I get, show that byte 'C' makes sense - e.g. 'GP' and the 'flags'. Also the unsigned int 'V' makes sense. 1005 is a perfectly sensible result for the StandardGeoPackageBinary type.

    The problem comes with the IEEE 754 double. Data::IEEE754 says that: If you can require Perl 5.10 or greater then this module is pointless. Just use the d> and f> pack formats instead!. I am using Perl 5.32.1 so 'd' should give me an IEEE 754 double. It pulls the right number of bytes off the binary (otherwise the following values would be gobbledegook) but it returns values that don't make sense.

    The OBGB SRS doesn't use latitude and longitude. Instead, it uses easting and northing (x, y) coordinates. They have to be integers in the range 0-670000 for x and 0-140000 for y.

    I'm beginning to think that I must be interpreting the data wrongly.
    Is there any other way that a Little Endian IEEE 754 double can be decoded?
    Is it somehow platform dependent? Although surely the purpose of IEEE 754 is to make data universal and remove the platform dependency (once the endiness is established).

      Can you clarify what looks wrong?

      It appears you have decoded the header information correctly, or at least it makes sense. What remains is to decode the component linestrings, of which there are 34 in this multipart geometry.

      A useful debugging process would be to cross-check using a GIS package like QGIS. This will allow you to select and plot each feature and see if your numbers are what are expected.

        This will allow you to select and plot each feature and see if your numbers are what are expected

        The problem was not with unpacking the data - it was with my understanding!!!

        I had erroneously thought that the envelope would be the bounds of the UK. Whereas, it is only the bounds for a single route! The decimal places in the coordinates still confuse me but rounding them gives sufficiently high accuracy.

        My test code looks like this:

        use DBD::SQLite; use Data::Dumper; use strict; use warnings; my $dbh = DBI->connect("dbi:SQLite:uri=file:osopenusrn_202203.gpkg?mod +e=rwc"); my $tab = $dbh->prepare("SELECT * FROM openUSRN LIMIT 1"); $tab->execute; my $data = $tab->fetchrow_hashref; my $geo = $data->{'geometry'}; my @test = unpack "(H2)2 CCVd6CVVCVV(d<)6CVV(d<)9", $data->{'geometry' +}; print "\n\nUSRN\t\t" . $data->{'usrn'} . "\n"; print "\nMagic bytes\t$test[0] $test[1]\n"; print "version\t\t$test[2]\n"; print "flags\t\t"; printf("%08b\n", $test[3]); print "SRS ID\t\t$test[4]\n\n"; print "minx\t\t$test[5]\n"; print "maxx\t\t$test[6]\n"; print "miny\t\t$test[7]\n"; print "maxy\t\t$test[8]\n"; print "minz\t\t$test[9]\n"; print "maxz\t\t$test[10]\n"; print "-----\n"; print "NDR\t\t$test[11]\n"; print "type\t\t$test[12]\n"; print "number\t\t$test[13]\n\n"; print "byteOrder\t$test[14]\n"; print "type\t\t$test[15]\n"; print "number\t\t$test[16]\n\n"; #print "byteOrder\t$test[17]\n"; #print "type\t\t$test[18]\n"; print "Point1 x\t$test[17]\n"; print "Point1 y\t$test[18]\n"; print "Point1 z\t$test[19]\n"; print "Point2 x\t$test[20]\n"; print "Point2 y\t$test[21]\n"; print "Point2 z\t$test[22]\n\n"; print "byteOrder\t$test[23]\n"; print "type\t\t$test[24]\n"; print "number\t\t$test[25]\n\n"; print "Point1 x\t$test[26]\n"; print "Point1 y\t$test[27]\n"; print "Point1 z\t$test[28]\n"; print "Point2 x\t$test[29]\n"; print "Point2 y\t$test[30]\n"; print "Point2 z\t$test[31]\n"; print "Point3 x\t$test[32]\n"; print "Point3 y\t$test[33]\n"; print "Point3 z\t$test[34]\n";
        This shows clearly what is happening and has allowed me to pick the data structure apart a bit at a time. The output I get is:
        USRN 4601764 Magic bytes 47 50 version 0 flags 00000101 SRS ID 27700 minx 637590.385 maxx 642426.601 miny 309577.58 maxy 310361.391000001 minz 0 maxz 0 ----- NDR 1 type 1005 number 34 byteOrder 1 type 1002 number 2 Point1 x 640921.438 Point1 y 310210.692 Point1 z 0 Point2 x 641634.2632 Point2 y 309910.6438 Point2 z 0 byteOrder 1 type 1002 number 3 Point1 x 637969.3747 Point1 y 309910.6438 Point1 z 0 Point2 x 638226.29 Point2 y 309978.609999999 Point2 z 0 Point3 x 638404.534 Point3 y 310054.294 Point3 z 0

        I've sanity checked the data against online tools. First I looked up the USRN of the first entry in the database. I then plotted the 5 points I got from the first 2 of the 34 LineStrings. All 5 points are on the route shown by the USRN so I take that a success :)

        Of course, I still need to generalise the reading of the different geometry data types but at least their structure makes some sense.

        The first step is to get the midpoint of each USRN. Just taking the centre of the envelope should provide this and won't require too much delving into the geometries - although that will have to follow in time.

        Lots of thanks for your excellent pointers.

        A useful debugging process would be to cross-check using a GIS package...

        Yes - thanks, that is probably the next step. I was planning to check the data against a USRN finder but your suggestion is perhaps better.

        I'm not sure this will help with the envelope so I might need to press on and decode some of the geometry data before I can property sanity check it.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11141858]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (3)
As of 2024-04-19 19:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found