Making Geography Faster
When the “geography” type was added to PostGIS in version 1.5, I knew that one of the first things people would notice when they moved a process from geometry to geography was that geography was much slower.
The reason is two-fold. Firstly, geometry already has some fancy caching routines to allow caching of object indexes between spatial function calls. And secondly, the basic algorithms of computational geometry on the sphere are just much more expensive than those for the plane.
However, I knew that there was lots of room for improvement, by applying the same caching behaviour to geography that is used in geometry. And fortunately, a client has been using PostGIS geography enough to require the kind of speed-up that a caching system will provide (10x to 20x faster).
The caching system looks at a function call (for example ST_Intersects(A, B)) and determines if either of the arguments is a repeat from the previous call. If it is, it builds an index over the object’s edges, which allows subsequent calculations to go much faster. Then it stores the index, in case the next call also has a repeat argument.
For cartesian geometry, the index of edges is based on bounding boxes around the edges. For spherical geometry, bounding boxes don’t make any sense, but bounding circles work very nicely.
Once you have a bounding circle for every edge, you can build a tree by merging adjacent circles into parents circles.
Continuing to merge each set until you have a single parent bounding circle.
The nice thing about bounding circles is they work the same regardless of where they are on the globe (they don’t care about datelines or poles) and the test for whether they contain a point, or intersect each other is relatively cheap (if the distance between the centers of two bounding circles is greater than the sum of their radii, they don’t intersect).
Using bounding circle trees and a new generic caching system (now the geography cache and the geometry cache use the same underlying cache control, which has cut down the amount of code substantially) we have been able to demonstrate a very substantial performance boost.
For example, calculating ST_Distance between Canada (3316 vertices) and Brazil (410 vertices) in a country boundary table. Using the traditional distance code, which compares every edge in Canada to every edge in Brazil (that’s 1359150 comparisons), takes 11940ms. Using the tree-based distance calculation takes 35ms. That’s 341 times faster. It’s an extreme case, because both objects are very large, but’s it’s indicative of the kinds of wins available with this improvement.
The speed difference for the distance between Germany (126 vertices) and Spain (130 vertices) is 6ms versus 138ms, over 20 times faster. Between Ireland (71 vertices) and Iceland (106 vertices) the difference is 6ms versus 60ms, 10 times faster.
This improvement has been applied to the following functions:
- ST_Distance(geography, geography)
- ST_DWithin(geography, geography, radius)
It could also be added to ST_Intersects(geography, geography) relatively easily.
A cartesian version of this code (using bounding rectangles instead of circles) could also be written, which would dramatically speed up ST_Distance(geometry, geometry) and ST_DWithin(geometry, geometry). The current geometry caching code only works for the boolean predicates (ST_Intersects, ST_Contains, ST_Within) so the geometry distance code could be very substantially sped up, if there is a client demand.
If you’re willing to fund faster distance calculations for the geometry type, get in touch.
Related Posts
Tags: caching, geography, performance, postgis, speed






Hi
I like the bounding circle concept and the consideration of when this is more useful than a bounding box.
There are a few different bounding circles that are interesting.
It took me a while to see/reason how the merging was done in your pictures, so I thought I would right it down once I had figured it out: It seems that you start somewhere with two adjoining edge circles, then take the bisection/symmetry, then use the centre point of this line segment to define the first of the next level up circles.
I see how the reduction of the number of circles happens by choosing an arbitrary start point and direction.
The resulting bounding circle can be different depending on the arbitrary choices. For polygons with the number of corner points being 4, 16, 64, 256, etc. there are only two different bounding circles and so these can be reduced into a single bounding circle.
For most triangles, there are 3 bounding circles depending on the arbitrary choices. Repeatedly merging these always leads to an approximated bounding circle. That is a lot of calculation for a bounding circle, but it is still interesting.
Another interesting bounding circle is one from the centroid to the furthest point.
A bounding box has 4 parts of information. A bounding circle can be 3.
Thank you.
Right, I didn’t explain in detail, but after you build bounding circles for each edge (center is the midpoint of the ends and radius is half the distance between the ends) you create the tree by merging neighbours (pair-wise or n-wise). The most important thing to note is that you don’t have to sort the inputs to build this tree, so you can build the tree in O(n) time! That means that, even without any caching machinery at all, you can reduce the cost of calculating a polygon-polygon intersection or distance from O(n*m) to O(n+m), from quadratic to linear time, by building these trees. The same thing is true of the cartesian variant.
The simpler the input polygon (triangle, square, etc) the more pointless the tree is, because in objects with very few vertices even the lowest tiers of circles tend to heavily overlap, so the tree doesn’t provide very much winnowing power. But as the Brazil/Canada example shows, for the most gnarly of inputs, the trees absolutely crush the problem.
In cartesian space the box still wins over the circle, IMO, since the intersection test of two boxes can be done with a few super cheap comparison operations, while the intersection of circles requires a distance calculation.
[...] OpenGeo-Blog unter “Making Geogrphiy Faster” ist’s heute beschrieben, wie Berechnungen mit dem Geography-Type in der Geschwindigkeit [...]
Waiting for PostGIS 2.1 – Faster Geography…
Depesz has this series he’s had for a while going called Waiting for PostgreSQL … where he showcases the new features coming in the upcoming release of PostgreSQL pretty much as soon as they are committed to the PostgreSQL Git repo. It’s a simple i…