Probably the worst thing that can happen to a budding technical book author, is to find an error, even if it’s a minor one, in the submitted material just after the deadline for corrections closes. Figures that it should happen to me, eh?
Back in November last year, Adam Machanic asked me to contribute a chapter on spatial data for his book, Expert SQL Server 2005 Development. I have cursed myself for accepting his proposal several times, but I must admit that now all the hard work is done and the book has finally hit the shelves, I am pretty pleased with the end result. (You can check it out here and here, if you wish – the chapter I wrote is chapter 9, aptly titled “Working with Spatial Data”).
However, there is one problem. Just after the deadline for corrections passed, I got an email from Adam, pointing out that Microsoft employees Steven Hemingray and Isaac Kunen had found a subtle error in the logic I use to calculate both the normal and the dynamic bounding box.
In case you already own the book, you should now flip to pages 269 and 284, where I introduce the Bounding Box and the Dynamic Bounding Box. And in case you don’t own the book yet, I suggest that you order a copy now and come back to this blog once you have it, otherwise none of the text below will make any sense to you J.
The error that Steven and Isaac found has to do with how I calculate the boundaries of the bounding box: by moving straight east and straight west of the center of the search area. Since the search area is always a smaller circle than the longitude lines, I expected the eastern and western boundaries of the search area to converge faster than the corresponding longitude lines. As it turns out, this is only true on the equator – everywhere else, the search area boundaries will eventually converge faster, but not immediately. For instance, on the northern hemisphere, the longitude lines are already converging whereas the search boundaries start out straight north. In the illustration below, you should be able to see how the search area goes just outside the area enclosed by the minimum and maximum longitude found by moving straight east and west from the center of the search area. If you don’t see it straight away, try magnifying the illustration. The effect gets more visible as you get nearer to the north and south pole.
The error may not be a very big one – but it’s always possible that a place of interest happens too lie just inside that small slice of the search area that is beyond the calculated minimum and maximum longitude. In that case, the query will not return this place of interest, as is shown by the query below that fails to find Big Sandy, MT.
DECLARE @Lat float,
SELECT @Lat = 47.6742,
@Lon = -122.114799,
@MaxDist = 900;
FROM dbo.GetNeighbors (@Lat, @Lon, @MaxDist, 'km')
WHERE Dist > 894
ORDER BY PlaceName;
SELECT PlaceName, State, dbo.Distance (@Lat, @Lon, Lat, Lon, 'km')
WHERE PlaceName = 'Big Sandy'
AND State = 'MT';
PlaceName State Dist
-------------------- ----- ----------------------
Aberdeen ID 894,485726688792
Clyde Park MT 897,222548722876
Durham CA 897,552778885914
Island Park ID 898,584489958893
Laytonville CA 895,762093416259
Lewisville ID 899,180452244619
Menan ID 898,883781052055
Stanford MT 895,550954777994
-------------------- ----- ----------------------
Big Sandy MT 894,502761011706
The illustration also shows an easy way to solve this problem. As you can see, if you move straight east and straight west from the northernmost (if on the northern hemisphere) or southernmost (on the southern hemisphere) point of the search area, you’ll end up at a longitudes that exceed the minimum and maximum longitude required for the search. This will then result in a bounding box that is bigger than the minimum size required – in fact, quite a lot bigger for search areas near the poles. With this oversized bounding box, we can be sure not to miss any locations in the search area. The downside of this is that we lose part of the speed gain that the bounding box and the dynamic bounding box brought us. But even with this speed loss, the standard and dynamic bounding box algorithms still run circles around the other algorithms described in the chapter. In the attached file, you’ll find updated versions of the T-SQL and CLR routines that implement the standard and dynamic bounding box algorithms.
I am pretty sure that it must be possible to calculate the minimum and maximum longitude for the search area such that the bounding box will be neither too small, nor too big. However, these calculations force me to dust off some of the heavy high school math that I have managed to avoid using for many years now, so it might take some time. For now, I suggest that if you implement the standard or the dynamic bounding box in your applications, use the corrected versions of the routines that I have attached to this post. And watch this space for a future post about a possible better way to correct the error in the original bounding box algorithms and some hard results of performance comparisons.