THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Enjoy Another Sandwich -- Kent Tegels

Yummy slices of SQL Server between slices of .NET and XML

When is a bug not a bug?

I guess when the developers decide it is not. Consider the following query:
drop table dbo.shapeParts
go
create table dbo.shapeParts(
     shapePartID tinyint not null identity(1,1) primary key,
     shapeID tinyint not null,
     shapeQuad tinyint null,
     shape geometry not null);
go
insert into dbo.shapeParts values
(1,0,geometry::STGeomFromText('POLYGON((0 1,1 1,1 2,0 2,0 1))',1));
insert into dbo.shapeParts values
(1,1,geometry::STGeomFromText('POLYGON((1 1,2 1,2 2,1 2,1 1))',1));
insert into dbo.shapeParts values
(1,2,geometry::STGeomFromText('POLYGON((0 0,1 0,1 1,0 1,0 0))',1));
insert into dbo.shapeParts values
(1,3,geometry::STGeomFromText('POLYGON((1 0,2 0,2 1,1 1,1 0))',1));
go
create spatial index shapeindex on dbo.shapeParts(shape)
using geometry_grid with (
     bounding_box =(0, 0, 2, 2),
     grids =(level_1 = low,level_2 = low,level_3 = low,level_4 = low),
     cells_per_object = 1)
go
declare @g2 geometry = geometry::STGeomFromText('POINT(1 1)',1);
select top(1) *
from dbo.shapeParts with(index(shapeIndex))
where (shape.STContains(@g2)=1)
or (shape.STIntersects(@g2))=1
go
The Books Online topic "Geography Methods Supported by Spatial Indexes" reads:
Under certain conditions, spatial indexes support the following set-oriented geography methods: STIntersects(), STEquals(), and STDistance(). To be supported by a spatial index, these methods must be used within the WHERE clause of a query, and they must occur within a predicate of the following general form: geography1 . method_name ( geography2 ) comparison_operator valid_number To return a non-null result, geography1 and geography2 must have the same Spatial Reference Identifier (SRID). Otherwise, the method returns NULL.
. What's in question here is what does "under certain conditions" means. The seems simple enough and for other data types, the type of operation preformed with multiple filter predicates is not an issue, but if you actually try to execute this query in RC0, you get an execution time error:
Msg 8635, Level 16, State 10, Line 2
The query processor could not produce a query plan for a query with a spatial index hint. Reason: Could not find required comparison predicate. Try removing the index hints or removing SET FORCEPLAN.
This seemed like a bug to me, so I filed it on connect (see https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=357578). The response?
"Thank you for the bug submission. Spatial indexes match only WHERE/ON clauses made of single atomic predicates or ANDs thereof -- not ORs. We also do not infer that STIntersects here is redundant."
The response that came back was authoritative and well-reasoned. Yet it also smacks of "its not an error because we say it isn't." Hopefully this response (and kind of response) will be reconsidered in the future.
Published Saturday, July 26, 2008 7:04 AM by ktegels
Filed under: ,

Comments

 

ktegels said:

Sorry Adam, I accidently deleted your comment. Geometery instances can use STContains (see Geometry Methods Supported by Spatial Indexes). And here's an example of OR using an indexing. For my machine (Dell Precision 4300 dual core running Vista x64 and RC0), the "no index" plan has three operation and tree cost of 20.03807. Creating non-clustered indexes on c1 and c3 changed the operation count to 14 but lowered the plan cost to 0.1804321. Finally, there was almost no difference between the no index and unique clustered solution (3 operations, 20.0388 cost) other than the operation: table scan vs index scan.

The real telling part of this is the actual execution times:

runType avExecTime

------- ----------------------

1       303.2

2       0.8

3       293

Adam, I've sent you the script and output for your use. If anybody else is interested, please feel free to contact me.

July 27, 2008 8:05 PM
 

Adam Machanic said:

Kent,

What you've simulated is index intersection, which is useful in only a very small number of cases where you happen to be selecting data from an index with very high cardinality.  In practice it's very rare for me to see intersection used by the QP, and with good reason: if the involved seeks are not both extremely selective, the intersection will end up being much more expensive than a straight scan.  I don't know much about what kind of statistics are kept for spatial data, but I would not be surprised if it's impossible to pre-evaluate how selective one of the predicates will be.  In that case, defaulting to a scan would be a better choice than trying to do blind intersection.  So again, this behavior makes perfect sense.

July 28, 2008 9:25 AM
 

Steven Hemingray said:

Did you see that this bug was converted to a DCR / Suggestion to be considered for the next release? The original comment was simply stating that this was not a defect in Katmai - the spatial index does not support matching OR clauses.  

The comment also should have said that STContains was redundant instead, as in this case all rows will have already been matched by STIntersects so the OR is not even needed.

July 28, 2008 2:14 PM
 

Adam Machanic said:

Steven: What kind of statistics does the engine keep for spatial data?  Can it produce good cardinality estimates about intersection queries?

July 28, 2008 5:48 PM
 

ktegels said:

Hello Adam and Steven, let's tackle this in parts:

AM: What you've simulated is index intersection, which is useful in only a very small number of cases where you happen to be selecting data from an index with very high cardinality.  

The point is, you said wouldn't happen at all. Yet the optimizer is clearly smart enough to know to use the indexes under just such cases.

AM: In practice it's very rare for me to see intersection used by the QP,

I think you said that correctly.. its rare *for you.*

AM: and with good reason: if the involved seeks are not both extremely selective, the intersection will end up being much more expensive than a straight scan.  

Selectivity is only part of it, it also do the amount of data involved. Try samples I send with 100, 1000 and 10,000 rows. The optimizer uses table scan in virtually all of the cases. I think you'd have a hard time saying the selectivity is much different in those dataset... at least on average.

AM: I don't know much about what kind of statistics are kept for spatial data, but I would not be surprised if it's impossible to pre-evaluate how selective one of the predicates will be.  

It all depends, of course, on what is actually in the internal table underpinning the index. I think we will find that out eventually.

AM: In that case, defaulting to a scan would be a better choice than trying to do blind intersection.  So again, this behavior makes perfect sense.

Not at all. There many cases where the optimizer will choose a less expensive plan that doesn't use the indexes but have very long run times. That's why we frequently have to hint the indexes for spatial queries to get acceptable response it. You look at a hinted plan, you'll commonly see that the index is marked as not having statistics.

If this simply a performance optimization left until RTM or if its a permanent thing, we'll have to wait and see.

SH: Did you see that this bug was converted to a DCR / Suggestion to be considered for the next release?

Yes, I was the guy that filled the "bug."

SH: The original comment was simply stating that this was not a defect in Katmai - the spatial index does not support matching OR clauses.

I don't doubt the specs are written that way. But nothing public facing yet says that. That's why I brought it up. People need to know that one logical op works but not another when the non-working operator works for other types of indexes as demonstrated here. If that is a designed-in limitation or a documentation limitation (I personally say that its BOL that needs to make it clear), I don't care so much. Sure, I think promoting to a DCR was a good thing, but what is more useful more immediately is covering the limitations precisely in the public facing docs.

SH: The comment also should have said that STContains was redundant instead, as in this case all rows will have already been matched by STIntersects so the OR is not even needed.

Yes, I forgotten about the "extent covering extent" rule, I was thinking shared-point intersection. My bad.

So while we may have solved the need for conjunctive OR in this case, I have wonder if its the last time we will see it... :)

July 29, 2008 5:32 AM
 

Adam Machanic said:

"I think you said that correctly.. its rare *for you.*"

So you're saying that in the real-world projects you work on, you see this behavior quite often?

July 29, 2008 8:38 AM
 

ktegels said:

AM: So you're saying that in the real-world projects you work on, you see this behavior quite often?

A monk once asked Jo Ju, "Does a dog have Buddha-nature?"

Jo Ju answered, "<img src="http://upload.wikimedia.org/wikipedia/commons/thumb/8/8b/%E7%84%A1-still.svg/167px-%E7%84%A1-still.svg.png">"

July 29, 2008 11:52 AM
 

Randy Stimpson said:

This situation illustrates why bug reports should go through a process that includes a verification step. In this step the person who reported the bug is the one to verify that a bug has been fixed or to <i>approve any other type of resolution</i>.

See http://Bug-Tracking-Guidelines.com (3)

August 6, 2008 5:48 PM
 

ktegels said:

Well Randy, that'd be nice but I don't expect to see any software vendor of the mass of MS do it soon. They be awash in bugs that never got any more feedback I bet.

Nice idea though.

August 6, 2008 6:05 PM
New Comments to this post are disabled

About ktegels

Kent Tegels passed away on July 31, 2010. Kent was an Adjunct Professor at Colorado Technical University and a member of the technical staff at PluralSight. He was recognized by Microsoft with Most Valuable Professional (MVP) status in SQL Server for his community involvement with SQL Server and .NET. Kent held Microsoft Certifications in Database Administration and Systems Engineering, and contributed to several books on data access programming and .NET. He was a well known industry speaker, and resided in Sioux Falls, South Dakota.
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement