If you change the security configuration of a linked server from using login UserA to login UserB, would you expect that to significantly impact the performance of a query that uses the linked server, assuming that both users have permission to select from the remote table referenced in the query?
Many people probably would answer no. After all, as long as the connection has the permission to access the remote table, how the connection is authenticated should not matter, right? Well, it’s more complicated than that.
Let’s say you have the following the query:
-- select @@servername returns NYCSQL01
SELECT o.*
FROM orders o inner join NYCSQL02.sales.dbo.order_details od
on o.OrderID =od.OrderID
WHERE o.OrderDate = '20090717'
In the above query, NYCSQL01 is the SQL Server instance where the query is being executed and NYCSQL02 is a linked server pointing to a separate SQL Server instance.
To process this query, SQL Server optimizer on NYCSQL01 may choose between two strategies (among others). First, it may decide to retrieve one row at a time from the table order_details on NYCSQL02 for the rows that match the OrderID values found on NYCSQL01 for the given OrderDate value. The second strategy is to retrieve all the rows of order_details from NYCSQL02 and then perform the join locally on NYCSQL01. Depending on the data distribution of the rows in order_details, either approach can be efficient.
To make the correct decision as to which strategy to use, NYCSQL01 needs the distribution statistics for table order_details from NYCSQL02. So far, so good. The question is what permission NYCSQL01 needs in order to retrieve the distribution statistics from NYCSQL02.
It turns out that it needs the same permission as is required to execute DBCC SHOW_STATISTICS. Per Books Online, the permissions for DBCC SHOW_STATISTICS are:
User must own the table, or be a member of the sysadmin fixed server role, the db_owner fixed database role, or the db_ddladmin fixed database role.
This is where the problem lies. You may think that as long as you can access the table order_details on NYCSQL02, you are all set with the query above. After all, if you just want to retrieve some data from the order_details table, it would not be a security best practice to be given a sysadmin, db_owner, or even db_ddladmin role.
If the connection to NYCSQL02 is not the table owner, or a sysadmin, db_owner, or db_ddladmin, you’ll still get the result back. But the query optimizer on NYCSQL01 will not have the stats on order_details, and may just decide to do a table scan on order_details. If order_details turns out to be very large table, the performance of the above query is shot, or worse, it could drag NYCSQL01 down with it.
Whether it is reasonable to require sysadmin, db_owner, or db_ddladmin in order to process a distributed query efficiently is an issue for a different thread of discussion.
My gripe is that the exact permission requirements for efficiently processing a distributed query, especially a distributed join, is not well documented, if it is documented at all. Given its enormous risk, I’d argue that this should not only be documented, but rather it should be highlighted in bold in SQL Server Books Online.
If you agree, please go to Microsoft Connect and vote for the following item:
https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=476001