THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Dealing with very large bitmasks

Continuing in my series of things you should probably not do in SQL Server but sometimes have to, I'm going to do a few posts on dealing with very large bitmasks.

Let me first state my utter hatered of bitmasks in databases. I think they're annoying, make the system difficult to understand, and whether or not they violate the First Normal Form (that's up for discussion), using them is just a sign of bad design.

But as I've said in other posts, I realize that short deadlines and tiny budgets are an issue at most shops, and sometimes we just need to shoehorn in a solution real quick (yeah, as if it's not going to last for the next 5+ years?)

In one case in the past, bitmasks were a very convenient solution to a problem I faced with an access control system. But alas, I only had 8 bytes available to me. Only enough for 64 values. And so that solution failed. And the company failed. And many, many tears were shed... If only I'd been able to figure out how to manipulate a bigger bitmask, I might have saved the little children...

I won't go into any more detail on that particular issue since there are still a few pending lawsuits, but suffice it to say that if that situation were happening today, I probably wouldn't use a bitmask anyway. But you might need one -- so here's how you do it:

First we're going to modify the table of numbers that I'm always telling you that you absolutely must have in every single database.

 

SELECT (a.number * 256 + b.number) AS Number,
CASE (a.number * 256 + b.number) % 8
WHEN 0 THEN ((a.number * 256 + b.number) - 1) / 8
ELSE (a.number * 256 + b.number) / 8
END + 1 AS Byte,
POWER(2, CASE (a.number * 256 + b.number) % 8
WHEN 0 THEN 8
ELSE (a.number * 256 + b.number) % 8
END-1) AS BitValue
INTO BitmaskNumbers
FROM
(
SELECT number
FROM master..spt_values
WHERE
type = 'P'
AND number <= 255
) a (Number),
(
SELECT number
FROM master..spt_values
WHERE
type = 'P'
AND number <= 255
) b (Number)
WHERE
(a.number * 256 + b.number) BETWEEN 1 AND 32767
GO

CREATE CLUSTERED INDEX IX_Byte ON BitmaskNumbers (Byte)
GO

This will produce a table with 32767 rows. Each row has a Number, which will represent a bit position in our bitmask, a Byte, which will help to parse the bitmask, and a BitValue, which is the value that the individual bits within each byte represent. Feel my 1960-esque skill!

The brighter bulbs in my audience have now figured out that I'm going to show you how to handle a 4096-byte bitmask -- capable of handling up to 32767 values. Not bad. But if you need more, just put more rows in the BitmaskNumbers table.

So what do you want to do with bitmasks? Most of the queries I've seen involve access control And for those queries, you want to use a logical and and see if it evaluates to a number other than 0. That is, we want to see if both bitmasks we're comparing have any of the same bits set.

Using what little math knowledge I have managed to retain, I conjured up the following, which indicates which bit positions, based on the "number", are filled in a bitmask. For instance:

 

DECLARE@x VARBINARY(4096)
SET @Bitmask = 0x1F0000000000000000000000000000000000000000000000000000000000000000000000123000000000000000000000000001

SELECT Number
FROM BitmaskNumbers
WHERE (SUBSTRING(@Bitmask, Byte, 1) & BitValue) = BitValue
AND Byte <= DATALENGTH(@Bitmask)


Number
----------
1
2
3
4
5
290
293
301
302
401

Fun stuff, no?

The Byte <= DATALENGTH(@x) allows SQL Server to utilize the clustered index on Byte, so that a full table scan doesn't have to happen every single time. Small optimization. I couldn't think of any others. If you can, drop me a line...

Those of you who've read this far are probably yawning and wondering where the access control stuff is... Who cares about chunking up the bitmask into its bit positions? Well, it's simply the first step. Bear with me.

What we need to do is wrap this in a UDF. Then if we had two bitmasks, we could join the bit positions to eliminate those that aren't in common...

 

CREATE FUNCTION dbo.splitBitmask
(
@Bitmask VARBINARY(4096)
)
RETURNS TABLE
AS
RETURN
(
SELECT Number
FROM BitmaskNumbers
WHERE (SUBSTRING(@Bitmask, Byte, 1) & BitValue) = BitValue
AND Byte <= DATALENGTH(@Bitmask)
)
GO

DECLARE @Bitmask1 VARBINARY(4096)
SET @Bitmask1 = 0x1F0000000000000000000000000000000000000000000000000000000000000000000000123000000000000000000000000001

DECLARE @Bitmask2 VARBINARY(4096)
SET @Bitmask2 = 0x0E0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

SELECT x.Number
FROM dbo.splitBitmask(@Bitmask1) x
JOIN dbo.splitBitmask(@Bitmask2) y ON x.Number = y.Number
GO


Number
----------
2
3
4

We now know that these two bitmasks share bits 2, 3, and 4 in common. But for most access control situations, we don't care what bits they share in common -- just that they share some.

 

CREATE FUNCTION dbo.HasAccess
(
@Bitmask1 VARBINARY(4096),
@Bitmask2 VARBINARY(4096)
)
RETURNS BIT
AS
BEGIN
DECLARE @Result BIT

SELECT @Result =
CASE COUNT(*)
WHEN 0 THEN 0
ELSE 1
END
FROM dbo.splitBitmask(@Bitmask1) x
JOIN dbo.splitBitmask(@Bitmask2) y ON x.Number = y.Number

RETURN (@Result)
END
GO

DECLARE @Bitmask1 VARBINARY(4096)
SET @Bitmask1 = 0x1F0000000000000000000000000000000000000000000000000000000000000000000000123000000000000000000000000001

DECLARE @Bitmask2 VARBINARY(4096)
SET @Bitmask2 = 0x0E0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

SELECT dbo.HasAccess(@Bitmask1, @Bitmask2) AS HasAccess
GO


HasAccess
-------------
1

... And that is pretty much it for this installment. We can now determine whether or not two bitmasks have bits in common, and if necessary which bits they share.

Future installments will cover how to manipulate large bitmasks in other ways -- flip specific bits, perform a logical and that produces a bitmask instead of a result set, and perform a logical or that produces a bitmask. All very useful stuff if you need to work with these bitmasks. But now I just need to figure out how to do all of that stuff.

So until next time.... Don't use this technique.


Published Wednesday, July 12, 2006 10:25 PM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Jason said:

I appreciate your binary examples.  One of the things that perplexes me as a previous application developer turned sql developer is that all of your examples would be better handled on the client side.  Some of your code...especially the use of a numbers table...is extremely innovative.  But, in a total solution I don't see why I would use the database to handle any of them.  Don't mean to be harsh...but i could see a carpenter build a nice space shuttle with wood... I just don't see it getting far.

November 28, 2007 10:29 PM
 

Adam Machanic said:

Jason:

As I mention several times in the post, I do not recommend actually using this technique, for several reasons.  That said, it is certainly not hard to think of scenarios where one might want to do these kinds of things in the database.  Think encapsulation.  If -- for some very, very good reason -- you need to use bitmasks in the database, but perhaps don't need to expose the bitmasks to the application, you can and should do this work in the database in order to keep the application and database as loosely coupled as possible.

I am all for handling tasks wherever they are best suited, but I'm an even bigger fan of highly modular designs.  So if I can keep the application ignorant of some hack I've worked up in the database, all the better...

December 4, 2007 10:30 AM
 

Adam Machanic said:

In the article on handling bitmasks I posted the other day, I made a fatal error in the splitBitmask

January 25, 2009 2:46 PM
 

Adam Machanic said:

Posting the first part of my series on bitmasks (yes, this is now officially a series) taught me a lot

January 25, 2009 2:47 PM
 

Adam Machanic said:

It's been longer than I hoped since my last installment on bitmask / big number handling . Life caught

January 25, 2009 2:50 PM
 

Derek said:

Just FYI Adam, I love this article.  I've been here hundreds of times so attribute a lot of traffic to me!

June 17, 2010 3:25 PM
 

Adam Machanic said:

Glad you enjoyed it. You know you can simply print it, frame it, and hang it on the wall, right? :-)

June 19, 2010 7:36 AM
 

David said:

We do use large bitmasks in our database - and until now, we've handled them via SQL CLR, making all processing rather fast. However, with SQL Azure, CLR is no longer an option, at least for now. I tried re-implementing some of our bitmask user-defined functions in SQL basing myself on your wonderful articles and the bitmask reconstruction operation seems to be much slower than its SQL CLR counterpart. Do you think the bad performance is an intrinsic property of the bitmask reconstruction, or is there room for optimization?

December 23, 2011 5:08 PM
 

Adam Machanic said:

Hi David,

It seems natural that a CLR approach would work much better--it's a very CPU-heavy operation, and CLR functions almost always deliver higher performance in those cases. Is it so much slower that it's unbearable?

I originally wrote all of this stuff 7 or 8 years ago, so I'm not going to be able to give you a great answer about improvements without doing some testing. But taking a quick look at it, I'm thinking that yes, there is some room for optimization -- perhaps by using some of the XQuery methods that were added in SQL Server 2005. Are those available in Azure? I've luckily managed to avoid having to use it, to date!

--Adam

December 23, 2011 10:19 PM
 

David said:

The performance degradation is significant even for the binary mask creation. Consider the following:

select Data, fnCreateBinaryMark(40) from SomeTable

Which would create a binary mask of 40 bytes for each row of the table.

For a table containing ~4000 rows, the SQL CLR version runs in less than a second while the purely SQL version takes 6 seconds.

Since the binary mask creation is simply the concatenation if table variable values and  since the re-construction of the varbinary from a table already seems to be a costly operation, I have a feeling I would not be able to build performant bit operation functions, since it's the mandatory final step.

I do believe XQuery is available on Azure, but not sure how it can help - afaik, bit operations are not part of XPath/XQuery.

January 3, 2012 7:10 PM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement