THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Adam Machanic

Adam Machanic, Boston-based SQL Server developer, shares his experiences with programming, monitoring, and performance tuning SQL Server. And the occasional battle with the query optimizer.

Bitmask Handling, part 2: Bitmask reconstitution

Posting the first part of my series on bitmasks (yes, this is now officially a series) taught me a lot about my readers: You don't care about handling bitmasks in the database. And I respect you for that! I'm overjoyed, as a matter of fact! That article has received the least hits of anything I've posted in this blog to date. So good for you for not clicking.

However, I'm going to continue and take this thing all the way to the bitter end. Why? I'm not sure, but I think it has something to do with my freshman year computer science course, "Data Structures". It was a course taught in C++, and the very first project was to create a series of classes that could support working with very large numbers. Of course, this was all handled via a bunch of bit manipulation, but I didn't know that because slacker that I was, I never went to class and only started the project the night before it was due.

I never finished the project. But out of the kindness of his heart -- or madness in his brain -- my professor let it slide and I passed the class anyway (I started going to class and completed the rest of the projects after that little slip!)

So here I am with this gaping hole in my education. I never finished the project for large numbers. And I'm really rusty on my C++ at the moment, but I know my way around SQL pretty well.

So what's next? I'm thinking we'll start with standard logical operators: AND, OR, NOT. And then move on to right-shift and left-shift. From there, it should be a pretty simple jump to addition, subtraction, multiplication, and division. I think. My other goal -- if I can figure out how to do it -- is to provide a mechanism by which the user will be able to view a decimal representation (in the form of a string) of the number contained in the bitmask. So this isn't really "handling bitmasks" anymore -- it's now become "how to represent really big numbers".

Anyway, on to the meat of this episode (for the 5 people who are probably going to bother reading this)...

We now have a way of taking a bitmask of arbitrary size (up to 4096 bytes, but could be bigger with more rows in the numbers table) and producing a table of integers representing which "bits" are set to 1 in the bitmask. That's thanks to the splitBitmask function.

Integral to finishing this project is a method of going the other way around -- we need to reconstitute bitmasks from a table of integers representing bit positions.

Since this series is a subseries of my series of things not to do in SQL Server, I'm allowed to use all sorts of dirty tricks to accomplish my goal. The trick du jour is aggregate concatenation (don't you love all of these intra-blog links?). I considered looping and tried to think of a truly set-based solution, but nothing beats the fun of letting SQL Server handle the dirty work in a totally undocumented way.

As you may recall, the BitmaskNumbers table defined in the first article contains three columns: A "Number", which represents an individual bit, a "Byte", which represents a collection of 8 bits, and a "BitValue", which represents the decimal value of that bit within the byte.

Assuming that we have a table of integers representing bit positions, we can join to the BitmaskNumbers table to get the associated bytes and bit values for those bits. So for instance, if we were to split the bitmask 0x1001F3 using the splitBitmask function:

 

SELECT
BitmaskNumbers.Number,
BitmaskNumbers.Byte,
BitmaskNumbers.Bitvalue
FROM dbo.splitBitmask(0x1001F3) x
JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number


Number Byte Bitvalue
------ ------ --------
1 1 1
2 1 2
5 1 16
6 1 32
7 1 64
8 1 128
9 2 1
21 3 16

We have three bytes, and the de-aggregated values for the bits in those bytes. Aggregating them is a simple matter of using the SUM() function:

 

SELECT
BitmaskNumbers.Byte,
SUM(BitmaskNumbers.Bitvalue) AS TotalValue
FROM dbo.splitBitmask(0x1001F3) x
JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
GROUP BY BitmaskNumbers.Byte


Byte TotalValue
------ -----------
1 243
2 1
3 16

Of course, that's a decimal representation and we require hexadecimal...

 

SELECT
BitmaskNumbers.Byte,
CONVERT(VARBINARY(1), SUM(BitmaskNumbers.Bitvalue)) AS TotalValue
FROM dbo.splitBitmask(0x1001F3) x
JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC


Byte TotalValue
------ ----------
3 0x10
2 0x01
1 0xF3

... And now you can see our original three bytes: 0x10, 0x01, and 0xF3. Lucky for those of us who want to do weird things with big bitmasks, SQL Server treats binary values like strings when using the + operator:

 

-- 0x10 + 0x01 != 17

-- Instead:

SELECT 0x10 + 0x01 AS Concat


Concat
------
0x1001

... And one other property that will be useful for aggregate concatenation:

 

SELECT 0x00 + 0xFF AS Concat


Concat
------
0x00FF

--That's not very useful; how do we initialize a NULL varbinary variable to an empty value?


SELECT 0x + 0xFF AS Concat


Concat
------
0xFF


--0x it is...

From here, it's a simple step to rebuilding the original bitmask:

 

DECLARE @Bitmask VARBINARY(4096)
SET @Bitmask = 0x

SELECT
@Bitmask = @Bitmask +
CONVERT(VARBINARY(1),
SUM(BitmaskNumbers.Bitvalue)
)
FROM dbo.splitBitmask(0x1001F3) x
JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC

SELECT @Bitmask AS TheBitmask


TheBitmask
------------
0x1001F3

But this bitmask has no blank spots. What if we switch to 0xFF00FF?

TheBitmask
------------
0xFFFF

--That's not good!

Changing the INNER JOIN to a RIGHT JOIN and adding a NULL check solves the problem a bit:

 

DECLARE @Bitmask VARBINARY(4096)
SET @Bitmask = 0x

SELECT
@Bitmask = @Bitmask +
CONVERT(VARBINARY(1),
SUM(CASE
WHEN x.Number IS NULL THEN 0
ELSE BitmaskNumbers.BitValue
END)
)
FROM dbo.splitBitmask(0xFF00FF) x
RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC

SELECT @Bitmask AS TheBitmask


TheBitmask
-------------
0x00000000000000000000000000000000000000000000 ... FF00FF

--Long string of zeroes followed by the original bitmask truncated for brevity

To keep things sparse, we should only take as many bytes as we require. We can filter the results post-join, and also take advantage of the index that was created on the Byte column in the first installment:

DECLARE @Bitmask VARBINARY(4096)
SET @Bitmask = 0x

SELECT
@Bitmask = @Bitmask +
CONVERT(VARBINARY(1),
SUM(CASE
WHEN x.Number IS NULL THEN 0
ELSE BitmaskNumbers.BitValue
END)
)
FROM dbo.splitBitmask(0xFF00FF) x
RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
WHERE BitmaskNumbers.Byte <=
(SELECT
CASE MAX(Number) % 8
WHEN 0 THEN (MAX(Number) - 1) / 8
ELSE MAX(Number) / 8
END + 1
FROM dbo.splitBitmask(0xFF00FF))
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC

SELECT @Bitmask AS TheBitmask


TheBitmask
------------
0xFF00FF

--Finally, the correct re-constituted output

A small optimization is using a table variable for the results of splitBitmask so that it will only have to be executed once. So the final pattern I'll present for solving this problem, ready for insertion into a variety of new UDFs, is:

 

DECLARE @Bitmask VARBINARY(4096)
SET @Bitmask = 0x11002200330044

DECLARE @BitsInBitmask TABLE(Number SMALLINT)
INSERT @BitsInBitmask
SELECT Number
FROM dbo.splitBitmask(@Bitmask)

SET @Bitmask = 0x

SELECT @Bitmask = @Bitmask +
CONVERT(VARBINARY(1),
SUM(CASE
WHEN x.Number IS NULL THEN 0
ELSE BitmaskNumbers.BitValue
END)
)
FROM @BitsInBitmask x
RIGHT JOIN BitmaskNumbers ON BitmaskNumbers.Number = x.Number
WHERE BitmaskNumbers.Byte <=
(SELECT
CASE MAX(Number) % 8
WHEN 0 THEN (MAX(Number) - 1) / 8
ELSE MAX(Number) / 8
END + 1
FROM @BitsInBitmask)
GROUP BY BitmaskNumbers.Byte
ORDER BY BitmaskNumbers.Byte DESC

SELECT @Bitmask AS TheBitmask


TheBitmask
------------
0x11002200330044

... And there you have it. Next time I'll investigate how to use this technique to very easily implement binary logical operators. And perhaps I'll disclose more tales about my poor study habits. Or maybe not.


Published Wednesday, July 12, 2006 10:27 PM by Adam Machanic
Filed under: ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Tim said:

I know you probably don't monitor this anymore, but I wanted to try these articles for "fun".  However, after generating my numbers table exactly the way you did, I am getting completely different results for all the above statements.  As a matter of fact, all the "TotalValue" results come back the exact opposite of what you have:

In your 3rd statement for example, I get:

Byte TotalValue

3 0xF3

2 0x01

1 0x10

What am I doing wrong?

March 5, 2009 8:25 AM
 

Tim said:

Ha!  Nevermind, I didn't read the other post about the fix to the splitBitmask function...

March 5, 2009 9:11 AM
 

Dmytro Anriychenko said:

Thank you for taking your time to put this together. It is a really helpful article, just what I was looking for to refresh my bitmask knowledge! There are still plenty of geeks around willing to spice up development with some cool stuff and avoid putting like 7 binary parameters in a single database function :-)

Than you,

Dmytro

June 10, 2014 5:51 AM

Leave a Comment

(required) 
(required) 
Submit

About Adam Machanic

Adam Machanic is a Boston-based SQL Server developer, writer, and speaker. He focuses on large-scale data warehouse performance and development, and is author of the award-winning SQL Server monitoring stored procedure, sp_WhoIsActive. Adam has written for numerous web sites and magazines, including SQLblog, Simple Talk, Search SQL Server, SQL Server Professional, CoDe, and VSJ. He has also contributed to several books on SQL Server, including "SQL Server 2008 Internals" (Microsoft Press, 2009) and "Expert SQL Server 2005 Development" (Apress, 2007). Adam regularly speaks at conferences and training events on a variety of SQL Server topics. He is a Microsoft Most Valuable Professional (MVP) for SQL Server, a Microsoft Certified IT Professional (MCITP), and an alumnus of the INETA North American Speakers Bureau.

This Blog

Syndication

Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement