Someone posted a question to the SQL Server forum the other day asking how to count runs of zero bits in an integer using SQL. Basically the poster wanted to know how to efficiently determine the longest contiguous string of zero bits (known as a run of bits) in any given 32-bit integer. Here are a couple of examples to demonstrate the idea:
Decimal = Binary = Zero Run
999,999,999 decimal = 00111011 10011010 11001001 11111111 binary = 2 contiguous zero bits
666,666,666 decimal = 00100111 10111100 10000110 10101010 binary = 4 contiguous zero bits
My first reaction was that SQL is not my go-to language for bit-twiddling hacks. These types of calculations are generally most efficient in C-style compiled procedural languages, whose bit manipulation operators map almost one-for-one to low-level machine language instructions. In all fairness, SQL does have some bit-level operators (&, |, etc.), but their performance isn't as optimized as in a language like C#.
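To see why procedural languages fit this problem so naturally, here is a minimal loop-and-count sketch in Python (purely for illustration; the function name is mine):

```python
def longest_zero_run(n, width=32):
    """Longest run of consecutive 0 bits in the width-bit
    representation of n; leading and trailing zeros count."""
    longest = current = 0
    for _ in range(width):
        if n & 1:            # a 1 bit ends the current zero run
            current = 0
        else:                # a 0 bit extends the current zero run
            current += 1
            longest = max(longest, current)
        n >>= 1              # shift one bit right to examine the next bit
    return longest
```

For the two examples above, longest_zero_run(999999999) returns 2 and longest_zero_run(666666666) returns 4.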
At any rate, a few different ideas were tossed around, like the simplistic loop-and-count procedural method. Using this method you keep a running total of the current zero-bit run, loop over the bits, compare the running total to the largest run of zero bits seen so far, and shift your integer one bit right each time. But this being SQL, I decided to attack the problem from a set-based perspective. To start with I built a 1,000,000 row table to hold random integers:
CREATE TABLE TempNum
(
Num BIGINT NOT NULL
);
GO
WITH GenerateRandom
AS
(
SELECT 1 AS n,
ABS(CONVERT(BIGINT, CONVERT(BINARY, CHECKSUM(NEWID()))) % 4294967296) AS Random
UNION ALL
SELECT n + 1,
ABS(CONVERT(BIGINT, CONVERT(BINARY, CHECKSUM(NEWID()))) % 4294967296)
FROM GenerateRandom
WHERE n < 1000000
)
INSERT INTO TempNum
(
Num
)
SELECT Random
FROM GenerateRandom
OPTION (MAXRECURSION 0);
GO
Note that I used SQL Server's BIGINT data type (a 64-bit integer) since I wanted to deal only in unsigned 32-bit integers. For my initial crack at a solution I split the 32-bit integer up into individual bits and treated the whole thing like a classic Gaps and Islands problem. Essentially the 1 bits are islands, the 0 bits are gaps, and the length of the longest gap is the answer.
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
WITH Powers
AS
(
SELECT 0 AS id,
CAST(1 AS BIGINT) AS pwr
UNION ALL
SELECT p.id + 1,
POWER(CAST(2 AS BIGINT), p.id + 1)
FROM Powers p
WHERE p.id < 33
),
Islands
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY tn.Num ORDER BY id) AS RowNum,
tn.Num,
p.id,
p.pwr
FROM Powers p
CROSS JOIN TempNum tn
WHERE p.pwr & ((tn.Num * 2) | CAST(8589934593 AS BIGINT)) <> 0
)
SELECT c.Num AS Num,
MAX(n.id - c.id - 1) AS ZeroBitRun
INTO #Temp
FROM Islands AS c
INNER JOIN Islands AS n
ON n.RowNum = c.RowNum + 1
AND n.Num = c.Num
WHERE n.id - c.id > 1
GROUP BY c.Num,
n.Num;
GO
One interesting aside on this solution: to ensure that leading and trailing zeroes were counted for every 32-bit number, I had to shift the number left one bit (multiply by 2) and do a bitwise OR (| operator) with 8589934593, setting both the lowest bit (bit 0) and the highest bit (bit 33) to 1. Basically this means you're now dealing with a 34-bit integer, with the highest and lowest bits counted as islands. This ensures that leading and trailing gaps among the 32 bits in between are correctly identified. On my PC this query took an average of just over 360 seconds (6 minutes) to identify the largest run of zero bits in 1,000,000 random numbers. Another solution posted to the group used logarithms combined with bit-shifting in a recursive CTE. That one completed 1,000,000 iterations in about 17.5 minutes.
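The sentinel trick is easy to verify outside of SQL. Here is a Python sketch of the same gaps-and-islands logic (illustration only; the function name is mine):

```python
def longest_zero_run_islands(n):
    # Shift left one bit and OR with 8589934593 (2**33 + 1), setting
    # bit 0 and bit 33; these sentinel 1 bits bound the original 32
    # bits so leading and trailing zero runs become ordinary gaps.
    m = (n << 1) | 8589934593
    islands = [i for i in range(34) if m & (1 << i)]  # positions of 1 bits
    # The gap between consecutive islands is a run of zero bits.
    return max(b - a - 1 for a, b in zip(islands, islands[1:]))
```

Like the SQL version, this correctly reports 32 for the input 0, since the two sentinel islands sit 32 bit positions apart.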
I couldn't help but think that, when you get down to it, a solution to this problem should play to SQL's strengths. I decided that what I really needed for an efficient SQL solution was a lookup table. Granted, a lookup table of 4+ billion rows (one row for each and every 32-bit number) would take a long time to build and probably wouldn't lend itself to I/O efficiencies in SQL Server. So I opted for a scaled-down version and built a lookup table of 65,536 rows, with one row representing every possible 16-bit number. There are a lot of clever ways to grab bit-level pattern information (Google "de Bruijn sequence", for instance), but since this lookup table is a one-time build with a static population I decided to keep it simple.
CREATE TABLE #Nybbles
(
Num INT PRIMARY KEY NOT NULL,
String VARCHAR(4) NOT NULL,
Leading TINYINT NOT NULL,
Trailing TINYINT NOT NULL
);
GO
INSERT INTO #Nybbles
(
Num,
String,
Leading,
Trailing
)
VALUES (0, '0000', 4, 4), (1, '0001', 3, 0), (2, '0010', 2, 1), (3, '0011', 2, 0),
(4, '0100', 1, 2), (5, '0101', 1, 0), (6, '0110', 1, 1), (7, '0111', 1, 0),
(8, '1000', 0, 3), (9, '1001', 0, 0), (10, '1010', 0, 1), (11, '1011', 0, 0),
(12, '1100', 0, 2), (13, '1101', 0, 0), (14, '1110', 0, 1), (15, '1111', 0, 0);
GO
CREATE TABLE #BitMask
(
Length TINYINT,
Mask VARCHAR(16)
);
GO
WITH GetMasks
AS
(
SELECT 0 AS Length,
CAST('' AS varchar(16)) AS Mask
UNION ALL
SELECT Length + 1,
CAST(REPLICATE('0', Length + 1) AS varchar(16))
FROM GetMasks
WHERE Length < 16
)
INSERT INTO #BitMask
(
Length,
Mask
)
SELECT Length,
Mask
FROM GetMasks;
GO
CREATE TABLE BitPattern
(
Num INT NOT NULL PRIMARY KEY,
Trailing TINYINT,
Leading TINYINT,
Seq TINYINT
);
GO
WITH GetBin
AS
(
SELECT n1.Num * 4096 + n2.Num * 256 + n3.Num * 16 + n4.Num AS Num,
n1.String + n2.String + n3.String + n4.String AS String
FROM #Nybbles n1
CROSS JOIN #Nybbles n2
CROSS JOIN #Nybbles n3
CROSS JOIN #Nybbles n4
)
INSERT INTO BitPattern
(
Num,
Trailing,
Leading,
Seq
)
SELECT g.Num,
(
SELECT MAX(b1.Length)
FROM #BitMask b1
WHERE RIGHT(g.String, b1.Length) = b1.Mask
) AS Trailing,
(
SELECT MAX(b2.Length)
FROM #BitMask b2
WHERE LEFT(g.String, b2.Length) = b2.Mask
) AS Leading,
COALESCE
(
(
SELECT MAX(b3.Length)
FROM #BitMask b3
WHERE CHARINDEX(b3.Mask, g.String) > 0
),
0) AS Seq
FROM GetBin g
ORDER BY Num;
GO
DROP TABLE #BitMask;
DROP TABLE #Nybbles;
GO
The first part of this script puts all 16 combinations of 4-bit nybbles (a nybble is half a byte) and their equivalent binary-formatted strings (0 = '0000', 14 = '1110') into a temp table called #Nybbles.
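As a sanity check, the Leading and Trailing values in the hand-entered #Nybbles rows can be derived mechanically. A quick Python sketch (the function name is mine):

```python
def nybble_row(v):
    # Binary string plus leading/trailing zero-bit counts for a
    # 4-bit nybble, matching one row of the #Nybbles table.
    s = format(v, '04b')
    leading = len(s) - len(s.lstrip('0'))    # zeros on the left
    trailing = len(s) - len(s.rstrip('0'))   # zeros on the right
    return s, leading, trailing
```

For example, nybble_row(1) returns ('0001', 3, 0) and nybble_row(12) returns ('1100', 0, 2), matching the rows inserted above.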
There's also a #BitMask temp table with bitmasks representing zero-bit runs. The bitmasks are just strings of consecutive '0' characters of the necessary length (length 1 = '0', length 5 = '00000').
The BitPattern table is the actual 16-bit number lookup table. It is populated by combining every 16-bit combination of nybbles from the #Nybbles temp table, and it has four columns:

- Num is the 16-bit number
- Trailing is the number of trailing zero bits (on the right-hand side) of the number
- Leading is the number of leading zero bits (on the left-hand side) of the number
- Seq is the longest sequence of zero bits within the number
The total run time to build this lookup table was around 30 seconds on my computer. Keep in mind that's a one-time cost, since you never have to build (or modify) the table again.
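Each BitPattern row boils down to three measurements over the 16-bit binary string, which can be expressed compactly in Python (a hypothetical helper, not the SQL used to populate the table):

```python
def bit_pattern_row(num):
    # Trailing/leading zero-bit counts plus the longest zero run
    # anywhere in the 16-bit word, mirroring the BitPattern columns.
    s = format(num, '016b')
    trailing = len(s) - len(s.rstrip('0'))
    leading = len(s) - len(s.lstrip('0'))
    seq = max(len(run) for run in s.split('1'))  # longest all-zero gap
    return trailing, leading, seq
```

Note that for the all-zero word, all three columns are 16, which is exactly the edge case the final query's CASE expressions have to account for.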
With the information in this lookup table, the query that locates the longest run of zero bits in any given 32-bit number is relatively simple:
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;
GO
WITH CTE
AS
(
SELECT tn.Num,
(
SELECT
CASE WHEN l.Seq > h.Seq THEN
(
CASE WHEN l.Seq > h.Trailing + l.Leading THEN l.Seq
ELSE h.Trailing + l.Leading
END
)
ELSE
(
CASE WHEN h.Seq > h.Trailing + l.Leading THEN h.Seq
ELSE h.Trailing + l.Leading
END
)
END
) AS Seq,
CASE WHEN l.Trailing = 16 THEN h.Trailing + l.Trailing
ELSE l.Trailing
END AS Trailing,
CASE WHEN h.Leading = 16 THEN h.Leading + l.Leading
ELSE h.Leading
END AS Leading
FROM TempNum tn WITH (NOLOCK)
INNER JOIN BitPattern l WITH (NOLOCK)
ON l.Num = (tn.Num & 65535)
INNER JOIN BitPattern h WITH (NOLOCK)
ON h.Num = (tn.Num / 65536)
)
SELECT Num,
Seq,
Leading,
Trailing
INTO #Temp
FROM CTE;
GO
Basically you join the BitPattern lookup table once on the high 16 bits and again on the low 16 bits. The first CASE expression in the Seq subquery performs a three-way maximum calculation. It returns the largest of:

- the low word zero-bit run (l.Seq),
- the high word zero-bit run (h.Seq), or
- the count of high word trailing zero bits plus the count of low word leading zero bits (h.Trailing + l.Leading)
The other two CASE expressions return the total number of trailing and leading zero bits in the 32-bit number. The CASE logic is needed to handle the cases where the high or low word is all zero bits.
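The three-way maximum at the heart of the query can be mirrored in a few lines of Python (a sketch, using my own stand-in for the BitPattern lookup):

```python
def word_info(w):
    # Stand-in for a BitPattern lookup: trailing zeros, leading
    # zeros, and longest zero run of a 16-bit word.
    s = format(w, '016b')
    return (len(s) - len(s.rstrip('0')),
            len(s) - len(s.lstrip('0')),
            max(len(run) for run in s.split('1')))

def longest_zero_run_32(n):
    l_trail, l_lead, l_seq = word_info(n & 65535)   # low word "join"
    h_trail, h_lead, h_seq = word_info(n >> 16)     # high word "join"
    # Three-way maximum: low word run, high word run, or the run
    # spanning the word boundary (high trailing + low leading zeros).
    return max(l_seq, h_seq, h_trail + l_lead)
```

The boundary-spanning term is what stitches the two 16-bit lookups back into a correct 32-bit answer; for n = 1, for example, the all-zero high word contributes its 16 trailing zeros to the low word's 15 leading zeros for a run of 31.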
This particular solution took an average of about 6 seconds to calculate the longest zero-bit run for 1,000,000 numbers.
To my earlier point, I created a C# solution (which itself could have been further optimized) that performed the exact same calculation for 1,000,000 random 32-bit integers in 2 seconds flat. So I guess there are two main points here: (1) make sure you choose the right tool/language for the job, and (2) whatever tool/language you choose, try to play to its strengths.