In response to Adam's new series of T-SQL Tuesday, I wanted to write that there are faster ways to get the data in a set-based manner without resorting to SQLCLR.
http://sqlblog.com/blogs/adam_machanic/archive/2009/12/08/t-sql-tuesday-001-exploring-fuzzy-interval-islands-using-sqlclr.aspx
Many companies today still think of SQLCLR's as security risks, so I wanted to rewrite Adam's set-based code to a more efficient algorithm.
On my computer, Adam's code run in about 25 seconds. These two new algorithms below, runs in less than a second each. Which one to choose is how you interpret the definition Adam wrote. Also, this is not a blog post to make someone feel bad, it is a blog post to show there are other ways to write code, with better performance. You just have to think a little different. If someone is interested in how this algorithm works, please ask nice and I'll probably write a more in-depth blog post about the algorithm.
Complaining in the comments doesn't work.
Take 1:
DECLARE @Interval INT = 7
;WITH cteSource
AS (
SELECT ProductID,
TransactionDate,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID
FROM Production.TransactionHistory
), cteMatch
AS (
SELECT s1.ProductID,
s1.TransactionDate AS FromDate,
s2.TransactionDate AS ToDate
FROM cteSource AS s1
INNER JOIN cteSource AS s2 ON s2.ProductID = s1.ProductID
WHERE s1.recID = s2.recID - 1
), cteYak
AS (
SELECT ProductID,
FromDate,
ToDate,
y - ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS grp
FROM (
SELECT ProductID,
FromDate,
ToDate,
CASE
WHEN DATEDIFF(DAY, FromDate, ToDate) <= @Interval
THEN ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate)
ELSE NULL
END AS y
FROM cteMatch
) AS d
WHERE y IS NOT NULL
)
SELECT ProductID,
MIN(FromDate) AS FromDate,
MAX(ToDate) AS ToDate
FROM cteYak
WHERE DATEDIFF(DAY, FromDate, ToDate) <= @Interval
GROUP BY ProductID,
grp
ORDER BY ProductID,
MIN(FromDate)
This runs in less than a second.
Take2:
DECLARE @Interval INT = 7
;WITH cteSource
AS (
SELECT ProductID,
TransactionDate,
ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY TransactionDate) AS recID
FROM Production.TransactionHistory
), cteMatch
AS (
SELECT s1.ProductID,
s1.TransactionDate AS FromDate,
s2.TransactionDate AS ToDate
FROM cteSource AS s1
INNER JOIN cteSource AS s2 ON s2.ProductID = s1.ProductID
WHERE s1.recID = s2.recID - 1
), cteYak
AS (
SELECT ProductID,
FromDate,
ToDate,
CASE
WHEN DATEDIFF(DAY, FromDate, ToDate) <= @Interval THEN 1
ELSE -1
END * ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS recID
FROM cteMatch
), cteFinal
AS (
SELECT ProductID,
MIN(FromDate) AS StartDate,
DATEADD(DAY, @Interval, MAX(ToDate)) AS EndDate,
1 + MAX(recID) AS recID
FROM (
SELECT ProductID,
FromDate,
ToDate,
recID - ROW_NUMBER() OVER (PARTITION BY ProductID ORDER BY FromDate) AS grp,
recID
FROM cteYak
WHERE recID > 0
) AS d
GROUP BY ProductID,
grp
UNION ALL
SELECT ProductID,
FromDate AS StartDate,
DATEADD(DAY, @Interval, FromDate) AS EndDate,
ABS(recID) AS recID
FROM cteYak
WHERE recID < 0
)
SELECT ProductID,
MIN(StartDate) AS StartDate,
MAX(EndDate) AS EndDate
FROM cteFinal
GROUP BY ProductID,
recID
This also runs in less than a second. And produces the exact same result as Adam's code.
//Peter