In my previous blog post, I wrote about how to calculate median value and weighted median value in a secure and fast approach.
In this blog post I am going to describe how you can calculate a fast moving average and also calculate a fast weighted moving average.
This is the sample data we should work with during the whole exercise. I also display both the normal moving average and the weighted moving average at the same time.
DECLARE @Sample TABLE
VALUES (DATEADD(DAY, DATEDIFF(DAY, 10, GETDATE()), 0), 2),
(DATEADD(DAY, DATEDIFF(DAY, 9, GETDATE()), 0), 2),
(DATEADD(DAY, DATEDIFF(DAY, 8, GETDATE()), 0), 4),
(DATEADD(DAY, DATEDIFF(DAY, 7, GETDATE()), 0), 4),
(DATEADD(DAY, DATEDIFF(DAY, 6, GETDATE()), 0), 5),
(DATEADD(DAY, DATEDIFF(DAY, 5, GETDATE()), 0), 6),
(DATEADD(DAY, DATEDIFF(DAY, 4, GETDATE()), 0), 6),
(DATEADD(DAY, DATEDIFF(DAY, 3, GETDATE()), 0), 8),
(DATEADD(DAY, DATEDIFF(DAY, 2, GETDATE()), 0), 9),
(DATEADD(DAY, DATEDIFF(DAY, 1, GETDATE()), 0), 10),
(DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0), 11),
(DATEADD(DAY, DATEDIFF(DAY, -1, GETDATE()), 0), 9)
DECLARE @Result TABLE
First of all, I am going to show you an approach made by a cursor. This is actually one of the fastest way to accomplish this task!
It is also very resource friendly and uses (n) combinations to get the results.
-- Declare some variables needed by the CURSOR
DECLARE @Date SMALLDATETIME,
DECLARE curUgly CURSOR FOR SELECT dt,
ORDER BY dt
WHILE @@FETCH_STATUS = 0
SELECT @RateMinusTwo = @RateMinusOne,
@RateMinusOne = @RateCurrent,
@RateCurrent = @Rate
IF @RateMinusTwo IS NOT NULL AND @RateMinusOne IS NOT NULL AND @RateCurrent IS NOT NULL
(@RateCurrent + @RateMinusOne + @RateMinusTwo) / 3,
0.7 * @RateCurrent + 0.2 * @RateMinusOne + 0.1 * @RateMinusTwo
SELECT dt AS [Date],
ma AS NormalMovingAverage,
wma AS WeightedMovingAverage
ORDER BY dt
But, as you can see it is not very configurable. What if you suddenly wants a moving average over 5 days? Well, you simply have to rewrite several parts such as declare, initial select statement, if clause and the insert part. Essentially the whole code.
Now, what if we want to write a SET-based query to do the same thing? The most frequent used query I have seen before, is a self-join query (Cartesian product) like this.
It is very slow since it tries all combinatations (n * n) before filtering out the rows to be used. However it is more configurable friendly.
-- A common SET-based solution
SELECT t1.dt AS [Date],
AVG(t2.Rate) AS NormalMovingAverage,
SUM(CASE DATEDIFF(DAY, t2.dt, t1.dt)
WHEN 0 THEN 0.7 * t2.Rate
WHEN 1 THEN 0.2 * t2.Rate
WHEN 2 THEN 0.1 * t2.Rate
) AS WeightedMovingAverage
FROM @Sample AS t1
INNER JOIN @Sample AS t2 ON DATEDIFF(DAY, t2.dt, t1.dt) BETWEEN 0 AND 2
GROUP BY t1.dt
HAVING COUNT(t1.dt) = 3
ORDER BY t1.dt
Even if the code looks SET-based, it is not in reality. I know Jeff Moden would call this RBAR (Row-By-Agonizing-Row). However, the code is more maintainable and altering from a 3 day to a 5 day moving average will make you have to change the code in 3 places only; the case statement, between value and having clause. Much better, but also absolutely worse performance compared to a cursor.
Surely there must be a way to combine these two approaches? Yes, there is, and the combinations needed to get the result is only (3 * n) which is much less than the other set-based code but also somewhat more than the cursor approach.
See this code.
-- A better SET-based approach
SELECT DATEADD(DAY, d.Number, s.dt) AS [Date],
AVG(s.Rate) AS NormalMovingAverage,
SUM(d.Coefficient * s.Rate) AS WeightedMovingAverage
FROM @Sample AS s
CROSS JOIN (
VALUES (0, 0.7),
) AS d(Number, Coefficient)
GROUP BY DATEADD(DAY, d.Number, s.dt)
HAVING COUNT(*) = 3
ORDER BY DATEADD(DAY, d.Number, s.dt)
Changing this piece of code to a 5 day moving average only takes editing in 2 places; the derived table and the having clause. If you have the coefficients in an auxiliary table, there will be only one edit to be made! In the auxiliary table, because the having part can equal a subquery calculating the number of records in the auxiliary table.
For comparison, I tested all three methods on 5,000 sample rows and got this result
CURSOR 6,813 ms
Common set-based 20,577 ms
Better set-based 127 ms