THE SQL Server Blog Spot on the Web

Welcome to SQLblog.com - The SQL Server blog spot on the web Sign in | |
in Search

Peter Larsson

Thinking outside the box

Moving average and Weighted Moving Average

In my previous blog post, I wrote about how to calculate median value and weighted median value in a secure and fast approach.

In this blog post I am going to describe how you can calculate a fast moving average and also calculate a fast weighted moving average.
This is the sample data we should work with during the whole exercise. I also display both the normal moving average and the weighted moving average at the same time.

DECLARE @Sample TABLE
        (
            dt SMALLDATETIME,
            Rate SMALLMONEY
        )

INSERT  @Sample
VALUES  (DATEADD(DAY, DATEDIFF(DAY, 10, GETDATE()), 0),  2),
        (DATEADD(DAY, DATEDIFF(DAY,  9, GETDATE()), 0),  2),
        (DATEADD(DAY, DATEDIFF(DAY,  8, GETDATE()), 0),  4),
        (DATEADD(DAY, DATEDIFF(DAY,  7, GETDATE()), 0),  4),
        (DATEADD(DAY, DATEDIFF(DAY,  6, GETDATE()), 0),  5),
        (DATEADD(DAY, DATEDIFF(DAY,  5, GETDATE()), 0),  6),
        (DATEADD(DAY, DATEDIFF(DAY,  4, GETDATE()), 0),  6),
        (DATEADD(DAY, DATEDIFF(DAY,  3, GETDATE()), 0),  8),
        (DATEADD(DAY, DATEDIFF(DAY,  2, GETDATE()), 0),  9),
        (DATEADD(DAY, DATEDIFF(DAY,  1, GETDATE()), 0), 10),
        (DATEADD(DAY, DATEDIFF(DAY,  0, GETDATE()), 0), 11),
        (DATEADD(DAY, DATEDIFF(DAY, -1, GETDATE()), 0),  9)

DECLARE @Result TABLE
        (
            dt SMALLDATETIME,
            ma SMALLMONEY,
            wma SMALLMONEY
        )


First of all, I am going to show you an approach made by a cursor. This is actually one of the fastest way to accomplish this task!
It is also very resource friendly and uses (n) combinations to get the results.

-- Declare some variables needed by the CURSOR
DECLARE @Date SMALLDATETIME,
        @Rate SMALLMONEY,
        @RateCurrent SMALLMONEY,
        @RateMinusOne SMALLMONEY,
        @RateMinusTwo SMALLMONEY

DECLARE curUgly CURSOR FOR  SELECT      dt,
                                        Rate
                            FROM        @Sample
                            ORDER BY    dt

OPEN    curUgly

FETCH   NEXT
FROM    curUgly
INTO    @Date,
        @Rate

WHILE @@FETCH_STATUS = 0
    BEGIN
        SELECT  @RateMinusTwo = @RateMinusOne,
                @RateMinusOne = @RateCurrent,
                @RateCurrent = @Rate

        IF @RateMinusTwo IS NOT NULL AND @RateMinusOne IS NOT NULL AND @RateCurrent IS NOT NULL
            INSERT  @Result
                    (
                        dt,
                        ma,
                        wma
                    )
            VALUES  (
                        @Date,
                        (@RateCurrent + @RateMinusOne + @RateMinusTwo) / 3,
                        0.7 * @RateCurrent + 0.2 * @RateMinusOne + 0.1 * @RateMinusTwo
                    )

        FETCH   NEXT
        FROM    curUgly
        INTO    @Date,
                @Rate
    END

CLOSE       curUgly
DEALLOCATE  curUgly

SELECT      dt AS [Date],
            ma AS NormalMovingAverage,
            wma AS WeightedMovingAverage
FROM        @Result
ORDER BY    dt


But, as you can see it is not very configurable. What if you suddenly wants a moving average over 5 days? Well, you simply have to rewrite several parts such as declare, initial select statement, if clause and the insert part. Essentially the whole code.

Now, what if we want to write a SET-based query to do the same thing? The most frequent used query I have seen before, is a self-join query (Cartesian product) like this.
It is very slow since it tries all combinatations (n * n) before filtering out the rows to be used. However it is more configurable friendly.

-- A common SET-based solution
SELECT      t1.dt AS [Date],
            AVG(t2.Rate) AS NormalMovingAverage,
            SUM(CASE DATEDIFF(DAY, t2.dt, t1.dt)
                    WHEN 0 THEN 0.7 * t2.Rate
                    WHEN 1 THEN 0.2 * t2.Rate
                    WHEN 2 THEN 0.1 * t2.Rate
                END
            ) AS WeightedMovingAverage
FROM        @Sample AS t1
INNER JOIN  @Sample AS t2 ON DATEDIFF(DAY, t2.dt, t1.dt) BETWEEN 0 AND 2
GROUP BY    t1.dt
HAVING      COUNT(t1.dt) = 3
ORDER BY    t1.dt

Even if the code looks SET-based, it is not in reality. I know Jeff Moden would call this RBAR (Row-By-Agonizing-Row). However, the code is more maintainable and altering from a 3 day to a 5 day moving average will make you have to change the code in 3 places only; the case statement, between value and having clause. Much better, but also absolutely worse performance compared to a cursor.

Surely there must be a way to combine these two approaches? Yes, there is, and the combinations needed to get the result is only (3 * n) which is much less than the other set-based code but also somewhat more than the cursor approach.

See this code.

-- A better SET-based approach
SELECT      DATEADD(DAY, d.Number, s.dt) AS [Date],
            AVG(s.Rate) AS NormalMovingAverage,
            SUM(d.Coefficient * s.Rate) AS WeightedMovingAverage
FROM        @Sample AS s
CROSS JOIN  (
                VALUES  (0, 0.7),
                        (1, 0.2),
                        (2, 0.1)
            ) AS d(Number, Coefficient)
GROUP BY    DATEADD(DAY, d.Number, s.dt)
HAVING      COUNT(*) = 3
ORDER BY    DATEADD(DAY, d.Number, s.dt)


Changing this piece of code to a 5 day moving average only takes editing in 2 places; the derived table and the having clause. If you have the coefficients in an auxiliary table, there will be only one edit to be made! In the auxiliary table, because the having part can equal a subquery calculating the number of records in the auxiliary table.

For comparison, I tested all three methods on 5,000 sample rows and got this result

CURSOR             6,813 ms
Common set-based  20,577 ms
Better set-based     127 ms


//Peso

Published Friday, September 18, 2009 11:03 AM by Peso
Filed under: , ,

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Ralph Wilson said:

I tried setting up your @Sample table in a SQL Server 2005 Sandbox database that I use for testing techniques and then running your fast set-bsaed process.  Apparently, your code is notT-SQL because it threw an error.

Do you have a T-SQL version of this approach?

January 7, 2010 11:27 AM
 

Peso said:

The VALUES approach in this example is a SQL Server 2008 method.

Change that part and insert all valuees with either UNION ALL or multiple insert statements.

January 7, 2010 1:02 PM
 

Aylin said:

concuerdo con el amigpo que osso  todo mundoa al leeerlo se da cuenta

pero bueno espero y lo hagan pronto.................

May 17, 2010 4:33 PM

Leave a Comment

(required) 
(required) 
Submit
Powered by Community Server (Commercial Edition), by Telligent Systems
  Privacy Statement