
Kalen Delaney

Geek City: What gets logged for SELECT INTO operations?

Last week, I wrote about logging for index rebuild operations. I wanted to publish the results of that testing as soon as I could, because it addressed a specific question I was trying to answer. However, I had actually started my testing by looking at the logging done for a different operation, and I ended up generating some new questions for myself.

Before I started testing the index rebuilds, I thought I would just get warmed up by observing the logging for SELECT INTO. I thought I knew what got logged, but I was wrong. My understanding was that, as a minimally logged operation in FULL recovery, SELECT INTO would log each 8K page as it was written, not each row the way a normal INSERT would. In BULK_LOGGED, SELECT INTO would create a log record for each page, but it would not log the entire 8K page, and of course it would also not log every row. However, my first test turned up some surprising results.

Here's what I did: First, I made sure the AdventureWorks database was in FULL recovery, and dropped the Details table if it existed. Then I created a table called SELECTINTOLoggingData to hold the results of calling fn_dblog. As in last week's script, the following information was captured:

  • Max log record size
  • Sum of log record sizes
  • Number of log records
  • Number of log records larger than 8K

Finally, I backed up the AdventureWorks database to give me a foundation for log backups. So here's the first part of the script:

ALTER DATABASE AdventureWorks SET RECOVERY FULL
GO
USE AdventureWorks;
GO
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'Details')
    DROP TABLE Details;

IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'SELECTINTOLoggingData')
    DROP TABLE SELECTINTOLoggingData;
GO

CREATE TABLE SELECTINTOLoggingData
(DBRecModel CHAR(12), MaxLogRowSize INT, NumLogRows INT, NumLargeRows INT);
GO

BACKUP DATABASE AdventureWorks TO DISK = 'c:\backups\AW.bak' WITH INIT;
GO

The actual testing was done first in FULL recovery, then in BULK_LOGGED. I backed up the transaction log, then created the Details table by executing a SELECT INTO from the Sales.SalesOrderDetail table. The Details table has 1495 pages and 121,317 rows. Finally, I inserted the relevant values from fn_dblog into the SELECTINTOLoggingData table.

-- FULL Recovery


BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
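-- Sanity check: right after the log backup, fn_dblog should return only a small number of active log records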
SELECT COUNT(*) FROM fn_dblog(null, null);
GO
SELECT * INTO Details
FROM Sales.SalesOrderDetail;
GO


INSERT INTO SELECTINTOLoggingData
SELECT 'FULL',
        MAX([Log Record Length]),
        COUNT(*),
        (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  >8000)
        FROM fn_dblog(null, null);
GO

DROP TABLE Details;
GO


-- BULK_LOGGED Recovery


ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
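-- Again, verify the log backup left only a few active log records before the next test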
SELECT COUNT(*)  FROM fn_dblog(null, null);
GO

SELECT * INTO Details
FROM Sales.SalesOrderDetail;
GO


INSERT INTO SELECTINTOLoggingData
SELECT 'BULK_LOGGED',
        MAX([Log Record Length]),
        COUNT(*),
        (SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length]  >8000)
        FROM fn_dblog(null, null);
GO

Finally, I looked at the data I collected:

SELECT * FROM SELECTINTOLoggingData;
GO

Here are my results:

[Results: one row per recovery model from SELECTINTOLoggingData, showing max log record size, number of log records, and number of records larger than 8K]

Although the number of large (>8K) log records was as expected, I was quite surprised to see the total number of log records! (And yes, I know the column names use the term "Log Rows", but I'm calling them "Log Records". I guess I just wanted a shorter column name.)
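If you want to dig into where all those records come from, you can break down the fn_dblog output by operation right after the SELECT INTO. This is just a sketch of what I'd look at; if per-row logging is happening, you'd expect one of the insert-related operations to dominate the counts:

SELECT Operation, Context, COUNT(*) AS NumRecords,
       SUM([Log Record Length]) AS TotalLength
FROM fn_dblog(null, null)
GROUP BY Operation, Context
ORDER BY NumRecords DESC;
GO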

I then ran the same test on a different version of the Sales.SalesOrderDetail table that I use in some of my class examples. Here is the script to create the Sales.SalesOrderDetail2 table:

CREATE TABLE Sales.SalesOrderDetail2 (
    [SalesOrderID] [int] NOT NULL,
    [SalesOrderDetailID] [int] NOT NULL,
    [CarrierTrackingNumber] [nvarchar](25) NULL,
    [OrderQty] [smallint] NOT NULL,
    [ProductID] [int] NOT NULL,
    [SpecialOfferID] [int] NOT NULL,
    [UnitPrice] [money] NOT NULL,
    [UnitPriceDiscount] [money] NOT NULL ,
CONSTRAINT [PK_SalesOrderDetail2_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED
(
    [SalesOrderID] ASC,
    [SalesOrderDetailID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY];
GO

INSERT INTO Sales.SalesOrderDetail2 SELECT [SalesOrderID]
      ,[SalesOrderDetailID]
      ,[CarrierTrackingNumber]
      ,[OrderQty]
      ,[ProductID]
      ,[SpecialOfferID]
      ,[UnitPrice]
      ,[UnitPriceDiscount]
  FROM AdventureWorks.Sales.SalesOrderDetail;
GO

After setting the AdventureWorks database back to FULL recovery, dropping the Details table, and truncating the SELECTINTOLoggingData table, I ran the same test as above (starting at the "FULL Recovery" comment), substituting Sales.SalesOrderDetail2 for Sales.SalesOrderDetail.

ALTER DATABASE AdventureWorks SET RECOVERY FULL;
GO
DROP TABLE Details;
GO
TRUNCATE TABLE SELECTINTOLoggingData;
GO

Here are my new results:

[Results: the same query against SELECTINTOLoggingData for the Sales.SalesOrderDetail2 test]

So these are much more like the results I expected from a SELECT INTO operation!

Why did the first example, selecting from Sales.SalesOrderDetail, do so much more logging?

It turns out that if you have an IDENTITY column in the source table, SELECT INTO must then log every row as the IDENTITY value is generated.
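You can also see that the IDENTITY property itself travels with the SELECT INTO. A quick check against sys.columns (a sketch, run while the Details copy of Sales.SalesOrderDetail still exists) shows the inherited identity column:

SELECT name, is_identity
FROM sys.columns
WHERE object_id = OBJECT_ID('Details');
GO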

Does this mean you should rethink the benefit of using IDENTITY columns? Only if you're doing a lot of SELECT INTO operations to copy a table with IDENTITY columns. Normal INSERT statements, which are how IDENTITY values are usually generated, are going to have the same amount of logging whether or not the table has an IDENTITY column. (Let me know if you think otherwise.)

So now you know.

~Kalen

p.s. I would like to thank Sunil Agarwal for his brilliant insights and encouragement. Thanks for everything, Sunil!

Published Tuesday, March 15, 2011 1:41 PM by Kalen Delaney


Comments

 

John Paul Cook said:

Thanks for posting that, Kalen!

March 15, 2011 5:46 PM
 

Brad Schulz said:

Interesting... Good to know... Thanks!

I suppose if we had no need for the IDENTITY column in the INTO table (and just wanted it to be a "normal" INT column), we could just do a SELECT ISNULL(SalesOrderDetailID,0) AS SalesOrderDetailID,... INTO Details.  That would both preserve the non-nullability of the column and kill the IDENTITY attributes.
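For example, a minimal sketch of the rewrite Brad describes, using the column list from the SalesOrderDetail2 definition above:

SELECT ISNULL(SalesOrderDetailID, 0) AS SalesOrderDetailID,  -- stays NOT NULL, loses the IDENTITY property
       SalesOrderID,
       CarrierTrackingNumber,
       OrderQty,
       ProductID,
       SpecialOfferID,
       UnitPrice,
       UnitPriceDiscount
INTO Details
FROM Sales.SalesOrderDetail;
GO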

--Brad

March 15, 2011 7:32 PM
 

Louis Davidson said:

Awesome information! Really interesting.

March 16, 2011 4:07 PM
 

GrumpyOldDBA said:

very interesting, thank you

March 17, 2011 9:58 AM
 

PZ said:

My understanding is that in full recovery mode these operations are fully logged (http://msdn.microsoft.com/en-us/library/ms191244.aspx).

March 17, 2011 2:18 PM
 

Kalen Delaney said:

My script shows what happens in both FULL and BULK_LOGGED mode. "Fully logged" means different things for different operations. Please re-read the post. For SELECT INTO with an identity column, fully logged means logging every single row. For SELECT INTO without an identity column, fully logged means logging the full pages as they are filled.

Thanks

Kalen

March 17, 2011 2:29 PM
 

AK said:

Hi Kalen.

"Fully logged means logging the full pages as they are filled"? How does that work?

What exactly does it mean?

September 19, 2011 3:16 PM
 

Kalen Delaney said:

Fully logged means SQL Server logs enough information to fully roll forward the operation when needed during restore or recovery. As each page is filled, a log record is generated that contains the contents of the page.
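A sketch of how you might spot those page-image records in the log (assuming LOP_FORMAT_PAGE is the operation carrying the full page contents, which is what I'd expect for this case):

SELECT Operation, COUNT(*) AS NumRecords,
       MAX([Log Record Length]) AS MaxLength
FROM fn_dblog(null, null)
WHERE Operation = 'LOP_FORMAT_PAGE'
GROUP BY Operation;
GO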

September 19, 2011 10:35 PM
