Last week, I wrote about logging for index rebuild operations. I wanted to publish the result of that testing as soon as I could, because that dealt with a specific question I was trying to answer. However, I actually started out my testing by looking at the logging that was done for a different operation, and ended up generating some new questions for myself.
Before I started testing the index rebuilds, I thought I would just get warmed up by observing the logging for SELECT INTO. I thought I knew what got logged, but I was wrong. My understanding was that as a minimally logged operation, in FULL recovery, SELECT INTO would log each 8K page as it was written, not each row like a normal INSERT would. In BULK_LOGGED, SELECT INTO would create a log record for each page, but it would not log the entire 8K page. And of course, it would also not log every row. However, my first test turned up some surprising results.
Here's what I did: First, I made sure the AdventureWorks database was in FULL recovery, and dropped the Details table if it existed. Then I created a table called SELECTINTOLoggingData to hold the results of calling fn_dblog. As in last week's script, the following information was captured:
- Max log record size
- Sum of log record sizes
- Number of log records
- Number of log records larger than 8K
Finally, I backed up the AdventureWorks database to give me a foundation for log backups. So here's the first part of the script:
ALTER DATABASE AdventureWorks SET RECOVERY FULL
GO
USE AdventureWorks;
GO
IF EXISTS (SELECT 1 FROM sys.tables
WHERE name = 'Details')
DROP TABLE Details;
IF EXISTS (SELECT 1 FROM sys.tables WHERE name = 'SELECTINTOLoggingData')
DROP TABLE SELECTINTOLoggingData;
GO
CREATE TABLE SELECTINTOLoggingData
(DBRecModel CHAR(12), MaxLogRowSize INT, NumLogRows INT, NumLargeRows INT);
GO
BACKUP DATABASE AdventureWorks TO DISK = 'c:\backups\AW.bak' WITH INIT;
GO
The actual testing was be done first in FULL recovery, then in BULK_LOGGED. I backed up up the transaction log, then created the Details table by executing a SELECT INTO from the Sales.SalesOrderDetail table. The Details table has 1495 pages and 121,317 rows. Finally, I inserted the relevant values from fn_dblog into the SELECTINTOLoggingData table.
---- FULL Recovery
BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
SELECT COUNT(*) FROM fn_dblog(null, null);
GO
SELECT * INTO Details
FROM Sales.SalesOrderDetail;
GO
INSERT INTO SELECTINTOLoggingData
SELECT 'FULL',
MAX([Log Record Length]),
COUNT(*),
(SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length] >8000)
FROM fn_dblog(null, null);
GO
DROP TABLE Details;
GO
-- BULK_LOGGED Recovery
ALTER DATABASE AdventureWorks SET RECOVERY BULK_LOGGED;
BACKUP LOG AdventureWorks TO DISK = 'c:\backups\AW_log.bak' WITH INIT;
SELECT COUNT(*) FROM fn_dblog(null, null);
GO
SELECT * INTO Details
FROM Sales.SalesOrderDetail;
GO
INSERT INTO SELECTINTOLoggingData
SELECT 'BULK_LOGGED',
MAX([Log Record Length]),
COUNT(*),
(SELECT COUNT(*) FROM fn_dblog(null, null) WHERE [Log Record Length] >8000)
FROM fn_dblog(null, null);
GO
Finally, I looked at the data I collected:
SELECT * FROM SELECTINTOLoggingData;
GO
Here are my results:

Although the number of large (8K) log rows was as expected, I was quite surprised to see the total number of log records! (And yes, I know the column names use the term "Log Rows", but I'm calling them "Log Records". I guess I just wanted a shorter column name.)
I then ran the same test on a different version of the Sales.SalesOrderDetail table that I use in some of my class examples. Here is the script to create the Sales.SalesOrderDetail2 table:
CREATE TABLE Sales.SalesOrderDetail2 (
[SalesOrderID] [int] NOT NULL,
[SalesOrderDetailID] [int] NOT NULL,
[CarrierTrackingNumber] [nvarchar](25) NULL,
[OrderQty] [smallint] NOT NULL,
[ProductID] [int] NOT NULL,
[SpecialOfferID] [int] NOT NULL,
[UnitPrice] [money] NOT NULL,
[UnitPriceDiscount] [money] NOT NULL ,
CONSTRAINT [PK_SalesOrderDetail2_SalesOrderID_SalesOrderDetailID] PRIMARY KEY CLUSTERED
(
[SalesOrderID] ASC,
[SalesOrderDetailID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY];
GO
INSERT INTO Sales.SalesOrderDetail2 SELECT [SalesOrderID]
,[SalesOrderDetailID]
,[CarrierTrackingNumber]
,[OrderQty]
,[ProductID]
,[SpecialOfferID]
,[UnitPrice]
,[UnitPriceDiscount]
FROM AdventureWorks.Sales.SalesOrderDetail;
GO
After setting the AdventureWorks database back to FULL recovery, deleting the Details table and truncating the SELECTINTOLoggingData table, I run the same test as above (starting at the "Full Recovery" comment, substituting Sales.SalesOrderDetail2 for Sales.SalesOrderDetail.
ALTER DATABASE AdventureWorks SET RECOVERY FULL;
GO
DROP TABLE Details;
GO
TRUNCATE TABLE SELECTINTOLoggingData;
GO
Here are my new results:

So these are much more like the results I expected from a SELECT INTO operation!
Why did the first example, selecting from Sales.SalesOrderDetail, do so much more logging?
It turns out that if you have a IDENTITY column in the source table, SELECT INTO must then log every row as the IDENTITY value is generated.
Does this mean you should rethink the beneift of using IDENTITY columns? Only if you're doing a lot of SELECT INTO operations to copy a table with IDENTITY columns. Normal INSERT statements, which is how IDENTITY values are usually generated, are going to have the same amount of logging whether or not the table has an IDENTITY column. (Let me know if you think otherwise.)
So now you know.
~Kalen
p.s. I would like to thank Sunil Agarwal for his brilliant insights and encouragement. Thanks for everything, Sunil!