SQL Server 2014 introduces updateable nonclustered and clustered columnstore indexes. When new data is inserted, it goes first to the deltastore, where a background process divides it into row groups, then indexes and compresses them. Indexing progress can be observed via the sys.column_store_row_groups DMV. According to BOL, row groups in the deltastore can be in one of 3 possible states: OPEN (a read/write row group that is accepting new records; an open row group is still in rowstore format and has not been compressed to columnstore format), CLOSED (a row group that has been filled, but not yet compressed by the tuple mover process) and COMPRESSED (a row group that has been filled and compressed). COMPRESSED actually means that the data is already part of the index. Row groups that are still in the OPEN or CLOSED state are yet to be added to the index and currently reside somewhere in the deltastore (an internal object inside the database). You won’t find it in sys.objects or sys.partitions. But it is there – you can observe it using the “half-documented” sys.system_internals_partitions view.
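Since the deltastore is invisible to the documented catalog views, one way to confirm it exists is to diff the internal view against sys.partitions. A minimal sketch – sys.system_internals_partitions is undocumented and unsupported, so its exact shape may vary between builds:

```sql
-- Internal rowsets that the documented sys.partitions does not expose --
-- the deltastore shows up here.
SELECT i.object_id, i.index_id, i.partition_number, i.rows
FROM sys.system_internals_partitions AS i
LEFT JOIN sys.partitions AS p ON p.partition_id = i.partition_id
WHERE p.partition_id IS NULL;
```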
Anyway, the performance difference between a fully updated columnstore index and a situation where part of your data is still in the OPEN or CLOSED state in the deltastore is huge. So we would like the background process that indexes and compresses data to be as fast as possible. Unfortunately, it is painfully slow. I’ve loaded 45 billion rows into a clustered columnstore index. When loading finished, I had only 13 billion rows in the index – less than 30%. Twelve hours have passed since then, and I have been observing indexing progress the whole time using the following query:
SELECT state_description,
       SUM(total_rows) as TotalRows,
       SUM(CONVERT(BIGINT, size_in_bytes))/(1024*1024) as SizeMB
FROM sys.column_store_row_groups
GROUP BY state_description;
The results are very disappointing – less than 1 billion rows in 12 hours, meaning that indexing takes much longer than the loading itself. Actually, more than 10 times longer. And querying data that is still in the deltastore is even more painful. Here is the estimated plan – I couldn’t wait for an actual one and stopped the query after 15 minutes (against a fully updated columnstore index, the same query runs in under 2 seconds):
Look at the estimated IO and CPU costs – they’re huge. I guess this means scanning the entire deltastore. Which is reasonable – there is no traditional B-tree index on this data (certainly not while it is in the deltastore), and the columnstore index isn’t ready yet, so a scan is inevitable. Even worse, I couldn’t find any query that would run fast. That makes sense too – even if the required data resides fully inside the columnstore index, SQL Server can’t be sure about that. What if another part of it is still in the deltastore? It is unsorted and un-indexed, remember? So for any query, SQL Server has to check the deltastore – in case something there is not yet in the COMPRESSED state. In my case that means scanning 45B – 14B = 31B rows.
What does this mean? It means that updateable clustered/nonclustered columnstore indexes should be used with extreme caution. If your data is constantly updated (loaded), or if you have a huge single ETL but not enough time for the background process to build the entire index before users start to query it – the result can be extremely painful. Don’t test your system only when the data is ready and queries run blazingly fast; test it when the deltastore contains some data. I hope Microsoft will improve the performance of the process that indexes and compresses data – in my environment it seems extremely lazy: nothing else runs, there are no user sessions at all, and still the CPU is at 1-2% and indexing progress according to sys.column_store_row_groups is very, very slow.
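If you do have a maintenance window, you don’t necessarily have to wait for the background tuple mover at all. A sketch with hypothetical object names:

```sql
-- Hypothetical names. REORGANIZE asks the tuple mover to compress
-- CLOSED row groups immediately:
ALTER INDEX CCI_Tbl ON dbo.Tbl REORGANIZE;

-- REBUILD compresses everything, including OPEN row groups, at the cost
-- of an offline rebuild of the entire index:
ALTER INDEX CCI_Tbl ON dbo.Tbl REBUILD;
```

Neither helps, of course, when data keeps arriving and users query it continuously.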
P.S. As I’ve blogged earlier, sys.partitions sees only rows that are in the COMPRESSED state. So sp_spaceused, for example, currently indicates that my table contains only 14B rows.
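You can see the discrepancy for yourself by comparing the two sources (table name hypothetical):

```sql
-- sp_spaceused reads sys.partitions, so it counts only COMPRESSED row groups:
EXEC sp_spaceused N'dbo.Tbl';

-- sys.column_store_row_groups counts deltastore rows as well:
SELECT SUM(total_rows) AS TotalRows
FROM sys.column_store_row_groups;
```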
Update: by request from Kevin Boles in the comments below, I’m providing environment metrics.
It wasn’t intended to be a stress test; it just turned out that way. The original intention was to compare my client’s application performance in the current version, where we store some data in long compressed (proprietary + zip) blobs, vs. the same data in a clustered columnstore index. Updateability is a must, so a columnstore index in SQL 2012 wasn’t an option. Data loading was executed by an application that parsed the blobs and called a stored procedure that receives a TVP as an input parameter. Every call contained ~5K rows. 3 instances of this “migration tool” were running concurrently. The important part here is that it wasn’t a bulk load. Data loading took ~3 days, mainly because of a CPU bottleneck in the migration application – the database could go faster. As I’ve mentioned above, by the end of the loading, when ~45B rows had been inserted into the table, SQL Server had indexed and compressed 13B rows.
The environment is a virtual machine running Windows Server 2012, Intel Xeon CPU E5-2670 (2.6GHz) with 4 virtual cores; 6GB of RAM; IO – I don’t know what hardware is behind the disk that I see, but in any case there is no IO bottleneck.
SQL Server used ~5GB of memory. The MAXDOP setting remained at its default (0). Delayed Transaction Durability was set to “Forced”.
I took IO and wait-type snapshots several times during the day. IO latencies were ~12ms/read and ~0.5ms/write. CPU was between 1% and 5%. During the 5 minutes between snapshots, SQL Server wrote ~144MB and read ~240MB. Not much – it looks like the server is almost idle. Tempdb was almost idle as well – ~1% of total IO activity.
There were only 2 noticeable wait types – between consecutive 5-minute snapshots they both had 299.98 seconds of wait time (i.e., essentially the entire interval): DIRTY_PAGE_POLL and HADR_FILESTREAM_IOMGR_IOCOMPLETION.
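For reference, the snapshots were taken along these lines – a minimal sketch, not my exact script:

```sql
-- Snapshot the cumulative wait counters, pause 5 minutes, then diff.
SELECT wait_type, wait_time_ms
INTO #waits_before
FROM sys.dm_os_wait_stats;

WAITFOR DELAY '00:05:00';

SELECT w.wait_type,
       (w.wait_time_ms - b.wait_time_ms) / 1000.0 AS wait_seconds
FROM sys.dm_os_wait_stats AS w
JOIN #waits_before AS b ON b.wait_type = w.wait_type
WHERE w.wait_time_ms > b.wait_time_ms
ORDER BY wait_seconds DESC;
```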
The average number of rows per row group is ~1.05M. The average row group size is ~9.3MB. All the data resides in a single partition – as it is a single partition in the current system (although with the long blobs it holds only 10M rows instead of 45B).
Regarding dictionary usage, sys.column_store_dictionaries contains ~141K rows, with entry_count values ranging from 1 to 27.5K.
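The numbers above come from a simple aggregation over the dictionaries view – something along these lines:

```sql
-- Dictionary count and entry_count range for the columnstore index.
SELECT COUNT(*) AS Dictionaries,
       MIN(entry_count) AS MinEntries,
       MAX(entry_count) AS MaxEntries
FROM sys.column_store_dictionaries;
```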
Regarding the query, here it is (with table and column names obfuscated):

SELECT AVG(Col5)
FROM dbo.Tbl
WHERE
    Col1 = @Int1 AND
    Col2 = @Int2;
I recorded some statistical data during the migration, so I know for sure that this WHERE clause qualifies 27.5M rows. The same query on a fully updated columnstore index (the same number of rows to query, but the index itself much smaller) on the same machine took under 2 seconds with a cold cache.