Comparing Space Usage for Different Compression Types
Sometimes it is
not clear to everyone if and how the different types of indexes influence
compression, as well as the impact on the disk space used for the compressed
and indexed tables. In the following article, I will perform a small
investigation, so that the reader can get an impression on the impact on disk
space before and after compression, considering different index types. This
comparison will not consider the performance implications of the compression
and the corresponding advantages and disadvantages. It is just looking at the
space used. On the internet you can find many good articles about the
performance implications.
For this test I
am using an example Global Master data file with no normalization applied and
non-optimal selection of datatypes. The file contains 44 columns with
different data types and 279,690 rows. The smallest data type is a varchar(1)
and the biggest is a varchar(255). The table has just 1 partition.
This table has a
lot more potential for reducing the table size, but in this test, I will just
focus on a comparison on Heap tables, Clustered index tables and Clustered
Columnstore index tables with compression settings of:
·
Page level
·
Columnstore
·
Archive
·
No compression
It is not
possible to create a table with no compression and a Columnstore Index, because
Columnstore tables and indexes are always stored with columnstore
compression. This will give 6 different scenarios:
|
Type of Compression
|
Type of Index
|
|
No Compression
|
Heap
|
|
No Compression
|
Clustered
|
|
Columnstore
|
Clustered Columnstore
|
|
Page Level Compression
|
Heap
|
|
Archive
|
Clustered Columnstore
|
|
Page Level Compression
|
Columnstore
|
The Indexed key
column is the Material number. It is null but doesn’t have any NULLs included.
For the Clustered Index I use a FILLFACTOR of 100. To get the used space
of the table the stored procedure sp_spaceused will be used. The used output of
this SP contains:
|
Column name
|
Description
|
|
rows
|
Number of rows
existing in the table.
|
|
data
|
Total amount
of space used by data.
|
|
index_size
|
Total amount
of space used by indexes.
|
This is the
script to get the output:
--Script to create the table,
which give information of the used space the tables.
DROP TABLE #spaceused;
GO
CREATE TABLE #spaceused
(name NVARCHAR(128),
rows
CHAR(20),
reserved
VARCHAR(18),
data
VARCHAR(18),
index_size VARCHAR(18),
unused
VARCHAR(18)
);
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[NoComp_ClustInd]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[ClustColumnStInd]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[NoComp_Heap]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[PageLevel_ClustInd]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[Archive_ClustColumnStInd]';
INSERT INTO #spaceused
EXEC sp_spaceused
N'[core].[PageLevel_Heap]';
SELECT name,
rows,
data,
index_size,
FROM #spaceused;
This is the
table overview before the indexes are created and the compression is used. The
column Total Size is the sum of the columns data and Index_size. The columns
data, index_size and total Size are in using the unit of measure KB.
|
name
|
rows
|
data
|
index_size
|
Total Size
|
|
NoComp_ClustInd
|
279690
|
122024
|
16
|
122.040
|
|
ClustColumnStInd
|
279690
|
122024
|
16
|
122.040
|
|
NoComp_Heap
|
279690
|
122024
|
16
|
122.040
|
|
PageLevel_ClustInd
|
279690
|
122024
|
16
|
122.040
|
|
Archive_ClustColumnStInd
|
279690
|
122024
|
16
|
122.040
|
|
PageLevel_Heap
|
279690
|
122024
|
16
|
122.040
|
This is the
table after the indexes are created on the table. The Heap table doesn’t get a
new index for sure:
|
name
|
rows
|
data
|
index_size
|
Total Size
|
|
NoComp_ClustInd
|
279690
|
123168
|
480
|
123.648
|
|
ClustColumnStInd
|
279690
|
34016
|
0
|
34.016
|
|
NoComp_Heap
|
279690
|
122024
|
16
|
122.040
|
|
PageLevel_ClustInd
|
279690
|
123160
|
480
|
123.640
|
|
Archive_ClustColumnStInd
|
279690
|
34016
|
0
|
34.016
|
|
PageLevel_Heap
|
279690
|
122024
|
16
|
122.040
|
Now we create
Indexes on the analyzed tables.
CREATE CLUSTERED INDEX
IX_NoComp_ClustInd ON core.NoComp_ClustInd
(Material) WITH( FILLFACTOR = 100) ON
[PRIMARY];
CREATE CLUSTERED COLUMNSTORE
INDEX IX_ClustColumnStInd ON core.ClustColumnStInd
ON [PRIMARY];
CREATE CLUSTERED INDEX
IX_PageLevel_ClustInd ON core.PageLevel_ClustInd
(Material) WITH( FILLFACTOR = 100) ON
[PRIMARY];
CREATE CLUSTERED COLUMNSTORE
INDEX IX_Archive_ClustColumnStInd ON core.Archive_ClustColumnStInd
ON [PRIMARY];
Now let's use
Compression on the analysed tables.
ALTER TABLE
[core].PageLevel_ClustInd REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
ALTER TABLE
[core].Archive_ClustColumnStInd REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE)
ALTER TABLE
[core].PageLevel_Heap REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = PAGE);
This is how the
used space look after using the compression on the table:
|
name
|
rows
|
data
|
index_size
|
Total Size
|
|
NoComp_ClustInd
|
279690
|
123168
|
480
|
123.648
|
|
ClustColumnStInd
|
279690
|
34016
|
0
|
34.016
|
|
NoComp_Heap
|
279690
|
122024
|
16
|
122.040
|
|
PageLevel_ClustInd
|
279690
|
34184
|
192
|
34.376
|
|
Archive_ClustColumnStInd
|
279690
|
22792
|
0
|
22.792
|
|
PageLevel_Heap
|
279690
|
36824
|
16
|
36.840
|
As we can see
the overall winner is the Columnstore Index. It uses the least space, even
without archive compression. But if you use archive compression, the used space
is by far the lowest. This makes it the favorite, if your target is use the
lowest amount of used space. The second is the Columnstore Index without
compression. In third place we see the clustered index, which is using the Page
Level compression, but very close to it is the heap table with Page Level
Compression.
Another finding
is, that we see differences in the used space between the clustered index table
and the heap table. Even without considering the used space for the index we
see a difference. Before the compression the heap table is using less space
than the table with the clustered index. But after the compression it changes
and the table with the clustered index is using slightly less space.
From the results
of this example I can see three groups in terms of space usage:
1.
Columnstore
Indexes with Archive Compression (very low space usage)
2.
Page Level
compression with and without an clustered index and Columnstore index without
Archive Compression (low to middle space usage)
3.
Tables without
Compression and Indexes (high space usage)
The main finding
is that the kind of Index will have influence on the used disk space before and
after the compression. The interesting thing is that the table with the column
store index is using always the least disk space.
This is just a
test that is based on one specific table. It could give different results on
other tables. Maybe even an optimization (normalization, correct datatypes,…)
of the table could change the results. This comparison is just to get a
feeling for the handling of different compressions and clustered indexes.