vP

vSAN - Deduplication and Compression

Earlier in this vSAN series, we covered what vSAN is and how to configure a vSAN cluster. In this post, we will discuss two space-efficiency techniques available in vSAN: deduplication and compression.


I'm sure most readers are already familiar with the concept of deduplication, as it has been widely used in the storage industry for some time now. In simple words, deduplication checks whether a block of data is already present on storage. If it is, then instead of storing the same block twice, a small reference to the existing block is created. If the same block of data occurs many times, significant space can be saved.


The important thing to note is that deduplication and compression are available only on all-flash vSAN configurations. In an all-flash vSAN, data blocks are kept in the cache tier while they are active (hot) for optimal performance. As soon as the data is no longer active (cold), it is de-staged to the capacity tier.


vSAN uses the SHA-1 hashing algorithm to create a “fingerprint” for every data block. No hash function can strictly guarantee that two different blocks never produce the same hash, but with SHA-1 an accidental collision is so unlikely that, in practice, every distinct block of data gets its own unique fingerprint.


When new data is written to disk, it is hashed and then compared to the existing table of hashes. If the hash already exists, there is no need to store the new block; vSAN simply adds a new reference to the existing one. If it does not already exist, a new hash entry is created and the block is persisted. It is important to note that only redundant copies of a block within the same disk group are reduced to one. Redundant blocks across multiple disk groups are not deduplicated.
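The lookup described above can be sketched in a few lines of Python. This is only a toy model to illustrate the idea, not how vSAN is implemented internally: the `DiskGroupStore` class, its fields and the 4KB block size are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size for this sketch


class DiskGroupStore:
    """Hypothetical model of one disk group's deduplicated block store."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block data, stored only once
        self.refcount = {}  # fingerprint -> number of references to the block

    def write(self, block: bytes) -> str:
        fingerprint = hashlib.sha1(block).hexdigest()
        if fingerprint in self.blocks:
            # Block already present: only add a reference.
            self.refcount[fingerprint] += 1
        else:
            # New block: create the hash entry and persist the data.
            self.blocks[fingerprint] = block
            self.refcount[fingerprint] = 1
        return fingerprint


group_a = DiskGroupStore()
group_b = DiskGroupStore()

block = b"A" * BLOCK_SIZE
fp = group_a.write(block)
group_a.write(block)  # duplicate within the same disk group: reference only
group_b.write(block)  # same data in another disk group: stored again

print(len(group_a.blocks), group_a.refcount[fp])  # 1 2
print(len(group_b.blocks))                        # 1
```

Because each disk group keeps its own hash table in this model, the same block ends up stored once in `group_a` and once more in `group_b`, which mirrors the per-disk-group scope described above.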


Deduplication and compression occur as the data is de-staged to the capacity tier, well after the write acknowledgments have been sent back to the VM. Since a block is not deduplicated or compressed in the cache, but only when it is moved to the capacity tier, this approach is also called “near-line”. The advantage of this approach is that when an application is writing data, the same block may be overwritten multiple times in the cache tier. Once the block is cold (no longer used), it is moved to the capacity tier, and only at this point does it go through deduplication and compression. This is a significant saving on overhead, as cycles are not wasted deduplicating and compressing a block that is overwritten immediately afterwards, or multiple times afterwards.
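To make the saving concrete, here is a minimal sketch of a write cache that absorbs overwrites and hands only the latest version of each block to the de-staging step. The `WriteCache` class and the logical block addresses are hypothetical and exist only for this illustration.

```python
class WriteCache:
    """Hypothetical cache tier: keeps only the latest version of each block."""

    def __init__(self):
        self.hot = {}                  # logical block address -> latest data
        self.writes_acknowledged = 0   # every write is acknowledged from cache

    def write(self, lba: int, data: bytes) -> None:
        self.hot[lba] = data           # an overwrite replaces the cached copy
        self.writes_acknowledged += 1

    def destage(self) -> list:
        """Return the blocks that move to the capacity tier and empty the cache."""
        cold = list(self.hot.items())
        self.hot.clear()
        return cold


cache = WriteCache()
for version in range(5):
    cache.write(lba=7, data=f"version {version}".encode())
cache.write(lba=8, data=b"other block")

destaged = cache.destage()
print(cache.writes_acknowledged)  # 6 writes were acknowledged to the VM
print(len(destaged))              # 2 blocks need deduplication and compression
```

In this example six writes are acknowledged, but only two blocks would ever be hashed and compressed, because the four intermediate versions of block 7 never leave the cache.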


Another space-saving technique is compression. vSAN uses the LZ4 compression mechanism, which works on 4KB blocks. If a new block is found to be unique, it also goes through compression. If LZ4 manages to reduce the size of the block to 2KB or less, the compressed version of the block is persisted to the capacity tier. If compression cannot reduce the block to 2KB or less, the full-sized block is persisted. Compression is applied after deduplication, just before the data is written to the capacity tier.
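The 2KB rule can be illustrated with a short Python sketch. Note that this example uses `zlib` from the Python standard library as a stand-in, because LZ4 is not part of the standard library; the function name `block_to_persist` is also just an assumption for the example.

```python
import os
import zlib
from typing import Tuple

BLOCK_SIZE = 4096   # 4KB block
THRESHOLD = 2048    # keep the compressed form only if it is 2KB or less


def block_to_persist(block: bytes) -> Tuple[bytes, bool]:
    """Return the bytes to store and whether they are compressed.

    zlib is used here only as a stand-in for LZ4.
    """
    compressed = zlib.compress(block)
    if len(compressed) <= THRESHOLD:
        return compressed, True
    return block, False  # not worth it: persist the full-sized block


repetitive = b"vSAN" * (BLOCK_SIZE // 4)  # highly compressible data
random_data = os.urandom(BLOCK_SIZE)      # effectively incompressible data

stored, was_compressed = block_to_persist(repetitive)
print(was_compressed, len(stored) <= THRESHOLD)   # True True

stored, was_compressed = block_to_persist(random_data)
print(was_compressed, len(stored))                # False 4096
```

The all-or-nothing threshold keeps things simple: a block is stored either as a compressed block of at most 2KB or as a full 4KB block, never something in between.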


Enabling deduplication and compression consumes a small amount of capacity for metadata, such as hash, translation, and allocation maps. The space consumed by this metadata is proportional to the size of the vSAN datastore and is typically around 5% of the total capacity.
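As a rough back-of-the-envelope calculation, the overhead can be estimated as shown below. The 20 TB datastore size is a made-up figure for illustration, and the 5% ratio is only the typical value mentioned above, so treat the result as an estimate.

```python
def metadata_overhead_tb(raw_capacity_tb: float, overhead_ratio: float = 0.05) -> float:
    """Estimate the capacity used by deduplication and compression metadata."""
    return raw_capacity_tb * overhead_ratio


raw_tb = 20.0  # hypothetical vSAN datastore size
overhead_tb = metadata_overhead_tb(raw_tb)
remaining_tb = raw_tb - overhead_tb

print(f"Metadata overhead: about {overhead_tb:.1f} TB")   # about 1.0 TB
print(f"Capacity left for data: about {remaining_tb:.1f} TB")  # about 19.0 TB
```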


Here are a few things to consider before configuring deduplication and compression in a vSAN cluster:

  • Deduplication and compression are available only on all-flash disk groups.

  • On-disk format version 3.0 or later is required to support deduplication and compression.

  • You must have a valid license to enable deduplication and compression on a cluster.

  • All disk groups participate in data reduction through deduplication and compression.

  • As mentioned above, vSAN can eliminate duplicate data blocks within each disk group, but not across disk groups.


Enabling deduplication and compression

Enabling deduplication and compression on a vSAN cluster is pretty simple.


That’s it, you have successfully enabled deduplication and compression.


I hope you enjoyed reading this post.


Thank you for reading!


*** Explore | Share | Grow ***
