Introduction to Data Deduplication in Windows Server 2012

First released on TECHNET on May 20, 2012

Hi, this is Scott Johnson and I'm a Program Manager in the Windows File Server group. I've been at Microsoft for 17 years and I've seen a great deal of cool innovation in that time. In Windows Server 2012 we have included a pretty cool new feature called Data Deduplication that enables you to efficiently store, transfer and back up less data.

This is the result of an extensive collaboration with Microsoft Research, and after two years of development and testing we now have state-of-the-art deduplication that uses variable chunking and compression, and it can be applied to your primary data. The feature is designed for industry-standard hardware and can run on a very small server with just a single CPU, one SATA drive and 4GB of memory. Data Deduplication will scale nicely as you add multiple cores and additional memory. This team has some of the smartest people I have worked with at Microsoft and we are all very excited about this release.
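To make the variable-chunking idea concrete, here is a minimal Python sketch of content-defined chunking plus per-chunk compression. This is not the algorithm Windows Server 2012 actually uses (which is based on Rabin fingerprinting with carefully tuned parameters); the rolling hash, size thresholds and function names here are all illustrative.

```python
import hashlib
import zlib

def chunk_boundaries(data, mask=0x3FF, min_size=2048, max_size=16384):
    """Yield variable-size chunks using a simple content-defined scheme.

    A cheap rolling-style hash declares a boundary whenever its low 10 bits
    are zero (roughly 1-in-1024 odds), so boundaries follow the content
    rather than fixed offsets -- inserting bytes early in a file does not
    shift every later chunk. Real systems use Rabin fingerprints instead.
    """
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF  # old bytes shift out of the hash
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
            h = 0
    if start < len(data):
        yield data[start:]

def dedup_store(files):
    """Store each unique chunk once (keyed by SHA-256) and compress it."""
    store = {}      # chunk digest -> compressed chunk bytes
    manifests = {}  # file name    -> ordered list of chunk digests
    for name, data in files.items():
        refs = []
        for chunk in chunk_boundaries(data):
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, zlib.compress(chunk))
            refs.append(digest)
        manifests[name] = refs
    return store, manifests
```

Because identical chunks hash to the same digest, two files with the same content share every chunk in the store, and each file can be rebuilt by decompressing its manifest's chunks in order.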

Does Deduplication Matter?

Hard disk drives get bigger and cheaper every year, so why would I need deduplication? Well, the problem is growth. Growth in data is exploding so much that IT departments everywhere will face some serious challenges meeting the demand. Have a look at the chart below, where IDC has forecast that we are starting to experience massive storage growth. Can you imagine a world that consumes 90 million terabytes in one year? We are about 18 months away!


IDC Worldwide File-Based Storage 2011-2015 Forecast:
Foundation Solutions for Content Delivery, Archiving and Big Data, doc #231910, December 2011

Welcome to Windows Server 2012!

This new Data Deduplication feature is a fresh approach. We just submitted a large-scale study and system design paper on primary data deduplication to USENIX, to be presented at the upcoming Annual Technical Conference in June.

Typical Savings:

We analyzed many terabytes of real data inside Microsoft to get estimates of the savings you should expect if you turn on deduplication for different types of data. We focused on the core deployment scenarios that we support, including libraries, deployment shares, file shares and user/group shares. The Data Analysis table below shows the typical savings we were able to get from each type:


Microsoft IT has been deploying Windows Server with deduplication for the last year and they reported some real savings numbers. These numbers validate that our analysis of typical data is quite accurate. In the Live Deployments table below we have three popular server workloads at Microsoft, including:

  • A build lab server: These are servers that build a new version of Windows every day so that we can test it. The debug symbols they collect enable developers to examine the exact line of code that corresponds to the machine code a system is running. A lot of duplicates are created, since we only change a small amount of code on any given day. When teams release the same group of files under a new folder every day, there are a lot of similarities from day to day.
  • Product release shares: There are internal servers at Microsoft that hold every product we've ever shipped, in every language. As you might expect, when you slice it up, 70% of the data is redundant and can be distilled down nicely.
  • Group shares: Group shares include regular file shares that a team might use for storing data, and they include environments that use Folder Redirection to seamlessly redirect the path of a folder (like a Documents folder) to a central location.


Below is a screenshot from the new Server Manager 'Volumes' interface on one of the build lab servers. Notice how much data we are saving on these 2TB volumes. The lab is saving over 6TB on each of these 2TB volumes and they've still got about 400GB free on each drive. Those are some pretty fun numbers.


There is a clear return on investment that can be measured in dollars when using deduplication. The space savings are significant, and the dollars saved can be calculated pretty easily when you pay by the gigabyte. I've had many people say that they want Windows Server 2012 just for this feature, and that it might let them defer purchases of new storage arrays.

Data Deduplication Characteristics:

1) Transparent and easy to use: Deduplication can be quickly configured and enabled on selected data volumes in a few seconds. Applications and end users will not know that the data has been transformed on the disk, and when a user requests a file, it will be transparently served right away. The file system as a whole supports all of the NTFS semantics that you would expect. Some files are not processed by deduplication, such as files encrypted using the Encrypting File System (EFS), files that are smaller than 32KB, or files that have Extended Attributes (EAs). In these cases, interaction with the files is entirely through NTFS and the deduplication filter driver does not get involved. If a file has an alternate data stream, only the primary data stream will be deduplicated and the alternate stream will be left on the disk.
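The exclusion rules above can be boiled down to a small predicate. The sketch below is purely illustrative: the function name and boolean inputs are hypothetical stand-ins for attributes (EFS encryption, file size, EAs) that the real filter driver reads from NTFS metadata, not any actual API.

```python
MIN_DEDUP_SIZE = 32 * 1024  # files under 32KB are skipped entirely

def handled_by_ntfs_only(size_bytes, is_efs_encrypted, has_extended_attrs):
    """Return True when a file bypasses the deduplication filter driver
    and is handled entirely by NTFS, per the rules described above:
    EFS-encrypted, smaller than 32KB, or carrying Extended Attributes."""
    return (is_efs_encrypted
            or size_bytes < MIN_DEDUP_SIZE
            or has_extended_attrs)
```

For example, a 10KB file is left alone regardless of its other attributes, while a 64KB unencrypted file with no EAs is a deduplication candidate.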

2) Designed for Primary Data: The feature can be enabled on your primary data volumes without interfering with the server's primary workload. Hot data (files that are being written to) will be passed over by deduplication until the file reaches a certain age. This way you get optimal performance for active files and great savings on the rest of the files. Files that meet the deduplication criteria are referred to as "in-policy" files.

a. Post Processing: Deduplication is not in the write path when new files come along. New files are written directly to the NTFS volume and are evaluated by a file groveler on a regular schedule. The background processing mode checks for files that are eligible for deduplication every hour, and you can add additional schedules if you need them.

b. File Age: Deduplication has a setting called MinimumFileAgeDays that controls how old a file must be before it is processed. The default setting is 5 days. This setting is configurable by the user and can be set to "0" to process files regardless of how old they are.

c. File Type and File Location Exclusions: You can tell the system not to process files of a particular type, like PNG files that already have great compression, or compressed CAB files that might not benefit from deduplication. You can also tell the system not to process a particular folder.
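Putting the three policy knobs together, a simplified "in-policy" check might look like the following Python sketch. The parameter names echo MinimumFileAgeDays and the exclusion lists described above, but the defaults, example paths and function name are illustrative, not the actual configuration API.

```python
import posixpath
from datetime import datetime, timedelta

def is_in_policy(path, last_write, now,
                 minimum_file_age_days=5,
                 excluded_extensions=(".png", ".cab"),
                 excluded_folders=("/shares/scratch",)):
    """Decide whether a file is 'in-policy' for background deduplication.

    Mirrors the three rules above: skip hot files younger than
    MinimumFileAgeDays (0 = process regardless of age), skip excluded
    file types, and skip excluded folders. Paths here are POSIX-style
    for illustration only.
    """
    if minimum_file_age_days > 0 and now - last_write < timedelta(days=minimum_file_age_days):
        return False  # still hot -- leave active files alone
    if posixpath.splitext(path)[1].lower() in excluded_extensions:
        return False  # already-compressed type, little to gain
    if any(path.startswith(folder.rstrip("/") + "/") for folder in excluded_folders):
        return False  # admin opted this folder out
    return True
```

So a 3-week-old file on a group share is in-policy, while a file written two days ago is skipped until it ages past the threshold (unless the administrator sets the age to 0).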
