Hybrid storage systems that do not offer some variation of auto tiering or caching are as rare as a Bigfoot sighting. They might still exist, but not for much longer. The reason is the vast difference in hard disk drive (HDD) and solid-state drive (SSD) performance. SSDs provide up to 1,000 times more IOPS than HDDs. Systems require auto tiering or caching to make the most of this increased performance. Conceptually, the two provide similar benefits, but in fact they are radically different technologies with different capabilities and use cases.
Auto tiering requires a little historical perspective for an accurate understanding. It was originally developed to take advantage of the performance and cost differences between 15,000-, 10,000-, 7,200-, and 5,400-rpm HDDs.
The underlying principle of auto tiering is that as data ages, its value declines. Data is primarily accessed in the first 72 hours after it's created. That access falls precipitously from that point forward, becoming less and less frequent until after 30 days, when data is accessed occasionally at most. At that point, the data has become "passive" or "cold."
As data's value declines over time, it makes fiscal sense to move it to a lower-performing and lower-cost storage tier. Doing so manually is tedious, repetitive, difficult and labor-intensive. In other words, it doesn't get done. Automated tiering moves data based on such policies as data age, frequency of access, last time accessed and even response time over time.
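The policy inputs described above (data age, frequency of access and last access time) can be sketched as a simple placement rule. This is only an illustration, not any vendor's algorithm: the 72-hour and 30-day figures come from the text, while the ExtentStats fields, the choose_tier function, the tier names and the access threshold of 10 are hypothetical.

```python
from dataclasses import dataclass

HOT_WINDOW_HOURS = 72      # from the text: most access happens in the first 72 hours
COLD_AFTER_HOURS = 30 * 24  # from the text: after 30 days, access is occasional at most


@dataclass
class ExtentStats:
    """Hypothetical per-extent counters a tiering engine might keep."""
    age_hours: float                # time since the data was created
    hours_since_last_access: float  # time since the most recent read or write
    accesses_last_24h: int          # recent access frequency


def choose_tier(stats: ExtentStats, hot_access_threshold: int = 10) -> str:
    """Return 'fast' or 'capacity' for one extent (two tiers assumed)."""
    # Rule 1: young data stays on the fast tier.
    if stats.age_hours <= HOT_WINDOW_HOURS:
        return "fast"
    # Rule 2: data untouched for 30 days is treated as cold.
    if stats.hours_since_last_access >= COLD_AFTER_HOURS:
        return "capacity"
    # Rule 3: older data that is still accessed frequently stays fast.
    if stats.accesses_last_24h >= hot_access_threshold:
        return "fast"
    # Everything else is demoted to the lower-cost tier.
    return "capacity"


if __name__ == "__main__":
    print(choose_tier(ExtentStats(10, 1, 0)))
    print(choose_tier(ExtentStats(24 * 45, 24 * 31, 0)))
```

A real engine would evaluate rules like these on a schedule, which is part of why the approach reacts to past behavior rather than current demand.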
Early auto tiering moved whole LUNs (volumes) of data or entire file shares. That's a lot of data movement, with wildly disparate results, because a single file access could cause a large amount of data to move. Data movement today is based on much smaller atomic units, such as sub-LUNs; extents (series of grouped blocks also known as chunks, chunklets or slices); files; objects; partial files; or partial objects.
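A little arithmetic shows why the size of the movement unit matters. The sketch below assumes a contiguous, unit-aligned hot region and uses made-up sizes (a 2 TiB LUN, 256 MiB extents, 3 GiB of hot data); the bytes_moved helper is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def bytes_moved(hot_bytes: int, unit_bytes: int) -> int:
    """Bytes relocated when hot data is promoted in whole units.

    Assumes the hot region is contiguous and starts on a unit boundary,
    so the count of units is simply the ceiling of hot_bytes / unit_bytes.
    """
    units = -(-hot_bytes // unit_bytes)  # integer ceiling division
    return units * unit_bytes


if __name__ == "__main__":
    hot = 3 * GIB              # hypothetical amount of hot data
    lun = 2048 * GIB           # hypothetical 2 TiB LUN
    extent = 256 * MIB         # hypothetical extent size
    print("whole-LUN move:", bytes_moved(hot, lun) // GIB, "GiB")
    print("extent move:   ", bytes_moved(hot, extent) // GIB, "GiB")
```

Under these assumptions, LUN-level movement relocates the entire volume to promote a small hot region, while extent-level movement relocates only the extents that contain it.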
When flash SSDs became practical, cost-effective and pervasive in storage systems, many thought auto tiering would be an ideal fit because of the high cost of flash SSDs. It hasn't quite worked out as well as was expected. Flash SSDs are a vastly different technology from HDDs. They have different issues such as write/erase cycles, electron leakage, wear leveling, error detection and correction coding, and more.
Performance and cost differences between SSDs and HDDs also tend to limit the number of tiers in a system to two. This causes more frequent data movement both downward and upward to best take advantage of those relatively expensive SSDs. However, this constant data movement has two downsides that are important to consider.
First, auto tiering is a reactive technology, meaning that it moves data based on historical trends, not real-time status. (There are exceptions, such as XtremIO's auto tiering, which moves data in real time based on previously recorded application patterns.) Mission-critical or high-performance data uses SSDs as its target storage. As the data ages out, it is moved to HDDs. When recalled, it is moved back. Every one of those moves back to the SSDs is another write, which increases wear on the SSDs and shortens their useful life.
Second, moving data between tiers is CPU-intensive. This has little impact on system performance when the gating performance bottleneck is the HDD, but it is a very different story with SSDs: There the bottleneck shifts to the storage processor.
There are basically two types of caching: write back and write through. Write-back caching acknowledges writes as they occur. In other words, it takes ownership of the writes. The data stays in the cache or is moved (often referred to as "draining") to lower-performing, lower-cost media based on such policies as frequency of access, last access, age of the data and so forth. Data can also be moved back into the cache based on similar policies. Some data can also be pinned in the cache regardless of the policy. Pinned data often includes hot database files, indexes and even metadata.
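The write-back behavior described above (acknowledge in cache, drain later, pin selected data) can be sketched in a few lines. This is a toy model under stated assumptions: the WriteBackCache class is hypothetical, a plain dict stands in for the HDD tier, and least-recently-used order stands in for whatever drain policy a real system applies.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-back cache; 'backing' is a dict standing in for HDDs."""

    def __init__(self, capacity: int, backing: dict):
        self.capacity = capacity
        self.backing = backing
        self.cache = OrderedDict()  # key -> value, oldest entry first
        self.dirty = set()          # keys not yet drained to backing
        self.pinned = set()         # keys that are never drained

    def write(self, key, value):
        # The write is owned by the cache; backing is not touched yet.
        self.cache[key] = value
        self.cache.move_to_end(key)
        self.dirty.add(key)
        self._evict_if_needed()

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        value = self.backing[key]   # miss: fetch from the slower tier
        self.cache[key] = value     # and move it back into the cache
        self._evict_if_needed()
        return value

    def pin(self, key):
        self.pinned.add(key)

    def _evict_if_needed(self):
        while len(self.cache) > self.capacity:
            victim = next((k for k in self.cache if k not in self.pinned), None)
            if victim is None:
                break  # everything is pinned; this sketch tolerates overflow
            self._drain(victim)

    def _drain(self, key):
        value = self.cache.pop(key)
        if key in self.dirty:       # only modified data needs writing back
            self.backing[key] = value
            self.dirty.discard(key)
```

Note that nothing reaches the backing store until an entry is drained, which is why real write-back caches need protection against losing the cache contents.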
Write-through cache writes the data to HDDs first. Data is then brought into the cache via a smart algorithm tracking the reads and placing hot data as defined by policy into the SSD cache. Write-through cache frequently takes advantage of read-ahead performance gains when workloads are sequential. However, read-ahead caching actually degrades performance when workloads are highly randomized, as with virtual servers and desktops.
Write-through policies are similar to write-back cache policies, but whereas write-back decides what data to purge from the cache, write-through decides what data to put into the cache. The key difference is that write-through cache neither accepts the original write of the data nor acknowledges it. And just as with write-back caching, write-through caching can pin hot data in the cache.
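The write-through behavior as this article describes it (writes land on HDDs first, and reads decide what enters the SSD cache) might look like the following toy model. The ReadPromotingCache name, the promote_after threshold and the least-recently-used eviction are all assumptions made for illustration; real products use their own admission algorithms.

```python
from collections import Counter, OrderedDict


class ReadPromotingCache:
    """Toy cache in which writes go to backing and reads drive admission."""

    def __init__(self, capacity: int, backing: dict, promote_after: int = 2):
        self.capacity = capacity
        self.backing = backing              # dict standing in for HDDs
        self.promote_after = promote_after  # hypothetical hotness threshold
        self.cache = OrderedDict()          # key -> value, oldest first
        self.read_counts = Counter()        # misses seen per key
        self.hits = 0
        self.misses = 0

    def write(self, key, value):
        self.backing[key] = value           # the write lands on HDD first
        if key in self.cache:
            self.cache[key] = value         # keep any cached copy coherent

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)
            return self.cache[key]
        self.misses += 1
        value = self.backing[key]
        self.read_counts[key] += 1
        # Admit the block only once it has proven hot enough.
        if self.read_counts[key] >= self.promote_after:
            self.cache[key] = value
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return value
```

The contrast with a write-back design is in where the policy sits: here it governs admission into the cache, while a write-back cache applies it to what gets drained out.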
Write-back cache is much less common than write-through cache because more writes land on the SSDs, increasing the write-erase cycles and shortening the SSDs' viable wear life. It also tends to require a bit more SSD capacity.
Both types of caching typically have smaller capacities than auto tiering does. This leads to more cache misses, because read requests must go back to the HDDs for their data.
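The relationship between cache capacity and misses can be explored with a small simulation. The sketch below assumes a least-recently-used cache and a made-up skewed workload (80% of reads hitting 20% of the blocks); the lru_miss_rate function and every number in it are illustrative, not measurements.

```python
import random
from collections import OrderedDict


def lru_miss_rate(trace, capacity: int) -> float:
    """Fraction of reads in 'trace' that miss an LRU cache of 'capacity' blocks."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)        # hit: mark as most recently used
        else:
            misses += 1                     # miss: would be served from HDD
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
    return misses / len(trace)


if __name__ == "__main__":
    rng = random.Random(0)                  # fixed seed for repeatability
    hot_blocks = range(0, 200)              # hypothetical hot 20%
    cold_blocks = range(200, 1000)          # hypothetical cold 80%
    trace = [
        rng.choice(hot_blocks) if rng.random() < 0.8 else rng.choice(cold_blocks)
        for _ in range(20000)
    ]
    for capacity in (50, 200, 1000):
        print(capacity, round(lru_miss_rate(trace, capacity), 3))
```

With LRU, a larger cache never misses more often than a smaller one on the same trace, so the printed miss rate falls or holds steady as capacity grows. Replaying a trace from your own workload would give a more meaningful estimate.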
Whether auto tiering is better than caching, or vice versa, is in the eye of the beholder. Auto tiering performs best when data movement is highly predictable and there is little back-and-forth thrashing. Examples include moving older snapshots or transactional data from SSDs to HDDs after the close of a quarter. Additional SSD capacity may be required so that data does not have to move to a lower-performing tier as frequently. This is an upfront expense, but it can increase the SSD lifespan over time.
Caching in general should provide better performance for real-time workload adjustments, random I/O, transactional applications, and server and desktop virtualization. Caching should also create less wear and tear on the SSDs, based on much-reduced writes, thus increasing SSD life.
Some hybrid storage vendors offer auto tiering, whereas others offer caching. A few offer both.
One very important note about flash-SSD lifespan issues: They have only recently begun to crop up. There have been few market-reported wear-life issues with SLC and eMLC. With deployments of MLC, however, such issues have become much more prominent.
Before deciding which technology works best for you, examine your workloads, then test both technologies under realistic conditions. Select the one that better meets your actual and perceived requirements while staying within your budget.
About the author:
Marc Staimer is founder, senior analyst and CDS of Dragon Slayer Consulting in Beaverton, Ore. He can be reached at firstname.lastname@example.org.