Your applications might be begging for the addition of flash storage technology, but you'll have to decide where to put it, how much you'll need and how it should be used.
Flash storage can seem like an easy answer to application performance problems, but deciding what type of flash product to buy, and where and how to deploy it, is a lot more complicated.
Deciding what kind of solid-state storage to deploy, where it's deployed and the capacity required shouldn't be a process based on theoretical scenarios or data sheet statistics. It involves some analysis of the current environment to help define the problem you hope to remedy with flash storage. Essentially, you need to determine where the bottlenecks are, what the workloads involved look like, and what besides storage performance needs to be fixed or improved upon. You also have to take into account real-world constraints that can limit the options you're able to consider.
Flash choices aren't always evident
It would be great if there were a matrix that showed use cases on one axis, such as virtual desktop infrastructure (VDI), server virtualization, big data analytics and so on, and the ideal flash options for each on the other axis. But that's almost impossible because there are just too many variables that aren't related to use case, but are themselves interrelated.
For example, the location of the storage bottleneck typically determines where to put flash technology (in the host, for example, or in a storage array), but it has less to do with the use case and more to do with the makeup of the current storage infrastructure. However, the location decision can affect what type of flash is to be used (solid-state drives [SSDs] or flash on circuit cards like PCI Express [PCIe]), and whether it's deployed as a cache or storage tier. The type of flash deployment, in turn, can drive the capacity required (tiering requires more than caching), while cost and budget can limit that potential capacity.
Questions like which flash storage technology to use (single-level cell, multi-level cell and so on) are less of a factor today than they were a few years ago (see the sidebar, "The MLC vs. SLC debate: Does it still matter?"). But other factors such as data risk, the need for high availability or rapid data growth may impact decisions in all use cases. So, rather than looking at use cases and data sheet specs, a better way to approach the flash storage decision is to start with the current environment and the analysis used to identify the problem in the first place.
Start with the storage bottleneck
Solid-state storage is most often used to address an application performance issue by speeding up the data transactions that support the servers running those applications. Essentially, there has to be a bottleneck somewhere in the storage infrastructure and the analysis that identifies that bottleneck is the place to start to define the solution.
For flash to be the right solution, one or more of the following resources typically aren't showing high utilization rates: host CPU, host memory, storage system CPU or network bandwidth. Identifying which resources are constrained requires monitoring the utilization rates of each over time and comparing the results with application performance. If the host CPU is running close to its maximum, then storage isn't the problem and attention should then be focused on the compute resources or application architecture. However, if host CPU utilization is low (say, less than 40%) during the time an application is running slowly, it's a good indicator that there's a bottleneck somewhere in the storage infrastructure.
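The comparison of utilization rates with application performance can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a monitoring tool: the `Sample` record, the `storage_suspect_windows` helper and the 500 ms slowness threshold are hypothetical, and the 40% CPU figure is only the rough example given above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring interval (hypothetical record layout)."""
    timestamp: str
    host_cpu_pct: float    # host CPU utilization, 0-100
    app_latency_ms: float  # application response time for the interval


def storage_suspect_windows(samples, cpu_low_pct=40.0, slow_ms=500.0):
    """Return timestamps where the app was slow while host CPU stayed low.

    Both thresholds are illustrative assumptions; tune them per environment.
    """
    return [
        s.timestamp
        for s in samples
        if s.app_latency_ms >= slow_ms and s.host_cpu_pct < cpu_low_pct
    ]


samples = [
    Sample("09:00", host_cpu_pct=35.0, app_latency_ms=900.0),  # slow, CPU idle
    Sample("09:05", host_cpu_pct=92.0, app_latency_ms=850.0),  # slow, CPU bound
    Sample("09:10", host_cpu_pct=30.0, app_latency_ms=120.0),  # healthy
]
print(storage_suspect_windows(samples))
```

Intervals flagged this way only suggest that storage deserves a closer look; host memory, storage controller and network utilization would need the same treatment before drawing a conclusion.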
A storage array controller that's "loafing" during this timeframe can mean the storage system is waiting for disk drives (a latency issue within the storage media), so adding SSDs to the storage array could help. However, if the array wasn't designed specifically to support flash then adding SSDs may not be an effective solution, since a drive tray full of SSDs in this scenario may simply turn the storage controller into the bottleneck.
Similarly, putting SSDs into the storage system won't help if that controller is running close to full utilization. Assuming the network isn't the bottleneck, a better solution would be to invest in another storage system that supports SSDs, such as a hybrid array or all-flash array. However, if the network bandwidth is constrained, or if buying another storage system isn't an option, an alternative to consider is installing flash storage in the host server.
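The placement reasoning in the last two paragraphs can be summarized as a small decision function. This is a hedged sketch rather than a rule: the function name, its boolean inputs and the returned labels are hypothetical, and the handling of an idle array that wasn't designed for flash (treated here like a busy controller) is an assumption the text doesn't spell out.

```python
def suggest_flash_placement(controller_busy, network_constrained,
                            array_supports_flash, can_buy_array=True):
    """Map bottleneck findings to a flash placement suggestion (illustrative)."""
    # A constrained network argues for keeping data close to the application.
    if network_constrained:
        return "server-side flash"
    # An idle controller in a flash-ready array is likely waiting on disk drives.
    if not controller_busy and array_supports_flash:
        return "SSDs in the existing array"
    # Otherwise SSDs in the current array risk making its controller the bottleneck.
    if can_buy_array:
        return "new hybrid or all-flash array"
    return "server-side flash"


print(suggest_flash_placement(controller_busy=False,
                              network_constrained=False,
                              array_supports_flash=True))
```

In practice these inputs are rarely clean yes-or-no answers, so the function is best read as a checklist of the questions to ask, in order.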
Flash in the host server can take the form of disk drive form-factor SSDs (SATA or SAS), PCIe flash cards or even flash connected through the memory bus via dual inline memory modules (DIMMs). All these methods bring flash performance closer to the application CPU than a network-attached array can, providing the best potential storage performance by reducing storage latency. Historically, SSDs have been the most economical of these three form factors and, all things being equal, are often the best choice for capacity-centric use cases. PCIe cards offer better performance than SSDs, but typically cost more on a per-gigabyte basis. Putting flash into DIMMs now gives users another low-latency option that may open up some new use cases.
The first DIMM form-factor flash devices weren't logically connected through the memory bus, but were cabled to an available SATA header on the motherboard. The main selling point for these products was capacity, since many small-footprint blade servers had few if any SATA drive slots available but did have unused memory slots. More recently, companies have come out with flash modules that logically connect through the memory bus, providing even lower latency than is afforded by PCIe-based flash while still taking advantage of available DIMM slots. This "memory channel" technology is just getting started, but along with non-volatile DIMM (NVDIMM), represents another exciting frontier for server-side flash.
Another potential benefit to putting solid-state storage in the server instead of in a network-attached storage system is a reduction in SAN traffic. If an application can get the data it needs from a flash cache or flash tier in the server, it doesn't have to bring that data across the network. That can result in less work for the shared storage array and more resources freed up to support other servers. The reduction in network traffic can help to make server-side flash a compelling alternative to buying another shared storage system.
To tier or not to tier
Once the placement decision has been made, the type of flash implementation -- how the solid-state storage will actually be used -- should be determined. Aside from all-flash arrays, implementation methodologies focus on getting the most appropriate data into flash before it's needed and then working behind the scenes to keep it that way. Tiering essentially creates a high-speed storage area for the most critical data sets or sub-sets, such as database indexes or change logs, and populates the flash based on predetermined policies. Tiering typically requires more flash capacity than caching does, so it may not be appropriate if budgets or physical space are limited. Caching may be a better choice in these situations, but should be considered in all use cases.
Caching software is often included as a storage system feature and deployed as a way to maximize the flash capacity installed in a traditional storage array. When available, it can be a compelling option, since it's essentially transparent to the user and often requires minimal configuration. Caching can also come with a PCIe flash card to be installed on the host server.
Another option is a caching offering that's deployed as a standalone piece of software and used to accelerate applications on a specific server. These solutions offer the added flexibility of using any manufacturer's flash products and supporting different flash form factors (PCIe, SSD or DIMM). Some even support the concatenation of flash volumes, which enables new SSDs to be added transparently to an existing deployment.
There are some potential downsides, however. Compared with tiering, cache performance can be less predictable and the high "turnover" of data in a cache can impact the lifespan of solid-state storage. Write caching can also present some risks (see the section on "Growth, risk and high availability").
Caching solutions can also be tailored to applications such as server virtualization, VDI or databases, leveraging a knowledgebase of application-specific data types and processes to improve cache performance. But the amount of flash capacity required can be an important decision factor, one that can vary significantly even given similar use cases.
How much flash is enough?
Tiering requires enough flash to hold the entire application or at least the most critical data sets, so the required capacity is relatively simple to determine. But caching capacity is much more difficult to assess. Rules of thumb are nice places to start, but real-world testing is essential to determine when enough flash is available and excess capacity isn't being wasted. One flash vendor that also offers caching software has a large telco customer that provides an interesting example. This company runs several very large data centers supporting multiple VMware clusters and hundreds of virtual servers. Even with its well-defined virtual server environments, it still continually tests its new caching implementations, first with 5% of the primary data set in cache, then 10% and finally 20%. The message is clear: Start with an educated guess as to caching capacity, but be ready to change it based on real-world monitoring.
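A stepped test like the 5%, 10% and 20% progression can be rehearsed offline by replaying an access trace through a simulated cache. The Python sketch below assumes a least recently used (LRU) eviction policy, which is a common choice but not something the example above specifies; the function names and the synthetic trace are hypothetical, and real sizing decisions need traces captured from the production workload.

```python
from collections import OrderedDict


def lru_hit_ratio(trace, cache_blocks):
    """Replay a block-access trace through an LRU cache; return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(trace) if trace else 0.0


def sweep_cache_sizes(trace, dataset_blocks, fractions=(0.05, 0.10, 0.20)):
    """Hit ratio per candidate cache size, given as a fraction of the data set."""
    return {
        f: lru_hit_ratio(trace, max(1, round(dataset_blocks * f)))
        for f in fractions
    }


# Synthetic, deterministic trace: three of every four accesses go to a small
# hot set drawn from blocks 0-7; the rest walk through colder blocks 8-99.
trace = [8 + (i // 4) % 92 if i % 4 == 3 else i % 8 for i in range(4000)]

for fraction, ratio in sweep_cache_sizes(trace, dataset_blocks=100).items():
    print(f"{fraction:.0%} of data set -> hit ratio {ratio:.2f}")
```

If the hit ratio stops improving between two sizes, the extra flash is probably being wasted for that workload; if it is still climbing at the largest size, the test range may need to be extended.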
Growth, risk and high availability
There are other non-performance-related details or constraints that factor into a flash deployment decision. An obvious one is how the existing infrastructure impacts the bottlenecks driving the need for flash. Another is risk. Some write-caching methods can pose a risk to data that's not yet safely on the primary storage area. The options to address these risks, such as "write-around caching," should be understood before a caching solution is chosen.
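To make the write-caching risk concrete, the toy Python model below contrasts three commonly described write policies. Only write-around is named in this article; "write-back" and "write-through" are standard industry terms added here for comparison. The `CachedVolume` class is hypothetical, and plain dictionaries stand in for the flash cache and primary storage.

```python
class CachedVolume:
    """Toy model of a flash cache in front of primary storage."""

    POLICIES = ("write-back", "write-through", "write-around")

    def __init__(self, policy):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.cache = {}     # stands in for the flash cache
        self.primary = {}   # stands in for primary storage
        self.dirty = set()  # blocks acknowledged but not yet on primary storage

    def write(self, block, data):
        if self.policy == "write-back":
            # Acknowledged from cache; primary storage is updated later.
            self.cache[block] = data
            self.dirty.add(block)
        elif self.policy == "write-through":
            # Written to cache and primary storage before acknowledging.
            self.cache[block] = data
            self.primary[block] = data
        else:  # write-around
            # Written straight to primary storage, bypassing the cache.
            self.primary[block] = data
            self.cache.pop(block, None)  # drop any stale cached copy

    def flush(self):
        """Copy all dirty blocks to primary storage."""
        for block in self.dirty:
            self.primary[block] = self.cache[block]
        self.dirty.clear()

    def at_risk(self):
        """Blocks that would be lost if the cache failed right now."""
        return sorted(self.dirty)


for policy in CachedVolume.POLICIES:
    vol = CachedVolume(policy)
    vol.write(7, "payload")
    print(policy, "-> blocks at risk:", vol.at_risk())
```

In this model only the write-back policy leaves data exposed between the write and the flush, which is the window a caching product has to protect through mirroring, non-volatile media or a similar mechanism.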
If high availability is required, it can mean the data on flash must be shared and a SAN-attached array or flash caching appliance should be considered. Also, some server-side flash solutions leverage virtualization software to support failover or to enable the sharing of these local flash resources.
Expected data growth is another constraint that may rule out a server-side solution. In those situations, the system must have adequate capacity and support an expansion process that doesn't impact uptime requirements.
Breaking the bottleneck
The decision to implement flash technology in an IT environment is typically driven by a storage performance bottleneck. Identifying where that bottleneck is can answer the first question: Where should the flash go? When that's been determined, factors such as cost, capacity, risk and whether caching or tiering is the most appropriate can be addressed. However, these factors are often interrelated and may need to be considered together. The question of capacity in caching implementations should always involve real-world testing.
About the author:
Eric Slack is an analyst at Storage Switzerland, an IT analyst firm focused on storage and virtualization.