This article can also be found in the Premium Editorial Download "Storage magazine: Disaster recovery planning: 10 tips to help you avoid costly missteps."
PCIe SSDs offer the best flash performance currently available in your data center computing environment, and now they can be shared among multiple servers as well.
As solid-state storage increasingly becomes a storage system alternative, it's becoming universally recognized as the storage performance option. But its performance and near-zero latency can expose weaknesses in the rest of a storage infrastructure, including its backbone, the storage network. How and where solid-state is implemented is a significantly more important decision than it was with hard drive-based storage systems. Given the cost and expected performance benefits, avoiding the storage infrastructure altogether by placing solid-state directly in line with the server CPU via the PCI Express (PCIe) bus may be the best option.
This direct connection to the server CPU has led to the rapid adoption of PCIe-based solid-state drives (SSDs). PCIe SSDs benefit from bypassing the storage network altogether, so retrieving data via a PCIe data path adds almost no latency.
Most, but not all, PCIe SSDs also ignore the entire storage protocol stack. This eliminates further latency and allows PCIe SSD vendors to provide custom drivers that are specifically designed for flash-based storage rather than hard disk storage.
All these factors have combined to boost adoption of PCIe-based SSDs and have led to a substantial expansion in the number of vendors offering PCIe SSD products. In fact, an entire ecosystem of hardware and software has evolved, and it can sometimes be overwhelming.
PCIe SSD differentiators
Server-side SSD isn't just PCIe
When NAND flash first started gaining popularity, storage protocols and storage interconnect speeds were the performance bottlenecks. This made PCI Express (PCIe) immediately attractive. But in the past year, SATA, SAS, Fibre Channel and Ethernet have all improved performance. While the near-zero latency and direct CPU access of PCIe remain a distinct advantage, the latency gap has definitely narrowed.
If the most extreme level of performance isn't needed, you may benefit just as much from SAS- or SATA-based solid-state drives designed to populate existing hard disk drive bays. In addition to offering excellent performance, many servers allow these drives to be hot swapped for more reliable configurations.
Most PCIe offerings have only a few basic components in common. First, they all use NAND flash chips to store data and, second, they put that flash on a PCIe card that's primarily designed to be installed in a server. After that, the features and capabilities of the cards vary greatly. Depending on particular environments, the differences can have a significant impact on performance and the return on a PCIe solid-state storage investment.
During the selection process, it's important to focus on solving the specific performance problem, not on selecting the fastest available card on the market. Paying extra for performance that can't be used is a waste of your IT budget.
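One way to picture that selection rule is a short Python sketch that picks the cheapest card covering a measured requirement plus some headroom. Everything here is an illustrative assumption rather than vendor data: the `Card` type, the `pick_card` helper, the 25% headroom default, and the card names, IOPS ratings and prices are all made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Card:
    name: str     # hypothetical product label
    iops: int     # rated random IOPS (made-up figure)
    price: float  # price in dollars (made-up figure)


def pick_card(cards: List[Card], required_iops: int,
              headroom: float = 0.25) -> Optional[Card]:
    """Return the cheapest card whose rating covers the need plus headroom."""
    target = required_iops * (1 + headroom)
    candidates = [c for c in cards if c.iops >= target]
    return min(candidates, key=lambda c: c.price) if candidates else None


# Hypothetical catalog; none of these are real products.
catalog = [
    Card("card-a", 100_000, 3_000.0),
    Card("card-b", 250_000, 6_500.0),
    Card("card-c", 500_000, 12_000.0),
]

choice = pick_card(catalog, required_iops=150_000)
print(choice.name if choice else "no card meets the target")  # prints "card-b"
```

The fastest card in the list also meets the target, but it costs nearly twice as much for performance the workload can't use.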
Storage protocol compliance
A key differentiator is whether the PCIe SSD card is storage protocol compliant. In other words, when the card is installed in a PCIe slot in the server, will it be recognized as a storage device? A non-compliant card needs an additional driver installed in the operating system (OS) or hypervisor. There are pros and cons to both methods.
If the card isn't storage protocol compliant -- also called "native PCIe SSD" -- it should yield better overall performance because I/O to the card doesn't have to traverse a storage I/O stack designed for hard drive-based systems. It also means a custom driver must be shipped with the PCIe SSD for specific OSes and hypervisors. This driver knows how to directly access the card and should reduce overall protocol latency. How much of a benefit the reduction amounts to is highly dependent on the physical server and the application running on it. Also, most PCIe SSD vendors don't support every OS and hypervisor, so confirming compatibility is critical.
Some PCIe SSD vendors also provide an API set that users can leverage so their applications can have direct access to the PCIe SSD. This allows the application to not only bypass the overhead of the storage protocol stack but also of the OS itself. Of course, this is only valuable if it's possible to access the application source code.
Most native PCIe SSDs can't be booted from directly since the driver needs to load first. This means another boot device must be present in the server. Ironically, many users complement a PCIe SSD with a standard drive form-factor SSD. Having to buy two flash devices to complete the task at hand makes storage protocol-compliant PCIe SSDs that much more attractive.
PCIe SSD cards that are storage compliant look like actual storage devices to the server and OS. Typically, no drivers are needed to access the card and systems can boot from these cards. As a result, these PCIe SSD cards can be used almost universally.
Flash management processing
Flash memory is a unique storage medium that needs special handling. A NAND flash device is made up of multiple cells, each of which stores one or more bits of data (one for single-level cell flash, two for multi-level cell flash). Existing data can't be overwritten in place as it can on a hard disk drive; the old data must be erased before new data is written, and erasure happens in complete blocks rather than cell by cell. This erase-then-write process is called the program/erase cycle. NAND flash cells can sustain only a limited number of program/erase cycles before reliability diminishes. To ensure cells wear out evenly, flash vendors use a technique called wear leveling, which distributes writes evenly across all cells.
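The idea behind wear leveling can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any vendor's algorithm: the `WearLeveledFlash` class is hypothetical, it tracks only an erase count per block, and every write simply goes to the least-worn block.

```python
class WearLeveledFlash:
    """Toy model: one erase counter per block, no data actually stored."""

    def __init__(self, num_blocks: int):
        self.erase_counts = [0] * num_blocks

    def write_block(self) -> int:
        """Erase and program the least-worn block; return its index."""
        target = min(range(len(self.erase_counts)),
                     key=self.erase_counts.__getitem__)
        self.erase_counts[target] += 1  # one program/erase cycle consumed
        return target


flash = WearLeveledFlash(num_blocks=4)
for _ in range(10):
    flash.write_block()

print(flash.erase_counts)  # prints [3, 3, 2, 2]
```

No block ends up more than one cycle ahead of any other, which is the property that keeps any single block from wearing out early.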
To maintain performance, many flash vendors will do the erase part of the cycle in advance. This technique, called garbage collection, scans for old data on an ongoing basis and "pre-erases" it. This allows for better performance when write traffic is high because the writes don't have to wait for the data to be erased first.
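A small Python sketch shows why erasing in advance helps. It is a deliberate simplification: the `Flash` class and the cost constants are hypothetical, each block is in one of only three states, and the model ignores the relocation of still-valid data that real garbage collection also has to handle.

```python
ERASED, VALID, STALE = "erased", "valid", "stale"

# Arbitrary time units; the only assumption is that an erase is much
# slower than a program operation.
PROGRAM_COST, ERASE_COST = 1, 10


class Flash:
    def __init__(self, states):
        self.states = list(states)

    def garbage_collect(self) -> int:
        """Background pass: erase every stale block ahead of time."""
        reclaimed = 0
        for i, state in enumerate(self.states):
            if state == STALE:
                self.states[i] = ERASED
                reclaimed += 1
        return reclaimed

    def write(self) -> int:
        """Program one block and return how long the write had to wait."""
        if ERASED in self.states:
            self.states[self.states.index(ERASED)] = VALID
            return PROGRAM_COST
        # No pre-erased block: the write pays for the erase itself.
        # index() raises ValueError if no stale block is left either.
        self.states[self.states.index(STALE)] = VALID
        return ERASE_COST + PROGRAM_COST


without_gc = Flash([VALID, STALE, STALE])
with_gc = Flash([VALID, STALE, STALE])
with_gc.garbage_collect()

print(without_gc.write(), with_gc.write())  # prints "11 1"
```

The write that finds a pre-erased block waits only for the program step; the other one pays for the erase as well.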
Flash vendors also add their own flash management features. For example, some vendors will add flash intelligence that can reduce flash substrate degradation by "softening writes" when I/O traffic isn't high in an effort to extend flash life. Others have added better data integrity and data protection routines.
All these flash management steps require processing power. Almost every other form of flash storage (all-flash arrays, flash appliances and SSDs) includes this processing power within the storage device. For PCIe SSDs, it's a divided camp: some PCIe vendors use their direct access to the host CPU and offload flash management processing to the host, while others have included onboard processing.
Leveraging server CPU and server memory resources to assist with flash management allows PCIe SSD vendors to shorten their development cycles. It may also reduce costs since they don't have to create their own silicon or use field-programmable gate arrays (FPGAs) to handle the processing. These designs do all their work in software, which means the speed and availability of server resources will directly impact overall flash performance.
Typically, there are plenty of server CPU resources to go around, but leveraging the host CPU may lead to unpredictable performance under high load conditions. Often, when the server CPU resources are being pushed to their limits, storage I/O traffic is also the greatest. The wrong combination could lead to a momentary, but unexpected, drop in performance.
PCIe vendors that have built their own hardware-based flash management are quick to point to this unpredictable consumption of host resources as a key problem with host-based flash management.
PCIe SSD software
Sharing PCIe SSDs
One of the shortcomings of a PCI Express solid-state drive (PCIe SSD) is that it's exclusive to the server it's installed in. But for many environments, like server virtualization, sharing is required to provide high availability, redundancy, scalability and so forth. There are other more traditional methods of sharing PCIe SSD beyond the emerging flash-only, SAN-less architectures.
The first sharing method is for traditional storage vendors to incorporate PCIe SSD into their storage systems or infrastructures, such as arrays that leverage standard hard drives in conjunction with PCIe SSD. Hot data can take advantage of the faster PCIe SSD storage while cooler data is placed on traditional hard disks.
Some vendors have developed converged architectures that include host processing and storage across several nodes to create an all-in-one server/desktop virtualization offering. The local virtual machine (VM) images are stored in the node hosting each VM. A second copy is spread across the remaining nodes for redundancy and to enable VM migration.
Finally, a number of vendors have created fibre-attached appliances designed to house multiple PCIe SSD cards. Most of these appliances allow sharing of any installed PCIe card; servers can connect to the appliance via a PCIe network, InfiniBand or even Ethernet.
A final PCIe solid-state drive consideration is the software that the vendor provides with the card or as an option. For native PCIe vendors, the device driver itself must be part of this software set. In addition to ensuring that it supports all the operating systems the card will be used with, you must ensure that the driver is compatible with other drivers in your server's boot stack.
Both native and storage protocol-compliant PCIe SSD vendors provide software that enhances the overall use of the board. In some cases the included software can analyze data access on the server and help determine which data would benefit the most by being placed on the SSD.
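At its simplest, that kind of access analysis amounts to counting how often each piece of data is touched and placing the busiest items on the SSD. The following Python sketch assumes a hypothetical `hottest` helper and a made-up access log; real vendor tools are far more detailed and are not represented here.

```python
from collections import Counter


def hottest(access_log, capacity):
    """Return the `capacity` most frequently accessed items, busiest first."""
    counts = Counter(access_log)
    return [name for name, _ in counts.most_common(capacity)]


# Hypothetical access log: one entry per I/O request.
log = ["db.idx", "db.idx", "logs", "db.dat", "db.idx", "db.dat", "archive"]

print(hottest(log, 2))  # prints ['db.idx', 'db.dat']
```

A placement made this way is static: it reflects only the period that was measured and has to be repeated as access patterns change.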
The PCIe SSD market has been partly responsible for the emergence of new vendors that have developed caching software so that static placement of data can be replaced with dynamic use based on data access activity. This makes the use of PCIe SSDs much simpler and eliminates the need to constantly analyze the environment for the most SSD-appropriate data. Because of the natural fit between PCIe SSDs and caching software, many PCIe SSD manufacturers have acquired a caching software vendor and now bundle the caching app with their hardware.
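The difference between static placement and caching can be sketched with a small least-recently-used (LRU) read cache in Python. LRU is chosen here only because it is simple; it is an assumption, not a description of how any bundled caching product decides what to keep. The `ReadCache` class and the dictionary standing in for disk storage are hypothetical.

```python
from collections import OrderedDict


class ReadCache:
    """Toy LRU read cache sitting in front of a slower backing store."""

    def __init__(self, capacity, backend):
        self.capacity = capacity
        self.backend = backend      # dict standing in for disk storage
        self.cache = OrderedDict()  # ordered oldest to most recently used
        self.hits = 0
        self.misses = 0

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return self.cache[block]
        self.misses += 1
        data = self.backend[block]         # slow path: go to disk
        self.cache[block] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return data


disk = {n: f"data-{n}" for n in range(5)}
cache = ReadCache(capacity=2, backend=disk)
for n in [0, 1, 0, 2, 0, 1]:
    cache.read(n)

print(cache.hits, cache.misses)  # prints "2 4"
```

Block 0 stays cached because it keeps being read, with no one having to decide in advance that it belongs on flash.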
An interesting wrinkle to the caching market is the emergence of PCIe SSDs with the caching function built into the hardware. This provides the benefit of caching without the need to load additional software or to use server resources to perform the cache analysis.
Some PCIe vendors now provide the ability to mirror flash cards between servers via a high-speed 10 Gigabit Ethernet or InfiniBand network. This can eliminate the need for a SAN altogether. These software applications leverage the fault tolerance capabilities of products like VMware vSphere Fault Tolerance to provide application and data availability in the event of a server failure. This is a trend that should continue to grow in popularity since it combines the local access performance of PCIe SSDs with the shared access and redundancy normally associated with a traditional SAN.
Summing up PCIe SSDs
Not all PCIe SSDs are created equal. The way the boards are designed and implemented can make a difference for both performance and durability. Also, the emerging software components of the solid-state drive ecosystem (like caching) can significantly impact how fully the PCIe SSD investment is exploited. Prospective PCIe SSD users should look for the simplest board that meets their performance needs at the best price, with some room for performance growth. Performance demands may require advanced features, but often a basic storage-compliant PCIe SSD will yield the best overall value and easiest implementation.
About the author:
George Crump is president of Storage Switzerland, an IT analyst firm focused on storage and virtualization.
This was first published in February 2013