Increasing Data Processing and Transmission Speed for NVMe Storage Devices


By Denis Petronenko

Head of the Telecom Unit at Promwad 

Project in a Nutshell: We improved the performance of our client's storage system by 30% by combining the DPDK libraries with the ZFS file system on NVMe drives. Since this level of performance is a key competitive advantage in the storage market, this case study details the engineering behind it.

Client & Challenge 

Working with an NVMe-based storage vendor, we set out to identify design practices for a NAS that would improve performance and data transfer rates.


1. Concept Development

To evaluate data processing and write speed, we tested NVMe drives combined into a RAID array using ZFS (the Zettabyte File System). ZFS alone can increase write speed several times over. We also planned tests with the DPDK library set to evaluate whether even greater performance gains were possible.

2. Hardware Platform Description

The following components formed the hardware platform for our testing:

• 4 high-capacity Samsung NVMe drives, model MZQL27T6HBLA-00A07.
• A carrier board for connecting four NVMe drives.
• PC system model: Ampere Altra Developer Platform.
• PC RAM: 6 × 32 GiB (192 GiB) DIMM DDR4-2933.
• PC CPU: Ampere Altra Q64-22 (ARMv8).
• PC motherboard: COM-HPC Ampere Altra Module.

3. Software Platform Description

We used the following software components for the testing:

• Ubuntu Server 22.04.4 LTS (ARM64).
• The zpool utility for creating, configuring, and optimising a ZFS RAID array.
• TestBench software for performance evaluation; it simulates data transfer operations at 16 Gbps per data unit and measures copy speeds.
• The DPDK 23.07 library set to improve performance during testing.

4. Testing Methodology

We performed NVMe testing with four configurations:

1. Benchmarking the ZFS pool with default settings. This baseline measures the performance of a ZFS pool as created out of the box. It includes enabling LZ4 compression, which can save storage space by compressing data.
2. Benchmarking the ZFS pool with settings recommended for NVMe disks. NVMe disks are high-performance storage devices, so this configuration optimises the pool for NVMe by setting the physical block size to 4K sectors (common for modern NVMe disks) and raising the record size to 1 MiB for efficient sequential I/O.
3. Benchmarking the ZFS pool with specific tuning options and direct I/O. This configuration fine-tunes the pool for maximum performance and efficiency: compression and deduplication are turned off to reduce processing overhead, access-time updates are disabled (atime=off), extended-attribute storage is optimised (xattr=sa), checksums are disabled, and the log bias is set for throughput.
4. Benchmarking multiple ZFS pools, each consisting of one NVMe disk. Four separate single-disk ZFS pools are created, which allows the performance of individual disks and pools to be evaluated. The tuning options and direct I/O settings from Configuration 3 are applied to each pool to keep testing conditions consistent.
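The four configurations above can be sketched as zpool/zfs commands. This is a minimal sketch, assuming OpenZFS 2.x; the device names (/dev/nvme0n1 … /dev/nvme3n1), the pool name "tank", and the stripe layout are assumptions, since the article does not specify them:

```shell
# Hypothetical device and pool names; run as root on a system with OpenZFS.
# Destroy the previous pool between configurations: zpool destroy tank

# Configuration 1: default pool settings, LZ4 compression (baseline).
zpool create tank /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
zfs set compression=lz4 tank

# Configuration 2: NVMe-oriented settings — 4K physical sectors (ashift=12)
# and a 1M record size for efficient sequential I/O.
zpool create -o ashift=12 -O recordsize=1M tank \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# Configuration 3: tuning options from the text — compression and dedup off,
# no atime updates, extended attributes stored as system attributes,
# checksums off, log bias tuned for throughput. Direct I/O is requested
# by the benchmark itself (O_DIRECT), not set as a pool property here.
zfs set compression=off dedup=off atime=off xattr=sa \
    checksum=off logbias=throughput tank

# Configuration 4: four separate single-disk pools with the same tuning.
for i in 0 1 2 3; do
    zpool create -o ashift=12 -O recordsize=1M "tank$i" "/dev/nvme${i}n1"
    zfs set compression=off dedup=off atime=off xattr=sa \
        checksum=off logbias=throughput "tank$i"
done
```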

The testing was conducted as follows:

• The system was configured according to one of the configuration settings (Configurations 1–4 in the table below).
• We generated a data stream with TestBench and measured the write speed to the RAID.
• We re-ran the same tests with the additional LZ4 data compression option enabled for the zpool.
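The procedure above can be sketched as a driver loop. The `testbench` invocation below is a placeholder (the client's TestBench interface is not public); the loop over compression modes, thread counts, and repeated runs follows the text:

```shell
# Hypothetical benchmark driver; "testbench" and its flags are placeholders,
# and "tank" is an assumed pool name mounted at /tank.
for compression in off lz4; do
    zfs set "compression=$compression" tank
    for threads in 4 8 16; do
        for run in 1 2 3; do
            # Each thread streams one 16 GB data unit to the pool.
            testbench --threads "$threads" --size 16G \
                      --target /tank --report-mbps
        done
    done
done
```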

We repeated the testing of each configuration three times, on 4, 8, and 16 data-processing threads. The results, including the data transmission speed with the additional compression option, are shown in the table below:

Configuration No. | 4 threads (1×16 GB per thread), Mbps | 8 threads (1×16 GB per thread), Mbps | 16 threads (1×16 GB per thread), Mbps
Configuration 1   |                                      |                                      |
Configuration 2   |                                      |                                      |
Configuration 3   |                                      |                                      |
Configuration 4   |                                      |                                      |
5. Results

The findings allow us to draw the following conclusions:

• Write speed increases with the number of threads: when more threads process data simultaneously, the system can write data to the ZFS pool faster. System performance depends on the degree of parallelism in data processing.
• Optimal performance is achieved by connecting four NVMe devices to the system and configuring them as four separate ZFS pools. This configuration improves the system's ability to parallelise data processing across multiple disks and delivers better write performance than the other configurations.
• LZ4 data compression on a ZFS pool can either increase or slightly decrease write speed depending on the situation; the impact depends on the nature of the data being compressed and the overall system workload.
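One way to probe this compression trade-off on a given workload is to toggle compression per dataset and inspect the achieved ratio. A minimal sketch, again assuming a pool named "tank":

```shell
# Enable LZ4 on the (hypothetical) pool "tank", re-run the write test,
# then inspect how well the data actually compressed.
zfs set compression=lz4 tank
# ... re-run the benchmark here ...
zfs get -H -o value compression,compressratio tank
# A compressratio near 1.00x means LZ4's CPU cost bought little space,
# which is when write speed can drop slightly.
zfs set compression=off tank
```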

Business Value

We helped our client develop a storage system that improved performance by 30% over similar products on the market.

With the upgraded NAS system, our client will be able to expand its market reach and attract more customers from industries where high write and read speeds in storage systems are critical:

• finance and e-commerce, where higher performance and faster data processing enable more efficient transaction processing, improved data analytics, and compliance;
• cloud service providers, where an efficient storage infrastructure delivers high-performance computing resources, improves data security, and scales services to meet growing customer needs;
• streaming media, where an optimised storage infrastructure reduces latency and ensures seamless streaming operations.

More of What We Do to Improve Performance

• SPDK and DPDK Solutions: explore our expertise in high-speed data transmission designs – up to 10 times faster than the Linux kernel network stack.
• Enterprise NAS: a case study on the design of an enterprise network-attached storage system with DPDK/SPDK support.
• TCP PEP and QoS Software Modules: learn how we developed software to improve performance in communication systems via geostationary satellites.
