iSCSI Benchmarking

The following are benchmarks from our testing of our iSCSI SSD storage.

67,300 read IOP/s on a VM on iSCSI (Disk -> LVM -> MDADM -> DRBD -> iSCSI target -> Network -> XenServer iSCSI Client -> VM) per VM, scaling to 1,000,000 IOP/s total:

```shell
root@dev-samm:/mnt/pmt1 128 # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=128 --size=2G --readwrite=read
test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
2.0.8
Starting 1 process
bs: 1 (f=1): [R] [55.6% done] [262.1M/0K /s] [67.3K/0 iops] [eta 00m:04s]
```

38,500 random 4k write IOP/s on a VM on iSCSI (Disk -> LVM -> MDADM -> DRBD -> iSCSI target -> Network -> XenServer iSCSI Client -> VM) per VM, scaling to 700,000 IOP/s total:

```shell
root@dev-samm:/mnt/pmt1 # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=128 --size=2G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
2.0.8
Starting 1 process
bs: 1 (f=1): [w] [26.3% done] [0K/150.2M /s] [0 /38.5K iops] [eta 00m:14s]
```

Raw device latency on the storage units:

Intel DC3600 1.2T PCIe NVMe

```shell
root@s1-san6:/proc # ioping /dev/nvme0n1p1
4.0 KiB from /dev/nvme0n1p1 (device 1.1 TiB): request=1 time=104 us
4.0 KiB from /dev/nvme0n1p1 (device 1.1 TiB): request=2 time=83 us
4.0 KiB from /dev/nvme0n1p1 (device 1.1 TiB): request=3 time=51 us
4.0 KiB from /dev/nvme0n1p1 (device 1.1 TiB): request=4 time=71 us
```

SanDisk SDSSDXPS960G SATA

```shell
root@pm-san5:/proc # ioping /dev/sdc
4.0 KiB from /dev/sdc (device 894.3 GiB): request=1 time=4.2 ms
4.0 KiB from /dev/sdc (device 894.3 GiB): request=2 time=4.1 ms
4.0 KiB from /dev/sdc (device 894.3 GiB): request=3 time=4.1 ms
4.0 KiB from /dev/sdc (device 894.3 GiB): request=4 time=4.1 ms
```

Micron_M600_MTFDDAK1T0MBF SATA

```shell
root@pm-san5:/proc # ioping /dev/sdf
4.0 KiB from /dev/sdf (device 953.9 GiB): request=1 time=157 us
4.0 KiB from /dev/sdf (device 953.9 GiB): request=2 time=190 us
4.0 KiB from /dev/sdf (device 953.9 GiB): request=3 time=65 us
4.0 KiB from /dev/sdf (device 953.9 GiB): request=4 time=181 us
```

Latency on a VM (Disk -> LVM -> MDADM -> DRBD -> iSCSI target -> Network -> XenServer iSCSI Client -> VM):

```shell
root@dev-samm:/mnt 127 # ioping pmt1/
4096 bytes from pmt1/ (ext4 /dev/xvdb1): request=1 time=0.6 ms
4096 bytes from pmt1/ (ext4 /dev/xvdb1): request=2 time=0.7 ms
4096 bytes from pmt1/ (ext4 /dev/xvdb1): request=3 time=0.7 ms

--- pmt1/ (ext4 /dev/xvdb1) ioping statistics ---
3 requests completed in 2159.1 ms, 1508 iops, 5.9 mb/s
min/avg/max/mdev = 0.6/0.7/0.7/0.1 ms

root@dev-samm:/mnt # ioping pmt2/
4096 bytes from pmt2/ (ext4 /dev/xvdc1): request=1 time=0.6 ms
4096 bytes from pmt2/ (ext4 /dev/xvdc1): request=2 time=0.8 ms

--- pmt2/ (ext4 /dev/xvdc1) ioping statistics ---
2 requests completed in 1658.4 ms, 1470 iops, 5.7 mb/s
min/avg/max/mdev = 0.6/0.7/0.8/0.1 ms

root@dev-samm:/mnt # ioping pmt3/
4096 bytes from pmt3/ (ext4 /dev/xvde1): request=1 time=0.6 ms
4096 bytes from pmt3/ (ext4 /dev/xvde1): request=2 time=0.9 ms
4096 bytes from pmt3/ (ext4 /dev/xvde1): request=3 time=0.9 ms
```

...
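To run a comparable test yourself, here is a minimal sketch using the same fio and ioping invocations shown above; the scratch file path, test size and device name are placeholder assumptions, so adjust them for your own environment.

```shell
# Minimal sketch: re-run the same 4k read and random write tests and pull out
# the IOPS summary lines. TARGET is an assumption - point it at a scratch file
# on the filesystem you want to measure.
TARGET=/mnt/pmt1/test

for mode in read randwrite; do
  fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
      --name=test --filename="$TARGET" --bs=4k --iodepth=128 \
      --size=2G --readwrite="$mode" | grep -i iops
done

# Raw device latency, four requests (device name is a placeholder):
ioping -c 4 /dev/nvme0n1p1
```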

July 24, 2015 · 3 min · 456 words · Sam McLeod

Delayed Serial STONITH

A modified version of John Sutton's rcd_serial cable coupled with our Supermicro reset switch hijacker. This works with the rcd_serial fence agent plugin.

Reasons rcd_serial makes for a very good STONITH mechanism:

- It has no dependency on power state.
- It has no dependency on network state.
- It has no dependency on node operational state.
- It has no dependency on external hardware.
- It costs less than $5 plus the time to build.
- It is incredibly simple and reliable.

The most common STONITH agents in use are probably those that control a UPS / PDU. While this sounds like a good idea in theory, there are a number of issues with relying on a UPS / PDU: ...
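For a rough idea of the software side, the cluster-glue stonith CLI can be used to inspect and manually exercise a STONITH plugin before wiring it into the cluster. This is a minimal sketch only; the node name is a placeholder and the exact rcd_serial parameter names should be taken from the plugin's own help output rather than assumed.

```shell
# List the STONITH plugin types available on this host
# (rcd_serial ships with cluster-glue).
stonith -L

# Show the parameters the rcd_serial plugin expects - treat this output as
# authoritative, as parameter names can vary between versions.
stonith -t rcd_serial -h

# Once configured, a reset can be fired by hand for testing, e.g.:
#   stonith -t rcd_serial <parameters from the help output> -T reset test-node1
# (node name and parameters above are placeholders for illustration only)
```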

July 21, 2015 · 3 min · 450 words · Sam McLeod

Video - Cluster Failover Performance Demo

July 12, 2015 · 0 min · 0 words · Sam McLeod

CentOS 7 and HA

First, some background… One of the many lessons I've learnt from my Linux HA / storage clustering project is that the Debian HA ecosystem is essentially broken. We reached the point where packages were too old, too buggy or, in Debian 8's case, outright missing. In the past I was very disappointed with RHEL/CentOS 5 / 6 and (until now) had been quite satisfied with Debian as a stable server distribution with historically more modern packages and kernels. ...

July 7, 2015 · 3 min · 558 words · Sam McLeod

SSD Storage Cluster - Update and Diagram

Due to several recent events beyond my control I'm a bit behind on the project - hence the lack of updates, which I apologise for. The good news is that I'm back working to finish off the clusters and I'm happy to report that all is going to plan. Here is the final diagram of the two-node cluster design: Plain text version available here. This was generated with the LCMC tool (beware - it's Java!). ...

June 17, 2015 · 1 min · 79 words · Sam McLeod

Video - Storage Cluster Failover Demo

A brief demonstration of the failover and recovery process on the storage clusters I’ve been building.

May 14, 2015 · 1 min · 16 words · Sam McLeod

Talk - High Performance Software Defined Storage

A high-level talk from Infracoders Melbourne on 12/04/2015. There's also a low-quality recording available here: Related posts: Building a high performance SSD SAN - Part 1

April 15, 2015 · 1 min · 28 words · Sam McLeod

Building a high performance SSD SAN - Part 1

Over the coming month I will be architecting, building and testing a modular, high-performance SSD-only storage solution. I'll be documenting my progress / findings along the way and open-sourcing all the information as a public guide. With recent price drops and durability improvements in solid state storage, now is as good a time as any to ditch those old magnets. Modular server manufacturers such as SuperMicro have spent big on R&D thanks to the ever-growing requirements of the cloud vendors that utilise their hardware. ...

February 16, 2015 · 8 min · 1590 words · Sam McLeod

Direct-Attach SSD Storage - Performance & Comparisons

Further to my earlier post on XenServer storage performance with regard to directly attaching storage from the host, I have been analysing the performance of various SSD storage options. I attached a HP DS2220sb storage blade to an existing server blade and compared the performance of 4 and 6 SSD RAID-10 arrays with our existing iSCSI SANs. While the P420i RAID controller in the DS2220sb is clearly saturated and unable to provide throughput much over 1,100MB/s, the IOP/s available to PostgreSQL are still a very considerable improvement over our P4530 SAN - in fact, 6 SSDs result in a 39.9x performance increase! ...
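A comparison along these lines can be approximated with the same fio flags used elsewhere on this site. The sketch below is an assumption of the general approach (8k blocks roughly matching PostgreSQL's page size, a mixed random read/write workload), not the exact benchmark behind the 39.9x figure; the mount point, size and mix ratio are placeholders.

```shell
# Rough sketch of a PostgreSQL-flavoured random I/O test against the RAID-10
# mount point - path, size, queue depth and 70/30 read/write mix are
# illustrative assumptions.
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=pgtest --filename=/mnt/ds2220sb/pgtest --bs=8k --iodepth=64 \
    --size=4G --readwrite=randrw --rwmixread=70
```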

February 15, 2015 · 1 min · 110 words · Sam McLeod

XenServer, SSDs & VM Storage Performance

Intro

At Infoxchange we use XenServer as our virtualisation platform of choice. There are many reasons for this, including:

- Open source.
- Offers greater performance than VMware.
- Affordability (it's free unless you purchase support).
- A proven backend - Xen is very reliable.
- Reliable cross-host migrations of VMs.
- The XenCentre client (although it has to run in a Windows VM) is quick and simple to use.
- Upgrades and patches have proven to be more reliable than VMware's.
- OpenStack, while interesting, is not yet reliable or streamlined enough for our small team of 4 to implement and manage.

XenServer Storage & Filesystems

Unfortunately, the downside to XenServer is that its underlying OS is quite old. The latest version (6.5), about to be released, is still based on CentOS 5: it lacks any form of EXT4 or BTRFS support, direct disk access is not available… without some tweaking, and there is no real support for TRIM unless you have direct disk access and are happy with EXT3. ...
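As a quick way to see what any given host or VM actually supports, the checks below are a minimal sketch (device name and mount point are placeholders); they show whether the kernel exposes discard/TRIM for a disk and whether a mounted filesystem will accept a manual TRIM pass.

```shell
# Does the block device advertise discard (TRIM) support?
# Non-zero values indicate the device and driver stack support it.
cat /sys/block/sda/queue/discard_granularity
cat /sys/block/sda/queue/discard_max_bytes

# Attempt a manual TRIM of a mounted filesystem
# (fails cleanly if the filesystem or device doesn't support it).
fstrim -v /mnt/ssd
```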

February 15, 2015 · 5 min · 970 words · Sam McLeod