Sam McLeod

Ops Leader, Platform Delivery Engineer, Enabler & Ponderer of Complex Systems

“Easy things should be easy, and hard things should be possible.” - Larry Wall

“Strategy is a commodity, execution is an art.” - Peter Drucker

All Posts by Sam McLeod

Broadcom, Or How I Learned To Start Worrying And Drop The Packet

Earlier this week we started the process of upgrading one of our hypervisor compute clusters and encountered a rather painful bug with HP’s Broadcom NIC chipsets.

We were part way through a routine rolling pool upgrade of our hypervisor (XenServer) cluster when we observed unexpected, intermittent loss of connectivity between several VMs, then entire XenServer hosts.

The problems appeared to impact hosts that hadn’t yet been upgraded to XenServer 7.2. We now attribute this to extreme packet loss between hosts in the pool, courtesy of buggy firmware from Broadcom and HP.

We were aware of the recently published issues with Broadcom/HP NICs used in VMware clusters where NICs would be bricked by a firmware upgrade. This issue is different from what we experienced.

We experienced extreme packet loss between hosts in the cluster. With XenServer, the pool master must be upgraded first; the result was that XAPI pool management suffered a communication breakdown across the management network, which complicated diagnosis. In fact, the connectivity problems went unnoticed until many hours after the master was upgraded.

At first it appeared to be a problem caused by the pool being partially upgraded.

We wondered if we had perhaps made a poor decision in running the upgrade on a single node and observing its performance for a few hours. We made the call to upgrade another host and analyse our findings.

The next upgraded host appeared stable - in fact, we later found this host wasn’t impacted by the bug at all. We then made the call to upgrade several more nodes and continue to track their stability.

After upgrading half the pool we suddenly hit problems: VMs failed, and hosts started dropping out of the pool and losing track of the power state of running/stopped VMs.

We found that the master, along with one of the other hosts, was experiencing major packet loss on its management network cards. We suspected faulty NICs - it wouldn’t be the first time a Broadcom NIC had failed us, and as these are blades there is no physical network cabling to rule out.

Broadcom has had its fair share of bad press over the years, with many botched firmware updates and proprietary driver issues. I recommend steering clear of network cards based on their chipsets.

Downgrading The Firmware

As soon as we spotted the packet loss on the Broadcom NICs we upgraded their firmware to 2.19.22-1, with no improvement. We then downgraded to 2.18.44-1 / 7.14.62, again with no improvement. We even went as far as trying 2.16.20 / 7.12.83 from back in 2015 - but still no luck.

At the time of writing, no firmware downgrade (or upgrade) has fixed the issue.

The packet loss manifests immediately after rebooting or power cycling - but not on every reboot! This is the odd thing: approximately half the time a host boots it is fine, until the next boot.

We’ve compared dmesg, lspci and modinfo output between boot cycles and can’t find anything that stands out.
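For anyone wanting to reproduce the comparison, a rough sketch of what we’re doing is below - the output directory is an arbitrary choice, and the module name assumes the bnx2x driver discussed here:

#!/bin/bash
# Capture per-boot diagnostics so that good and bad boots can be diffed later
dir="/root/bootlogs/$(date +%Y%m%d-%H%M%S)"
mkdir -p "$dir"
dmesg > "$dir/dmesg.txt"
lspci -vvv > "$dir/lspci.txt"
modinfo bnx2x > "$dir/modinfo-bnx2x.txt"
# Then compare any two boots, e.g.:
# diff -u /root/bootlogs/<good boot>/dmesg.txt /root/bootlogs/<bad boot>/dmesg.txt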

The bug seems to be caused by the version of the bnx2x driver present in XenServer 7.2’s kernel. Upon further reading, HP recommends using bnx2x driver 7.14.29-2 or later; XenServer still uses the old 4.4.0 kernel, so that’s not currently an option.

I suspect that it’s a bug in the Broadcom firmware loaded into the NIC upon boot - possibly a race condition related to the device’s interrupt handling (MSI/MSI-X).

XenServer

XenServer needs to update its kernel, or at least the bnx2x driver module, past the version that triggers the bug. I’ve logged a ticket for this over at bugs.xenserver.org.

Additionally, XenServer didn’t notice the packet loss/network interruptions during the rolling pool upgrade. I have reported this concern and suggested that XenServer add pool-wide checks for connectivity issues between hosts, at least during a pool upgrade.

Workaround

We don’t have (a good) one.

Currently we’re simply testing for packet loss on the management NIC after boot; if we detect it, we reboot the host and check again. This is far from ideal - but until the bug is resolved there isn’t any other fix we can find, short of compiling a custom module for XenServer 7.2.
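For the curious, the check is roughly the following sketch - the gateway address and loss threshold are assumptions to adapt to your own management network:

#!/bin/bash
# Ping the management gateway after boot; if packet loss exceeds the
# threshold, log it and reboot the host for another roll of the dice.
GATEWAY="10.0.0.1"   # management network gateway (assumption)
THRESHOLD=20         # % packet loss considered a failed boot (assumption)
loss=$(ping -c 50 -i 0.2 "$GATEWAY" | awk -F'[ %]' '/packet loss/ {print $6}')
if [ "${loss:-100}" -gt "$THRESHOLD" ]; then
  logger "bnx2x boot check: ${loss}% packet loss to $GATEWAY, rebooting"
  reboot
fi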

Given the widespread problems with Broadcom, we’ve ordered HP 560M (Intel-based) NICs to replace them.

BNX2X Driver

The driver included with XenServer 7.2 that triggers the problem is 1.714.1:

filename:       /lib/modules/4.4.0+10/updates/bnx2x.ko
version:        1.714.1
license:        GPL
description:    QLogic BCM57710/57711/57711E/57712/57712_MF/57800/57800_MF/57810/57810_MF/57840/57840_MF Driver
author:         Eliezer Tamir
srcversion:     927337210F53311B18D0D7E
alias:          pci:v000014E4d0000163Fsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Esv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Dsv*sd*bc*sc*i*
alias:          pci:v00001077d000016ADsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ADsv*sd*bc*sc*i*
alias:          pci:v00001077d000016A4sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A4sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ABsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AFsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A2sv*sd*bc*sc*i*
alias:          pci:v00001077d000016A1sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A1sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000168Dsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AEsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000168Esv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A9sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A5sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000168Asv*sd*bc*sc*i*
alias:          pci:v000014E4d0000166Fsv*sd*bc*sc*i*
alias:          pci:v000014E4d00001663sv*sd*bc*sc*i*
alias:          pci:v000014E4d00001662sv*sd*bc*sc*i*
alias:          pci:v000014E4d00001650sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Fsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Esv*sd*bc*sc*i*
depends:        mdio,libcrc32c,ptp,vxlan
vermagic:       4.4.0+10 SMP mod_unload modversions
parm:           pri_map: Priority to HW queue mapping (uint)
parm:           num_queues: Set number of queues (default is as a number of CPUs) (int)
parm:           disable_iscsi_ooo: Disable iSCSI OOO support (uint)
parm:           disable_tpa: Disable the TPA (LRO) feature (uint)
parm:           int_mode: Force interrupt mode other than MSI-X (1 INT#x; 2 MSI) (uint)
parm:           dropless_fc: Pause on exhausted host ring (uint)
parm:           poll: Use polling (for debug) (uint)
parm:           mrrs: Force Max Read Req Size (0..3) (for debug) (int)
parm:           debug: Default debug msglevel (uint)
parm:           num_vfs: Number of supported virtual functions (0 means SR-IOV is disabled) (uint)
parm:           autogreeen: Set autoGrEEEn (0:HW default; 1:force on; 2:force off) (uint)
parm:           native_eee:int
parm:           eee:set EEE Tx LPI timer with this value; 0: HW default; -1: Force disable EEE.
parm:           tx_switching: Enable tx-switching (uint)

Whereas XenServer 7.0 has driver version 1.713.04 which seems not to trigger the issue:

filename:       /lib/modules/3.10.0+10/extra/bnx2x.ko
version:        1.713.04
license:        GPL
description:    QLogic BCM57710/57711/57711E/57712/57712_MF/57800/57800_MF/57810/57810_MF/57840/57840_MF Driver
author:         Eliezer Tamir
srcversion:     13EAA521200A40118055D63
alias:          pci:v000014E4d0000163Fsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Esv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Dsv*sd*bc*sc*i*
alias:          pci:v00001077d000016ADsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ADsv*sd*bc*sc*i*
alias:          pci:v00001077d000016A4sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A4sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ABsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AFsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A2sv*sd*bc*sc*i*
alias:          pci:v00001077d000016A1sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A1sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000168Dsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AEsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000168Esv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A9sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016A5sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000168Asv*sd*bc*sc*i*
alias:          pci:v000014E4d0000166Fsv*sd*bc*sc*i*
alias:          pci:v000014E4d00001663sv*sd*bc*sc*i*
alias:          pci:v000014E4d00001662sv*sd*bc*sc*i*
alias:          pci:v000014E4d00001650sv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Fsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Esv*sd*bc*sc*i*
depends:        mdio,libcrc32c,ptp
vermagic:       3.10.0+10 SMP mod_unload modversions
parm:           pri_map: Priority to HW queue mapping (uint)
parm:           num_queues: Set number of queues (default is as a number of CPUs) (int)
parm:           disable_iscsi_ooo: Disable iSCSI OOO support (uint)
parm:           disable_tpa: Disable the TPA (LRO) feature (uint)
parm:           int_mode: Force interrupt mode other than MSI-X (1 INT#x; 2 MSI) (uint)
parm:           dropless_fc: Pause on exhausted host ring (uint)
parm:           poll: Use polling (for debug) (uint)
parm:           mrrs: Force Max Read Req Size (0..3) (for debug) (int)
parm:           debug: Default debug msglevel (uint)
parm:           num_vfs: Number of supported virtual functions (0 means SR-IOV is disabled) (uint)
parm:           autogreeen: Set autoGrEEEn (0:HW default; 1:force on; 2:force off) (uint)
parm:           native_eee:int
parm:           eee:set EEE Tx LPI timer with this value; 0: HW default; -1: Force disable EEE.
parm:           tx_switching: Enable tx-switching (uint)

Affected Components

  • HP 530M network cards (as they use the Broadcom BCM57810 chipset), commonly found in BL460c Gen8 blades and similar.
  • XenServer 7.2 (patched to the latest XS72E006 patch)
  • Kernel 4.4.0+10 as found in XenServer 7.2
  • Broadcom bnx2x module version 1.714.1
  • HP firmware for QLogic NX2 (seemingly all versions)

Sam McLeod Oct 13, 2017

Return Of The RSS

Of all the tools for reading news and subscribing to software releases, I still find RSS the most useful.

I use Feedly to manage my RSS subscriptions and keep all my devices in sync, but instead of using Feedly’s own client, I use an app called Reeder as the client / reader itself.

Feedly

RSS feed subscription management

Features:

  • Keyword alerts.
  • Browser plugins to subscribe to (current) url.
  • Notation and highlighting support (a bit like Evernote).
  • Search and filtering across large numbers of feeds / content.
  • IFTTT, Zapier, Buffer and Hootsuite integration.
  • Built in save / share functionality (that I only use when I’m on the website).
  • Backup feeds to Dropbox.
  • Very fast, despite the fact that I’m in Australia - high latency often hurts the performance of apps / sites hosted on AWS in the US.
  • Article de-duplication is currently being developed (I believe), so I’m looking forward to that!
  • Easy manual import, export and backup (no vendor lock-in is important to me).
  • Public sharing of your Feedly feeds (we’re getting very meta here!).

Reeder

A (really) beautiful and fast iOS / macOS client

  • The client apps aren’t cheap, but damn they’re good quality - I much prefer them over the standard Feedly apps.
  • Obviously supports Feedly as a backend, but there are many other source services you can use alongside each other.
  • I save articles using Reeder’s clip to Evernote functionality… a lot.
  • Sensible default keyboard shortcuts (at least for me they felt natural - YMMV of course).
  • Good customisable ‘share with’ options.
  • Simple, well designed UX.
  • Easy manual import and export, just like Feedly.

Sam McLeod Sep 22, 2017

The Aeroplane Flies High, Turns Left, Looks Right and Might Reach Its Destination

A hypothetical scenario in the form of a metaphor

The Airline

There is an Airline with good morals that prides itself on its ability to provide services to its customers.

The Why

The Airline exists for its strong moral values and believes everyone should have affordable, safe and reliable flights, no matter their income or situation.

The How

The Airline doesn’t want to over-charge its customers or any potential new customers; it cares about its customers as much as it does the quality of its services and its ability to provide them.

The What

The Airline wants to expand and needs to increase turnover in order to better reach those in need of its affordable services.


Finding a Solution - Part 1

The Why

In order to help more customers while keeping in line with the company’s good morals, the Airline can’t increase costs to its much cared-about customers. But it needs to reach more of those in need: there are many Airports that aren’t on its routes, and more people are wanting to use its fantastic service every day.

The How

The airline decides that it can carry more customers on its flights by slightly decreasing leg room to add additional seating rows to each flight.

It also realises that it doesn’t truly need 8 air staff per flight to handle the pre-flight boarding, checks and cabin service alongside the pilots. Realistically, it believes it can continue to help its customers with 6 air staff plus the pilots - without charging more, while still providing the same affordable, safe flights between destinations, boarding everyone on time, and serving all the customers their drinks and food during the flight.

The airline also decides to carry slightly less fuel, reducing weight and saving on fuel emissions per passenger.

The What

The Airline cuts back to the 6 air staff and slightly reduces its fuel load, while staying well within safe operating limits to its destinations.

The Airline can now afford to operate several additional flights and can carry an additional two rows of customers without discomforting them, still providing an excellent service.


Finding a Solution - Part 2

The Why

The Airline is now able to happily take more people each day, but there are still a lot of destinations it doesn’t land in, and people are asking for its services at those locations.

The How

After careful research, the Airline concludes that it needs additional means to carry more people and reach those destinations - perhaps it can add several additional planes to its fleet.

The What

The Airline invests in larger, more modern planes that are faster, greener and have a much lower failure rate, all while offering a wider variety of services to its customers and visiting more airports than before.


The Real Problem

One day, because the Airline was still operating all its planes with the reduced 6 air staff plus pilots, it took slightly longer than usual for one of the air staff to notice a customer who was quite sick near the rear of the plane. Once they had noticed, it seemed there were several people in the area feeling unwell, and it looked like something that could spread.

The pilots were informed and made the decision to re-route the flight from the small airport they were destined for to a larger airport with a hospital nearby, where specialists would be awaiting their arrival.

Not far from the new destination, the co-pilot noticed a warning light showing low fuel pressure due to the reduced fuel load. This was an issue: turning back to the original destination would risk a potentially fatal situation for those on board, but they would be stretching their fuel to make the new destination safely.

The good news is they made it to the new destination and the sick customers were treated immediately upon arrival - this time.


The Future

The Airline now must make a call: does it risk its financial safety and carry more fuel, on the chance that a similar or equally dangerous situation might require re-routing a plane to this Airport again?

Or does the Airline ensure that problems can be noticed faster - perhaps during the boarding process, perhaps by having additional air staff positioned throughout the plane - not just covering for sick customers, but also detecting and maintaining any other potential problems, or improvements that could be made throughout the plane?

What happens next time?

Sometimes helping our customers requires us to slow down and reflect - to first help ourselves before we’re able to safely accept new work and challenges.

At times our own needs must outweigh the immediate needs of others, and this may mean doing one of the hardest things in the tech industry - saying no to new work or features - so that in the near future you can safely say yes to not only new work, but work of greater scale and complexity.

Sidenote: Perhaps my alternative title could have been ‘or how not to Smash The Pumpkins’.



Sam McLeod Mar 22, 2017

The State of Android in 2016 & The OnePlus 3 Phone

I wanted to try Android for a couple of weeks - I like staying on top of technology and gadgets, and making sure I never become a blind ‘zealot’ for any platform or brand.

The OnePlus 3

I did a lot of research and decided to try the OnePlus 3 as it was good bang-for-buck, ran the latest software, and had plenty of grunt with the latest 8-core, high clock speed Qualcomm processor coupled with 6GB of DDR4 - the specs really are very impressive, especially for a $400 USD phone.

Hardware-wise the unit is lovely - not as high in build quality as my iPhone 6s+, but not nearly as bad as the Samsung Galaxy S3 or other Samsung devices I had tried in the past.

Hardware

The Good

  • Hardware-wise the OP3 felt like it was designed with more care than any other Android phone I’d spent any fair amount of time with.
  • USB-C connectivity is great to see on a phone; it feels right at home on the device and opens a world of opportunities for accessories, assuming the OS supports them.

The Bad

  • The camera felt about as good as an iPhone 4s’, which is now quite dated technology.
  • Wireless AC performance was just over 1/2 that of my iPhone 6s+.

The Ugly

  • While I loved that the phone used USB-C, I was very disappointed to find that it was actually only USB 2.0.
  • 4G LTE achieved only about 1/2 - 2/3 of the performance of my iPhone 6s+ in the same location and on the same network.

The Software

The Good

  • The OS looked pretty, at first.
  • I was pleased to see that the OP3 used block level encryption by default.
  • The OP3 had SELinux enabled, although I’m not sure what policies were in use.
  • The device was fast and responsive at all times.

The Bad

  • Once I started installing apps and finding my way around, the OS very quickly felt messy and using it became a chore.
  • I noted that the general ‘quality’ of Android apps felt quite poor.
  • Nothing seemed to integrate very well; there was a real feeling that minimal time, if any, had been spent on the usage flow of the interface.
  • A lot of apps felt like they hadn’t been maintained for quite some time, despite being ‘the best option’ available.
  • Notifications seemed to annoy me more often than they did notify me of important things.
  • Battery life was about 1/2 that of my iPhone 6s+, although I was using it quite extensively.

The Ugly

  • The platform was riddled with Google-specific apps, and it tries really hard to make you use them whenever possible.
  • Many apps required root access to the phone, which is not something I wanted to grant; similar applications on iOS did not require ‘jailbreaking’, which I have not done in many years, nor felt the need to.

I had a go at disconnecting myself from the clutches of Google, first by trying not to sign into any of their applications and disabling all the information reporting in settings. It felt a bit like iOS back in the iPhone 3G days, when you had to root the phone to do half the things you wanted.

I then ended up flashing a custom ROM that was pretty much stock, but without the good stock apps. I had to install the Google Play Store and create an account, which I think is fair enough - I couldn’t be bothered side-loading APKs using that dreadful Android transfer tool from my laptop, which seems to work 4/5 times you plug your phone in but, like a lot of the apps in the ecosystem, felt very ‘javaish’ (pun slightly intended).

Final Words

The OP3 is a very impressive piece of hardware for $400 USD, but I’m left with the feeling that Google’s platform is fragmented and lacking in work-flow. I was also left with the overall feeling that Google was watching my every move - I don’t know whether it’s true or not, but I certainly felt like I was Google’s product, not the OS.

I look forward to a usable, non-Android Linux OS image becoming available, hopefully in the near future - then it’d be a handy device for tethering and for mobile security analysis.

Sam McLeod Jul 11, 2016

Update Delayed Serial STONITH Design

Note: this is a follow-up to my earlier post, 2015-07-21-rcd-stonith.

A Linux cluster STONITH provider for use with modern Pacemaker clusters.

This has since been accepted and merged into Fedora’s code base and as such will make its way into RHEL.

  • Source Code: Github
  • [Diptrace] CAD Design: Github
  • Device: https://smcleod.net/rcd-stonith/ (warning: contains somewhat outdated images / diagrams now)
  • I have open sourced the CAD circuit design and made this available within this repo under CAD Design and Schematics
  • Related RedHat Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1240868

Many thanks to:

  • George Hansper (Assistance and peer review of electrical design).
  • OurPCB (Board fabrication).
  • The Clusterlabs, Redhat and Fedora teams (feedback and peer review).
  • John Sutton for his original design that served as inspiration.

Sam McLeod Jul 4, 2016

Monitoring SystemD Units With Nagios

Ever forgotten to add a critical service to monitoring?

Want to know if a service or process fails without explicitly monitoring every service on a host?

…Then why not use SystemD’s existing knowledge of all the enabled services? Thanks to ‘Kbyte’ who made a simple Nagios plugin to do just this!
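If you’d rather not pull the plugin in, the core of the idea fits in a few lines of shell - a minimal sketch only (exit codes follow the Nagios convention; the real plugin does more):

#!/bin/bash
# CRITICAL if any systemd unit is in the 'failed' state, OK otherwise
failed=$(systemctl --failed --no-legend | awk '{print $1}')
if [ -n "$failed" ]; then
  echo "CRITICAL: failed units:" $failed
  exit 2
fi
echo "OK: no failed systemd units"
exit 0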

Sam McLeod May 23, 2016

Online Conversion from SQL_ASCII to UTF8 in PostgreSQL

Credits: George Hansper, Ricardo Vassellini, Evgeny Shebanin, Sam McLeod

Scripts and source available here: sql_ascii_to_utf8

The Goal

To be able to take a Postgres Database which is in SQL_ASCII encoding, and import it into a UTF8 encoded database.

The Problem

PostgreSQL will generate errors like this if it encounters any non-UTF8 byte sequences during a database restore:

# pg_dump -Fc test_badchar | pg_restore -d test_badchar_utf8
pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 2839; 0 26852 TABLE DATA table101 postgres
pg_restore: [archiver (db)] COPY failed for table "table101": ERROR:  invalid byte sequence for encoding "UTF8": 0x91
CONTEXT:  COPY table101, line 1
WARNING: errors ignored on restore: 1
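The usual trick is to convert the dump stream to UTF-8 before it reaches the new database. A minimal sketch, assuming the stray bytes are Windows-1252 (0x91 is a curly quote in that encoding) and using a plain-format dump so the stream can be piped through iconv - the scripts linked above handle the corner cases more thoroughly:

pg_dump test_badchar \
  | iconv -f WINDOWS-1252 -t UTF-8 \
  | psql test_badchar_utf8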

Sam McLeod May 23, 2016

Speeding Up Rsync

Speeding up Rsync on local (secure) networks

The most common way to use Rsync is probably as such:

rsync -avr <user>@<source>:<source_dir> <dest_dir>

Resulting in 30-35MB/s depending on file sizes

This can be improved by using a more efficient, less secure encryption algorithm, disabling compression and telling the SSH client to disable some unneeded features that slow things down.

With the settings below I have achieved 100MB/s (at work between VMs) and over 300MB/s at home between SSD drives.

rsync -arv --numeric-ids --progress -e "ssh -T -c aes128-gcm@openssh.com -o Compression=no -x" <user>@<source>:<source_dir> <dest_dir>

If you want to delete files at the destination that have been deleted at the source (obviously use with caution):

rsync -arv --numeric-ids --progress -e "ssh -T -c aes128-gcm@openssh.com -o Compression=no -x" <user>@<source>:<source_dir> <dest_dir> --delete

Points of note:

  1. Because of the weak encryption used, it is not recommended for transferring files across hostile networks (such as the internet).
  2. There are scenarios where enabling compression can improve performance, i.e. if your network link is very slow and your files compress well.
  3. Don’t forget to forward your SSH keys to the host you’re going to run it on! (Run ssh-agent && ssh-add first if the agent isn’t already running, then ssh -A <user>@<host>.)
  4. If aes128-gcm@openssh.com isn’t available to you due to an old operating system, you can use aes128-ctr instead (see the check below).
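To see which ciphers your SSH client actually supports before picking one:

ssh -Q cipher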

Sam McLeod May 3, 2016

Benchmarking IO with FIO

This is a quick tldr; there are many other situations and options you could consider.

  • FIO man page
  • IOP/s = Input or Output operations per second
  • Throughput = How many MB/s can you read/write continuously

Variables worth tuning based on your situation:

--iodepth

The ideal iodepth is very dependent on your hardware.

  • Rotational drives without much cache and with high latency (i.e. desktop SATA drives) will not benefit from a large iodepth; values between 16 and 64 could be sensible.
  • High speed, lower latency SSDs (especially NVMe devices) can utilise a much higher iodepth; values between 256 and 4096 could be sensible.
--bs

The ideal block size is very dependent on your workload.

  • Writing/reading lots of small files (i.e. documents, logs) benefits from / is represented by a smaller block size; values between 2K and 128K could be sensible, and 4K is likely the average in most situations.

  • Writing/reading large files (i.e. videos, database backups) benefits from / is represented by a larger block size; values between 2M and 8M could be sensible, and 4M is likely the average in most situations.

Before running any of these tests:

  1. Check you’re in a directory with enough free disk space.
  2. Check / pause any other workloads that may interfere with the results.
  3. Understand your workload / what you intend to use the storage for - i.e. what matters?
  4. Tune anything you might want to tune as above such as iodepth or size.

Random write test for IOP/s, i.e. lots of small files

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=256 --size=4G --readwrite=randwrite --ramp_time=4

Random Read test for IOP/s, i.e. lots of small files

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=256 --size=4G --readwrite=randread --ramp_time=4

Sequential write test for IOP/s, i.e. one large file

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=256 --size=4G --readwrite=write --ramp_time=4

Sequential Read test for IOP/s, i.e. one large file

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=256 --size=4G --readwrite=read --ramp_time=4

Random write test for throughput, i.e. lots of small files

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=256 --size=10G --readwrite=randwrite --ramp_time=4

Random Read test for throughput

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=256 --size=10G --readwrite=randread --ramp_time=4

Sequential write test for throughput, i.e. one large file

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=256 --size=10G --readwrite=write --ramp_time=4

Sequential Read test for throughput, i.e. one large file

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4M --iodepth=256 --size=10G --readwrite=read --ramp_time=4

Testing IO latency without fio

<user>@<host>:/mnt/store1 # ioping . # old backup server
4096 bytes from . (ext4 /dev/mapper/store1-36TB): request=1 time=0.2 ms
4096 bytes from . (ext4 /dev/mapper/store1-36TB): request=2 time=0.1 ms
4096 bytes from . (ext4 /dev/mapper/store1-36TB): request=3 time=0.9 ms

vs

<user>@<host>:/mnt/store1 # ioping . # new backup server
4 KiB from . (ext4 /dev/md10): request=1 time=88 us
4 KiB from . (ext4 /dev/md10): request=2 time=103 us
4 KiB from . (ext4 /dev/md10): request=3 time=102 us

Sam McLeod Apr 29, 2016

Mirroring a Gitlab project to Github

Let’s pretend you have a project on Gitlab called ask-izzy and you want to mirror it up to Github at https://github.com/ask-izzy/ask-izzy

Assuming you’re running Gitlab as the default git user and that your repositories are stored in /mnt/repositories, you can follow something similar to the instructions below:

  1. Grant write access to Github

Get your Gitlab install’s pubkey from the git user

cat /home/git/.ssh/id_rsa.pub

On Github add this pubkey as deploy key on the repo, make sure you tick the option to allow write access.

  2. Add a post-receive hook to the Gitlab project
mkdir /mnt/repositories/developers/ask-izzy.git/custom_hooks/
echo "exec git push --quiet github &" > \
    /mnt/repositories/developers/ask-izzy.git/custom_hooks/post-receive
chown -R git:git /mnt/repositories/developers/ask-izzy.git/custom_hooks
chmod +x /mnt/repositories/developers/ask-izzy.git/custom_hooks/post-receive
  3. Add Github as a remote to the Gitlab project
cd /mnt/repositories/developers/ask-izzy.git
vi config

and add in the Github remote:

[remote "github"]
  url = git@github.com:ask-izzy/ask-izzy.git
  fetch = +refs/*:refs/*
  mirror = true
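Before trusting the hook, it’s worth testing the push manually as the git user - with mirror = true, a plain git push github pushes all refs:

cd /mnt/repositories/developers/ask-izzy.git
sudo -u git git push github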

Sam McLeod Feb 4, 2016

AskIzzy

Today we launched a mobile website for homeless people

… and it was launched by Australia’s Prime Minister

Today we served over 87,000 requests to the site, and it’s only in the first stage of its inception.

As many of you know, I work at Infoxchange as the operations lead. When I first heard the idea of a website or app for people who are homeless, or worried about becoming homeless, in Australia, I really didn’t think it made sense - until I saw the stats showing how many homeless people in Australia have regular access to a smartphone and data, either via a cellular provider or free WiFi.

We did a lot of research working with homeless and at-risk people throughout Australia; it’s really been quite an eye opener, especially for my team, who are largely technically focused.

AskIzzy is the result of Infoxchange winning the Google Impact Challenge in 2015. For me, the most interesting thing about the site - other than its value to those in need - is that it didn’t cost taxpayers a cent to develop or host, and it has no profit-making model of any kind. This resulted in the site being designed truly for the end consumer: the person in need.

Sam McLeod Jan 29, 2016

Fix XenServer SR with corrupt or invalid metadata

If a disk / VDI is orphaned or only partially deleted, you’ll notice that under the SR it’s not assigned to any VM. This can cause issues that look like metadata corruption, resulting in the inability to migrate VMs or edit storage.

For example:

[root@xenserver ~]# xe vdi-destroy uuid=6c2cd848-ac0e-441c-9cd6-9865fca7fe8b
Error code: SR_BACKEND_FAILURE_181
Error parameters: , Error in Metadata volume operation for SR. [opterr=VDI delete operation failed for parameters: /dev/VG_XenStorage-3ae1df17-06ee-7202-eb92-72c266134e16/MGT, 6c2cd848-ac0e-441c-9cd6-9865fca7fe8b. Error: Failed to write file with params [3, 0, 512, 512]. Error: 5],

Removing stale VDIs

To fix this, you need to remove those VDIs from the SR after first deleting the logical volume:

  • Get the LV ID (the last number shown above) and find its location in /dev:
[root@xenserver ~]# lvdisplay | grep 6c2cd848-ac0e-441c-9cd6-9865fca7fe8b
LV Name                /dev/VG_XenStorage-3ae1df17-06ee-7202-eb92-72c266134e16/VHD-6c2cd848-ac0e-441c-9cd6-9865fca7fe8b
  • Remove the logical volume:
[root@xenserver ~]# lvremove /dev/VG_XenStorage-3ae1df17-06ee-7202-eb92-72c266134e16/VHD-6c2cd848-ac0e-441c-9cd6-9865fca7fe8b
Logical volume "VHD-6c2cd848-ac0e-441c-9cd6-9865fca7fe8b" successfully removed
  • Destroy the VDI:
[root@xenserver ~]# xe vdi-destroy uuid=6c2cd848-ac0e-441c-9cd6-9865fca7fe8b

Regenerate the MGT volume

If this doesn’t work and the SR is still having metadata problems the MGT (management volume) may be corrupt.

Luckily this is easy to rebuild and doesn’t require VMs to be powered off or migrated to another SR.

  • Rescan the SR
xe sr-scan uuid=<SR UUID HERE>
  • Rename the SR’s MGT logical volume (this is safe and does not affect running VMs):
lvrename /dev/VG_XenStorage-<SR UUID HERE>/MGT /dev/VG_XenStorage-<SR UUID HERE>/oldMGT
  • Rescan the SR

Note: in some cases you might need to do this a couple of times.

xe sr-scan uuid=<SR UUID HERE>
  • Remove any stale VDIs

Look for any VDIs without VMs on the SR in XenCentre or on the cli with:

xe vdi-list sr-uuid=<SR UUID HERE>

Remove them with:

[root@xenserver ~]# lvdisplay | grep 6c2cd848-ac0e-441c-9cd6-9865fca7fe8b
  LV Name                /dev/VG_XenStorage-3ae1df17-06ee-7202-eb92-72c266134e16/VHD-6c2cd848-ac0e-441c-9cd6-9865fca7fe8b

[root@xenserver ~]# lvremove /dev/VG_XenStorage-3ae1df17-06ee-7202-eb92-72c266134e16/VHD-6c2cd848-ac0e-441c-9cd6-9865fca7fe8b
  Logical volume "VHD-6c2cd848-ac0e-441c-9cd6-9865fca7fe8b" successfully removed
  • Rescan the SR
xe sr-scan uuid=<SR UUID HERE>
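If you have several stale VDIs to clean up, the lvremove / vdi-destroy dance above can be wrapped in a small helper - a sketch only, relying on the VG naming shown in the examples above; test carefully before trusting it:

#!/bin/bash
# Usage: ./remove-stale-vdi.sh <SR UUID> <VDI UUID>
SR_UUID="$1"
VDI_UUID="$2"
# Remove the backing logical volume, then the VDI record, then rescan
lvremove "/dev/VG_XenStorage-${SR_UUID}/VHD-${VDI_UUID}"
xe vdi-destroy uuid="${VDI_UUID}"
xe sr-scan uuid="${SR_UUID}"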

Sam McLeod Jan 18, 2016

iSCSI SCSI-ID / Serial Persistence

“Having a SCSI ID is a f*cking idiotic thing to do.”

- Linus Torvalds

…and after the amount of time I’ve wasted getting XenServer to play nicely with LIO iSCSI failover I tend to agree.


The Problem

One oddity of Xen / XenServer’s storage subsystem is that it identifies iSCSI storage repositories via a calculated SCSI ID rather than the iSCSI Serial - which would be the sane thing to do.

Citrix’s less-than-ideal take on dealing with SCSI ID changes is for you to take your VMs offline, disconnect the storage repositories, recreate them, then go through all your VMs re-attaching their orphaned disks (hoping that you remembered to add some sort of hint as to which VM they belong to), and finally wipe the sweat and tears from your face.

From CTX11641 - ‘How to Identify If SCSI Storage Repository has Changed SCSI IDs’:

“The SCSI ID of the logical unit number (LUN) changed. When this happened, the iSCSI storage repository became unplugged after a XenServer reboot.” … “To correct the issue you must recreate a PBD with the entry to reflect the right SCSI ID.”

The Solution

A big thank you to Nicholas A. Bellinger from the Kernel SCSI mailing list who helped me a lot in this thread where he explained:

“The Company ID, VSI, and VSIE are generated by LIO based upon the current vpd_unit_serial configfs attribute value. So as long as vpd_unit_serial is persistent, and the same value for backend devices across export failover to different nodes, Xen will always see the same EVPD information.”

An example SCSI ID of 0x6001405bff3f42a49d84cfcb64e2b933 would thus be composed of:

  • NAA 6, IEEE Company_id: 0x1405
  • Vendor Specific Identifier: 0xbff3f42a4
  • Vendor Specific Identifier Extension: 0x9d84cfcb64e2b933

In addition to the vpd_unit_serial, we found that the iblock number must also remain the same between failovers.

/sys/kernel/config/target/core/iblock_0/lun_name/wwn/vpd_unit_serial
/sys/kernel/config/target/core # tree
├── iblock_0                            # Must be consistent between failovers
│   └── iscsi_lun_r2
│      └── wwn
│         └── vpd_unit_serial           # Must be consistent between failovers
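As a concrete example, pinning the serial by hand through configfs might look like this (the backstore / LUN names follow the tree above, and the serial matches the scsi_sn used in the pcs config below - in practice the OCF agent sets this for you):

# Fix the unit serial on the backstore so the derived SCSI ID stays stable
echo 633c5643 > /sys/kernel/config/target/core/iblock_0/iscsi_lun_r2/wwn/vpd_unit_serial
cat /sys/kernel/config/target/core/iblock_0/iscsi_lun_r2/wwn/vpd_unit_serial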

If you’re using Corosync / Pacemaker for your target failover the vpd_unit_serial and iblock number must both be set in the iSCSILogicalUnit OCF provider:

Here is an example of a target and lun configured with pcs:

Resource: iscsi_target_r2 (class=ocf provider=heartbeat type=iSCSITarget)
Attributes: iqn=iqn.2003-01.org.linux-iscsi.pm-san.x8664:sn.ca7d7b33c731 portals=10.50.42.75:3260 implementation=lio-t additional_parameters="MaxConnections=100 AuthMethod=None InitialR2T=No MaxOutstandingR2T=64"
Operations: monitor on-fail=restart interval=30s timeout=20s (iscsi_target_r2-monitor-30s)
            start on-fail=restart interval=0 timeout=20s (iscsi_target_r2-start-0)
            stop on-fail=restart interval=0 timeout=20s (iscsi_target_r2-stop-0)
Resource: iscsi_lun_r2 (class=ocf provider=heartbeat type=iSCSILogicalUnit)

Attributes: target_iqn=iqn.2003-01.org.linux-iscsi.pm-san.x8664:sn.ca7d7b33c731 scsi_sn=633c5643 lun=1 lio_iblock=2 path=/dev/drbd2 allowed_initiators="iqn.2015-05.com.example:51e1fb93" implementation=lio-t
Operations: monitor on-fail=restart interval=30s timeout=10s (iscsi_lun_r2-monitor-30s)
            start on-fail=restart interval=0 timeout=20s (iscsi_lun_r2-start-0)
            stop on-fail=restart interval=0 timeout=20s (iscsi_lun_r2-stop-0)
Resource: iscsi_conf_r2 (class=ocf provider=heartbeat type=anything)
Attributes: binfile=/usr/sbin/iscsi_iscsi_conf_r2.sh stop_timeout=3

If you happen to be using Puppet for your Pacemaker configuration it might look a bit like this:

  cs_primitive { "$iscsi_target_primitive":
    primitive_class => 'ocf',
    primitive_type  => 'iSCSITarget',
    provided_by     => 'heartbeat',
    parameters      => {  'iqn'                   => "$iscsi_iqn",
                          'portals'               => "${iscsi_vip}:3260",
                          'implementation'        => 'lio-t',
                          'additional_parameters' => 'MaxConnections=100 AuthMethod=None InitialR2T=No MaxOutstandingR2T=64',
                        },
    operations      =>  {  'monitor'              => { 'timeout' => '20s', 'interval' => '30s','on-fail' => "restart"},
                           'start'                => { 'timeout' => '20s', 'interval' => '0','on-fail'   => "restart"},
                           'stop'                 => { 'timeout' => '20s', 'interval' => '0','on-fail'   => "restart"},
                        },
    require         => [Cs_primitive["$ip_primitive"],Package['targetcli'],Service['pacemaker']],
  }

  cs_primitive { "$iscsi_lun_primitive":
    primitive_class => 'ocf',
    primitive_type  => 'iSCSILogicalUnit',
    provided_by     => 'heartbeat',
    parameters      => {  'target_iqn'            => $iscsi_iqn,
                          'scsi_sn'               => $scsi_sn,
                          'lun'                   => '1',
                          'lio_iblock'            => $lio_iblock,
                          'path'                  => $drbd_path,
                          'allowed_initiators'    => $allowed_initiators,
                          'implementation'        => 'lio-t'},
    operations      => {  'monitor'               => { 'timeout' => '10s', 'interval' => '30s','on-fail' => "restart" },
                          'start'                 => { 'timeout' => '20s', 'interval' => '0'  ,'on-fail' => "restart" },
                          'stop'                  => { 'timeout' => '20s', 'interval' => '0'  ,'on-fail' => "restart" },
                        },
    require         => [Cs_primitive["$iscsi_target_primitive"],Service['pacemaker']],
  }

Sam McLeod Dec 14, 2015

Join us on our mission of 'Technology for Social Justice'

Seeking a Linux Systems Engineer to join our Ops team in a fast moving DevOps environment.

Infoxchange is a not-for-profit organisation that delivers technology for social justice. We work to strengthen communities and organisations, using information technology as the primary tool to create positive social change.

In this position you will be working with team members and developers to design and support continuous application delivery, performance, scale and automation.

We’re looking for someone that is forward-thinking, passionate and reliable, who is always at the forefront of new technologies.

Our Systems (ops) team are hard working, respected promoters of positive change and continuous improvement within the organisation and are each highly technically skilled.

The role is a full-time, permanent position.

The position will involve working with technologies such as:

  • Puppet
  • Docker and related orchestration frameworks
  • Server and database clustering
  • CI/CD pipelines
  • Gitlab, PostgreSQL, Nginx, XenServer, Elasticsearch, Logstash and more

You can often find the team presenting at meetups such as Infrastructure Coders, Puppet Camp, DevOps Melbourne, DevopsDays, LCA, etc.

Sam McLeod Oct 8, 2015

Replacing Junos Pulse with OpenConnect

In an attempt to avoid using the Juniper Pulse (now Pulse Secure) VPN client we tried OpenConnect, but found that DNS did not work correctly when connected to the VPN. This bug has recently been resolved but has not yet made its way into a new build - in fact, there have been no releases for 6 months.

Luckily, OpenConnect was not too difficult to build from source.
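For reference, it’s the standard autotools routine - a rough sketch assuming a release tarball and Debian-ish build dependencies:

# Build dependencies (package names assume Debian/Ubuntu)
sudo apt-get install build-essential pkg-config libxml2-dev libgnutls28-dev vpnc-scripts
tar xf openconnect-<version>.tar.gz
cd openconnect-<version>
./configure
make
sudo make install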

Sam McLeod Sep 22, 2015

SSD Storage - Two Months In Production

Over the last two months I’ve been running selected IO-intensive servers off the SSD storage cluster; these hosts include (among others) our:

  • Primary Puppetmaster
  • Gitlab server
  • Redmine app and database servers
  • Nagios servers
  • Several Docker database host servers

Sam McLeod Sep 13, 2015

OS X Software Update Channels For Betas

Set update channel to receive developer beta update

sudo softwareupdate --set-catalog https://swscan.apple.com/content/catalogs/others/index-10.11seed-10.11-10.10-10.9-mountainlion-lion-snowleopard-leopard.merged-1.sucatalog.gz

Set update channel to receive public beta update

sudo softwareupdate --set-catalog https://swscan.apple.com/content/catalogs/others/index-10.11beta-10.11-10.10-10.9-mountainlion-lion-snowleopard-leopard.merged-1.sucatalog.gz

List available updates

sudo softwareupdate --list

Set update channel to receive default, stable updates

sudo softwareupdate --clear-catalog

Show current settings

defaults read /Library/Preferences/com.apple.SoftwareUpdate.plist

Write setting manually

defaults write /Library/Preferences/com.apple.SoftwareUpdate CatalogURL https://swscan.apple.com/content/catalogs/others/index-10.11beta-10.11-10.10-10.9-mountainlion-lion-snowleopard-leopard.merged-1.sucatalog.gz

Sam McLeod Sep 1, 2015

iSCSI Benchmarking

67,300 read IOP/s on a VM on iSCSI

  • (Disk -> LVM -> MDADM -> DRBD -> iSCSI target -> Network -> XenServer iSCSI Client -> VM)
  • Per VM and scales to 1,000,000 IOP/s total
<user>@<host>:/mnt/pmt1 128 # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=128 --size=2G --readwrite=read
test: (g=0): rw=read, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
2.0.8
Starting 1 process
bs: 1 (f=1): [R] [55.6% done] [262.1M/0K /s] [67.3K/0  iops] [eta 00m:04s]

38,500 random 4k write IOP/s on a VM on iSCSI

  • (Disk -> LVM -> MDADM -> DRBD -> iSCSI target -> Network -> XenServer iSCSI Client -> VM)
  • Per VM and scales to 700,000 IOP/s total
<user>@<host>:/mnt/pmt1 # fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=128 --size=2G --readwrite=randwrite
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=128
2.0.8
Starting 1 process
bs: 1 (f=1): [w] [26.3% done] [0K/150.2M /s] [0 /38.5K iops] [eta 00m:14s]

Sam McLeod Jul 24, 2015

Delayed Serial STONITH

A modified version of John Sutton’s rcd_serial cable coupled with our Supermicro reset switch hijacker:

This works with the rcd_serial fence agent plugin.

Reasons rcd_serial makes for a very good STONITH mechanism:

  • It has no dependency on power state.
  • It has no dependency on network state.
  • It has no dependency on node operational state.
  • It has no dependency on external hardware.
  • It costs less than $5 + time to build.
  • It is incredibly simple and reliable.

The most common STONITH agent type in use is probably those that control UPSs / PDUs. While this sounds like a good idea in theory, there are a number of issues with relying on a UPS / PDU:

  • Units that have remote power control over individual outlets are very expensive, and if an upgrade is undertaken a rack-wide outage may be required, depending on the existing infrastructure.
  • Often these units are managed via the network, requiring the network and all that entails to be functioning as expected. It may also require an additional NIC that may or may not fit into your storage units.
  • There are almost always two PDUs / UPSs to manage. Until very recently the PDU STONITH agents only supported sending an action to a single unit; while modern packages now support sending to two units, there are a number of situations that are complex to manage and predict - i.e. what if one unit responds and cuts the power and the other doesn’t? Who’s in charge? Do we fail over? etc… That’s a LOT of logic for a STONITH action.
  • I’ve seen several PDUs fail; it’s not pretty, and often the management interface is the first thing to go.

Adam Coy’s slightly modified version of the circuit that includes an indicator LED and an optocoupler:

Example of where our Supermicro reset hijack connects on the target node:

Availability

At present the rcd_serial STONITH agent is available as part of the cluster-glue package; cluster-glue is not available in RHEL/CentOS, but can be obtained from OpenSUSE’s CentOS 7 repo or my own mirror (tested with CentOS 7).

I have an open ticket with RedHat regarding the fact that their pacemaker rpm is built without the --with stonithd flag, which would allow this to work with their version of Pacemaker.

The long term solution is to get rcd_serial migrated to the new Pacemaker agent API system (or is it this one?).

Sam McLeod Jul 21, 2015

CentOS 7 and HA

First some background…

One of the many lessons I’ve learnt from my Linux HA / storage clustering project is that the Debian HA ecosystem is essentially broken. We reached the point where packages were too old, too buggy or, in Debian 8’s case, outright missing.

In the past I was very disappointed with RHEL/CentOS 5 / 6, and (until now) have been quite satisfied with Debian as a stable server distribution with historically more modern packages and kernels.

I feel that CentOS / RHEL 7 has changed the game.*

(When combined with ElRepo or EPEL, which provide a wide array of modern packages.)

It is simply light years ahead of its predecessor and resolves a lot of the issues we’ve had with Debian Jessie. After thorough testing of Debian 7 and 8 and CentOS 6 and 7, I have decided to employ CentOS 7 as the base OS for the storage cluster. However, CentOS / RHEL 7 is missing a few critical packages in the cluster HA space:

  1. The available pacemaker package was built without --with stonithd. This means there is no support for legacy STONITH plugins, many of which are heavily relied upon and do not have replacements in the new plugin system.

  2. cluster-glue is missing / has been deprecated. The cluster-glue package provides a lot of very useful resource agents / plugins which, again, are missing from the new pacemaker builds.

  3. crmsh is no longer available and has been replaced by pcs. pcs is great, but lots of very useful tools still use crmsh, including the puppet-corosync module, which is fantastic for bootstrapping clusters.

Short term solution:

Pacemaker

I have recompiled CentOS 7’s pacemaker package changing only one thing - I added --with stonithd to the rpmbuild command.
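For anyone wanting to reproduce the rebuild, the process looks roughly like this - a sketch only, and the exact spec file details may differ between releases:

# yumdownloader lives in the yum-utils package
yumdownloader --source pacemaker
rpm -ivh pacemaker-*.src.rpm
rpmbuild -ba --with stonithd ~/rpmbuild/SPECS/pacemaker.spec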

Cluster-Glue

OpenSUSE provides the cluster-glue and cluster-glue-libs packages from their CentOS 7 repository.

CRMSH

OpenSUSE also provides crmsh from the same CentOS 7 repository (http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/).

Long term solution:

Update STONITH / Fencing agents

Obviously the best solution, however I have a feeling that it will take some time for the developers of the existing STONITH / fencing agents to port their code to the new framework.

Request package to be built --with stonithd

Host required packages

I have been logging requests with various third party RHEL repos asking if they would host the missing packages.

If they don’t feel they have the resources, or don’t want to add them for whatever reason, I will continue to host the packages myself on Packagecloud and likely somewhere else as a fall-back.

For reference, the resulting cluster status on our two-node cluster:

Last updated: Tue Jul  7 11:59:25 2015
Last change: Tue Jul  7 10:57:16 2015
Stack: corosync
Current DC: s1-san5 (1) - partition with quorum
Version: 1.1.12-a14efad
2 Nodes configured
16 Resources configured


Online: [ s1-san5 s1-san6 ]

 Clone Set: ping_gateway-clone [ping_gateway]
     Started: [ s1-san5 s1-san6 ]
 Master/Slave Set: ms_drbd_r0 [drbd_r0]
     Masters: [ s1-san5 ]
     Slaves: [ s1-san6 ]
 ip_r0  (ocf::heartbeat:IPaddr2): Started s1-san5
 iscsi_target_r0  (ocf::heartbeat:iSCSITarget): Started s1-san5
 iscsi_lun_r0 (ocf::heartbeat:iSCSILogicalUnit):  Started s1-san5
 stonith_s1-san5  (stonith:rcd_serial): Started s1-san6
 stonith_s1-san6  (stonith:rcd_serial): Started s1-san5
 Master/Slave Set: ms_drbd_r1 [drbd_r1]
     Masters: [ s1-san5 ]
     Slaves: [ s1-san6 ]
 ip_r1  (ocf::heartbeat:IPaddr2): Started s1-san5
 iscsi_target_r1  (ocf::heartbeat:iSCSITarget): Started s1-san5
 iscsi_lun_r1 (ocf::heartbeat:iSCSILogicalUnit):  Started s1-san5
 iscsi_conf_r1  (ocf::heartbeat:anything):  Started s1-san5
 iscsi_conf_r0  (ocf::heartbeat:anything):  Started s1-san5

Sam McLeod Jul 7, 2015

SSD Storage Cluster - Update and Diagram

Due to several recent events beyond my control I’m a bit behind on the project - hence the lack of updates, which I apologise for. The good news is that I’m back working to finish off the clusters, and I’m happy to report that all is going to plan.

Here is the final diagram of the two-node cluster design:

Plain text version available here

This was generated from the LCMC tool (beware - it’s Java!).

More on this soon…

Sam McLeod Jun 17, 2015

Xen Orchestra Docker Image

Docker config to set up XO (Xen Orchestra), a web interface to visualise and administer your XenServer (or XAPI-enabled) hosts.

Github: sammcj/docker-xen-orchestra

Running the app

Updates are pushed to the Docker Hub’s automated build service:

  • https://registry.hub.docker.com/u/sammcj/docker-xen-orchestra
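A hypothetical quick start - the published port is an assumption, so check the repository’s README for the actual mapping:

docker pull sammcj/docker-xen-orchestra
docker run -d --name xen-orchestra -p 80:80 sammcj/docker-xen-orchestra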

Sam McLeod May 26, 2015

Building a high performance SSD SAN - Part 1

Over the coming month I will be architecting, building and testing a modular, high performance SSD-only storage solution.

I’ll be documenting my progress / findings along the way and open sourcing all the information as a public guide.

With recent price drops and durability improvements in solid state storage, now is as good a time as any to ditch those old magnets.

Modular server manufacturers such as SuperMicro have spent big on R&D, thanks to the ever-growing requirements of the cloud vendors that utilise their hardware.

Sam McLeod Feb 16, 2015

XenServer, SSDs & VM Storage Performance

Intro

At Infoxchange we use XenServer as our virtualisation platform of choice. There are many reasons for this, including:

  • Open Source.
  • Offers greater performance than VMware.
  • Affordability (it’s free unless you purchase support).
  • Proven backend - Xen is very reliable.
  • Reliable cross-host migrations of VMs.
  • The XenCentre client (although it has to run in a Windows VM) is quick and simple to use.
  • Upgrades and patches have proven to be more reliable than VMware.
  • OpenStack while interesting, is not yet reliable or streamlined enough for our small team of 4 to implement and manage.
  • XenServer Storage & Filesystems

Sam McLeod Feb 14, 2015

The Best Of - 2014 Edition

At the end of every year I note down a summary of the best applications, hardware & websites I’ve enjoyed & depended on throughout the year (and often for some time before).

Software / General Use:

  • Fastmail - https://www.fastmail.com
  • Evernote - https://evernote.com
  • Reeder - http://reederapp.com
  • Keynote - https://www.apple.com/au/mac/keynote
  • Lastpass - https://lastpass.com
  • Plex - https://plex.tv
  • Calibre - http://calibre-ebook.com

Sam McLeod Jan 1, 2015

Talk - 24 Months

The way we work at Infoxchange has changed greatly.

A retrospective journey into transforming Infoxchange’s technology and culture over the past 24 months - presented at Melbourne DevOps, December 2014.

Click to Start Slides

Sam McLeod Oct 15, 2014

Direct-Attach SSD Storage – Performance & Comparisons

Further to my earlier post on XenServer storage performance with regards to directly attaching storage from the host, I have been analysing the performance of various SSD storage options. I attached an HP DS2220sb storage blade to an existing server blade and compared the performance of 4- and 6-SSD RAID-10 against our existing iSCSI SANs.

While the P420i RAID controller in the DS2220sb is clearly saturated and unable to provide throughput much over 1,100MB/s, the IOP/s available to PostgreSQL are still a very considerable performance improvement over our P4530 SAN - in fact, 6 SSDs result in a 39.9x performance increase!

Click the image below for the results:

Click to Start Slides

Sam McLeod Oct 15, 2014