Delayed Serial STONITH
A modified version of John Sutton’s rcd_serial cable coupled with our Supermicro reset switch hijacker:
This works with the rcd_serial fence agent plugin.
rcd_serial makes for a very good STONITH mechanism:
- It has no dependency on power state.
- It has no dependency on network state.
- It has no dependency on node operational state.
- It has no dependency on external hardware.
- It costs less than $5 + time to build.
- It is incredibly simple and reliable.
The most common STONITH agent type in use is probably the kind that controls UPSs / PDUs. While this sounds like a good idea in theory, there are a number of issues with relying on a UPS / PDU:
- Units that have remote power control over individual outlets are very expensive, and if an upgrade is undertaken a rack-wide outage may be required, depending on the existing infrastructure.
- Often these units are managed via the network, requiring the network and all that that entails to be functioning as expected. It also may require an additional NIC that may or may not fit into your storage units.
- There are almost always two PDUs / UPSs to manage. Until very recently the PDU STONITH agents only supported sending an action to a single unit. While modern packages now support sending it to two units, there are a number of situations that are complex to manage and predict, e.g. what if one unit responds and cuts the power and the other doesn’t? Who’s in charge? Do we fail over? That’s a LOT of logic for a STONITH action.
- I’ve seen several PDUs fail. It’s not pretty, and often the management interface is the first thing to go.
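The dual-PDU problem in the list above can be made concrete with a small sketch. This is only an illustration of the decision logic a cluster would need, not code from any real fence agent. The names `FenceOutcome`, `dual_pdu_outcome`, `safe_to_fail_over` and the PDU names are hypothetical, and it assumes a node with redundant power supplies fed from two PDUs.

```python
from enum import Enum


class FenceOutcome(Enum):
    CONFIRMED_OFF = "confirmed off"  # every PDU acknowledged cutting power
    NOT_FENCED = "not fenced"        # no PDU acknowledged cutting power
    AMBIGUOUS = "ambiguous"          # some PDUs cut power, some did not


def dual_pdu_outcome(unit_results):
    """Classify a fencing attempt across redundant PDUs.

    unit_results maps a (hypothetical) PDU name to True if that unit
    acknowledged cutting the outlet, False if it failed or timed out.
    """
    if not unit_results:
        raise ValueError("at least one PDU result is required")
    acks = list(unit_results.values())
    if all(acks):
        return FenceOutcome.CONFIRMED_OFF
    if not any(acks):
        return FenceOutcome.NOT_FENCED
    return FenceOutcome.AMBIGUOUS


def safe_to_fail_over(outcome):
    # Assumption: the target has redundant PSUs, so it keeps running
    # while any one feed is still live. Only a fully confirmed
    # power-off is treated as a successful fence.
    return outcome is FenceOutcome.CONFIRMED_OFF


if __name__ == "__main__":
    result = dual_pdu_outcome({"pdu-a": True, "pdu-b": False})
    print(result, safe_to_fail_over(result))
```

The `AMBIGUOUS` branch is the awkward one: the cluster cannot tell whether the target is down, so it can neither fail over safely nor assume the node is healthy. A single serial reset line has no equivalent partial state.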
Note: The following was v1 of the device’s design; a post on the updated v2/v3 design can be found here
Example of where our Supermicro reset hijack connects on the target node:
Image of the rear of a two node cluster with networking and rcd_serial STONITH (v1) connectors attached:
UPDATE: This has since been added to RHEL/CentOS and the official ClusterLabs resource agents.
At present the rcd_serial STONITH agent is available as part of the
cluster-glue is not available in RHEL/CentOS but can be obtained from OpenSUSE’s CentOS7 Repo or my own mirror. (Tested with CentOS 7).
I have an open ticket with Red Hat regarding the fact that their pacemaker rpm is built without the
--with stonithd flag, which is what allows this to work with their version of Pacemaker.