Maintaining High Validator Node Uptime

Introduction

Operating a Radix validator node is a significant commitment to running a mature, stable, performant piece of server architecture. Only 100 validator nodes are selected each epoch, ranked by delegated stake, so Radix’s incentives are designed to encourage staking to the most trusted, performant, and reliable nodes.

To produce emissions rewards for both yourself and your delegators, high uptime is required. Uptime is defined by your node regularly making correct "proposals" when required by the protocol.

If your node fails to promptly make a proposal when requested, the protocol notes the failure and begins reducing the network’s view of your uptime. When uptime drops below 100%, the Radix protocol begins rapidly penalizing emissions – for both you and your delegators. No emissions are received once uptime drops to 98%.
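The thresholds above can be sketched in a few lines of Python. This is only an illustration: it assumes uptime is the share of requested proposals that were made successfully, which may not match the protocol's exact calculation, and it does not model how quickly emissions fall between 100% and 98%, since that curve is not given here. The function names are hypothetical.

```python
def uptime_percent(proposals_made: int, proposals_missed: int) -> float:
    """Share of requested proposals that were made, as a percentage.

    Assumption: uptime is a simple made/(made + missed) ratio.
    """
    total = proposals_made + proposals_missed
    if total == 0:
        # No proposals were requested, so nothing was missed.
        return 100.0
    return 100.0 * proposals_made / total


def emissions_status(uptime: float) -> str:
    """Classify emissions using the two thresholds stated in the text."""
    if uptime >= 100.0:
        return "full"      # no missed proposals, no penalty
    if uptime <= 98.0:
        return "none"      # no emissions once uptime drops to 98%
    return "reduced"       # penalized; the exact curve is not modelled


if __name__ == "__main__":
    uptime = uptime_percent(proposals_made=990, proposals_missed=10)
    print(uptime, emissions_status(uptime))
```

Note how narrow the margin is: under this assumed ratio, missing 2 proposals out of every 100 is already enough to lose emissions entirely.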

This means that not only must your validator node meet basic hardware requirements, but you must take appropriate measures to ensure that your node is responsive at all times.

Validators, like any other publicly exposed service, can be subject to DDoS[1] attacks. Until the attack is identified and dealt with, a node may be effectively offline and thus not actively participating in the network.

It’s also likely that, from time to time, your node will have to be stopped for routine maintenance.

To avoid downtime in these instances, we highly recommend that all validators use the backup node approach described on this page, which allows quick, safe switching of your single validator ID between two (or more) physical nodes.

In short, you will keep the backup node synchronized with the latest network state, and be ready to shut down your primary validator node and then immediately restart the backup node using the primary’s key and configuration. From the network’s perspective, this simply looks like an orderly notification that your validator node’s IP address has changed, with minimal interruption.

Here’s how:

1. Set up and sync a backup full node

Your backup full node is set up in the normal way, but should not be registered as a validator. It will operate as a full node until the time when you need to switch over to it as your validator node. When setting up the backup full node, generate a new keyfile that will allow it to operate and sync with the network in the meantime.

Ensure the backup full node is running on a different IP address than the primary validator node.

This full node must be online and synchronized with the rest of the network. This way it will have an up-to-date copy of the ledger that is required to participate in consensus when later used as your validator. Always make sure that the backup full node is running the latest version of the node software, along with your primary validator node.
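As a rough pre-flight check, the three conditions above (different IP address, synchronized ledger, matching software version) could be compared like this. It is a sketch only: the `NodeStatus` fields, the `ledger_position` measure, and the `max_lag` tolerance are hypothetical, and how you obtain these values from your nodes is left to your own monitoring setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeStatus:
    # All fields are hypothetical; fill them from your own monitoring.
    software_version: str
    ledger_position: int   # any monotonically increasing sync measure
    ip_address: str


def backup_problems(primary: NodeStatus, backup: NodeStatus,
                    max_lag: int = 10) -> List[str]:
    """Return reasons the backup is not ready to take over (empty if ready)."""
    problems = []
    if backup.ip_address == primary.ip_address:
        problems.append("backup shares the primary's IP address")
    if backup.software_version != primary.software_version:
        problems.append("backup runs a different node software version")
    if primary.ledger_position - backup.ledger_position > max_lag:
        problems.append("backup ledger is behind the primary")
    return problems


if __name__ == "__main__":
    primary = NodeStatus("1.0.0", 1000, "10.0.0.1")
    backup = NodeStatus("1.0.0", 998, "10.0.0.2")
    print(backup_problems(primary, backup) or "backup looks ready")
```

Comparing versions for equality is a stand-in for "both nodes run the latest release"; it will not notice if both are equally out of date.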

2. Copy the validator’s key file to your full node

Using whatever tool is preferable to you, copy the key file (called node-keystore.ks by default) of the primary validator node over to the backup full node’s host. The full node’s configuration should point to this new key.
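Whichever tool you use, it is worth confirming that the copy is byte-for-byte identical to the original and is not readable by other users. The sketch below does this for a copy between two local paths using only the Python standard library; in practice the files sit on different hosts, so you would transfer the file with your usual tool and compare the two digests by hand. The function names are illustrative.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_keystore(src: Path, dest_dir: Path) -> Path:
    """Copy a key file into dest_dir, restrict its permissions, and verify it."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    dest.chmod(0o600)  # owner read/write only
    if sha256_of(src) != sha256_of(dest):
        raise IOError("key file copy does not match the original")
    return dest
```

For example, `copy_keystore(Path("node-keystore.ks"), Path("backup-secrets"))` would place a verified copy in a (hypothetical) `backup-secrets` directory.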

The environment property RADIXDLT_VALIDATOR_KEY_LOCATION specifies the location of the keystore.

3. Set the correct key file password

Since your validator and full node have different key file passwords (at least, they should have!), set the RADIX_NODE_KEYSTORE_PASSWORD environment property to the password of the validator key file that you have copied across.
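Before switching over, you may want to sanity-check that both properties from steps 2 and 3 are in place on the backup host. The following is a minimal sketch that only inspects environment values; it cannot tell whether the password actually unlocks the key file, and the helper name is hypothetical.

```python
import os
from pathlib import Path
from typing import List, Mapping

KEY_LOCATION_VAR = "RADIXDLT_VALIDATOR_KEY_LOCATION"
KEY_PASSWORD_VAR = "RADIX_NODE_KEYSTORE_PASSWORD"


def keystore_config_problems(env: Mapping[str, str]) -> List[str]:
    """Return a list of configuration problems (empty if none were found)."""
    problems = []
    location = env.get(KEY_LOCATION_VAR, "")
    if not location:
        problems.append(f"{KEY_LOCATION_VAR} is not set")
    elif not Path(location).is_file():
        problems.append(f"no key file found at {location}")
    if not env.get(KEY_PASSWORD_VAR):
        problems.append(f"{KEY_PASSWORD_VAR} is not set")
    return problems


if __name__ == "__main__":
    # Check the environment this script was started with.
    for problem in keystore_config_problems(os.environ):
        print(problem)
```

Run it in the same environment the node will start in, so that it sees the same values the node will see.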

4. Stop your validator node

If your validator node is still running, you must stop it now.

Make sure you stop the validator node before restarting your backup node. Both nodes are now configured with the same key, so if they run at the same time they will conflict with each other on the network and miss consensus rounds.

Two running nodes should never be operating using the same key file.
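If you script your switchover, this rule is worth enforcing in code rather than relying on memory. The sketch below is one way to express the guard; the key identifiers and the way you discover which nodes are running are assumptions left to your own tooling, and `SharedKeyError` and `check_safe_to_start` are hypothetical names.

```python
from typing import Iterable


class SharedKeyError(RuntimeError):
    """Raised when a node would start with a key that is already in use."""


def check_safe_to_start(node_key_id: str,
                        running_key_ids: Iterable[str]) -> None:
    """Refuse to continue if any running node already uses node_key_id.

    running_key_ids should list the key of every node that is currently
    running, as reported by your own process or host checks.
    """
    if node_key_id in set(running_key_ids):
        raise SharedKeyError(
            f"a running node already uses key {node_key_id!r}; stop it first"
        )


if __name__ == "__main__":
    # The primary has been stopped, so only the old backup key is in use.
    check_safe_to_start("validator-key", running_key_ids=["backup-key"])
    print("safe to start the backup with the validator key")
```

Calling the guard immediately before the restart command keeps the "stop first, then start" order explicit in the script.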

5. Restart your backup node

Start your backup node normally. If the configuration is correct, the node will come up as a registered validator. Its address should be the same as the original validator’s, because they share the same key. You should also expect an immediate sync, since the backup already holds an up-to-date copy of the ledger.


[1] Distributed Denial of Service