Reducing Validator Downtime

Introduction

Validators, like any other publicly exposed service, can be subject to DDoS[1] attacks. Until the attack is identified and dealt with, the node is offline and therefore not participating in the network.

It’s also likely that, from time to time, your node will have to be stopped for routine maintenance.

To avoid such downtime, we recommend the following approach:

1. Ensure you have a backup full node.

Always run a separate full node on a different IP address. This full node must be online and synchronized with the rest of the network. This way it will have an up-to-date copy of the ledger.

2. Stop the backup node when the validator node is down (or under attack).

When your validator is under attack or otherwise unavailable (e.g., down for maintenance), first confirm that your full node is synced. Then stop the full node and make a backup of its own key.

3. Copy the validator key to your full node.

Using whichever tool you prefer, copy the validator’s key over to the full node’s host. The full node’s configuration should then point to this new key.

The environment variable RADIXDLT_VALIDATOR_KEY_LOCATION specifies the location of the keystore.
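A corrupted or truncated keystore will stop the backup node from taking over, so it can be worth verifying the copy. The following is a minimal sketch, assuming the validator’s keystore has already been transferred to a staging path on the full node’s host. The function names and paths are hypothetical and not part of any Radix tooling.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def install_validator_key(source: Path, destination: Path) -> str:
    """Copy the keystore to `destination` and verify the copy.

    Returns the digest of the installed file; raises if the copy differs.
    Both paths are supplied by the operator (hypothetical helper).
    """
    destination.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the contents and tries to preserve file metadata as well.
    shutil.copy2(source, destination)
    expected, actual = sha256_of(source), sha256_of(destination)
    if expected != actual:
        raise IOError(f"keystore copy is corrupt: {expected} != {actual}")
    return actual
```

The returned digest can also be compared with one computed on the validator’s host to confirm that the transfer between machines was clean.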

4. Set the correct keystore password

Since your validator and full node have different keystore passwords (at least, they should!), use the RADIX_NODE_KEYSTORE_PASSWORD environment variable to set the password of the new key.
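The two settings from steps 3 and 4 belong together, since a new key location with the old password will not unlock the keystore. The sketch below shows one way to assemble them, assuming the node is launched from a Python wrapper. The helper name build_node_env is hypothetical; only the two variable names come from this guide.

```python
import os
from typing import Dict, Mapping, Optional


def build_node_env(key_location: str,
                   keystore_password: str,
                   base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with both keystore settings applied.

    The current process environment is used when `base_env` is not given.
    The input mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RADIXDLT_VALIDATOR_KEY_LOCATION"] = key_location
    env["RADIX_NODE_KEYSTORE_PASSWORD"] = keystore_password
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.run, together with whatever command you normally use to start the node. Reading the password with getpass.getpass() rather than hard-coding it keeps it out of scripts and shell history.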

5. Stop your validator node

If your validator node is still running, you must stop it now.

Make sure you stop the validator node before restarting your backup node. At this point both nodes are configured with the same key, so running them at the same time will lead to both nodes fighting for the same IP address and missing consensus rounds.

Two running nodes should never share the same key.
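As a pre-flight check for this rule, you could compare keystore fingerprints across the nodes you intend to leave running. This is only a sketch under stated assumptions: the SHA-256 digests are collected by you on each host, the function name is hypothetical, and the check only catches verbatim copies of the same keystore file. It does not tell you whether a node is actually stopped.

```python
from collections import defaultdict
from typing import Dict, List


def nodes_sharing_a_key(running: Dict[str, str]) -> List[List[str]]:
    """Group running nodes that report the same keystore digest.

    `running` maps a node name to the hex digest of its keystore file.
    Returns only groups with more than one node, each sorted by name.
    """
    by_digest = defaultdict(list)
    for node, digest in running.items():
        # Hex digests are case-insensitive, so normalize before grouping.
        by_digest[digest.lower()].append(node)
    return [sorted(nodes) for nodes in by_digest.values() if len(nodes) > 1]
```

An empty result means no two listed nodes share a keystore file. Any non-empty group names nodes that must not run at the same time.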

6. Restart your backup node

Start your node normally. If the configuration is correct, the node will be a registered validator. Its address should be the same as the original validator’s because they share the same key. You should also expect an immediate sync, since the ledger is preserved.


1. Distributed Denial of Service