Monitoring Your Node’s Health

The Radix Node provides API endpoints and methods that can be used to monitor its health:

  • /health

  • /validation (specifically for validator nodes)

  • /system

  • /metrics

You can query your own node at these endpoints to get various kinds of data about its operation. The easiest way to call the endpoints is through the radixnode script (see Installing the RadixNode CLI for more information). You can also call the endpoints directly using an HTTP client such as curl (see Making API Calls), or configure your own automation to check them and notify you of problems.

The /health endpoint

The most basic check on your node’s health is ensuring that it is running and syncing with the network. This is what the /health endpoint is used for. Execute the following command to check the status of your node:

Using radixnode:

radixnode api health

Using curl:

curl -k -u admin:nginx-password "https://localhost/health"

The call returns a simple status message that is easy to check and monitor, like this:

{
    "status": "UP"
}

The status message will be one of the following codes:

BOOTING

the node is booting and not ready to accept requests

SYNCING

the node is catching up with the network

UP

the node is in sync with consensus

STALLED

the node is out of sync and is not trying to sync with the network, even though the network is still available.

OUT_OF_SYNC

the node is out of sync and is not receiving updates from the network (for example, because the connection to the network is lost).

This status is relevant for all node types (full, validator, or archive), since all three must continually sync with the network to function properly.
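For automated checks, the status codes above can be mapped to alert severities. The following is a minimal sketch, not part of the node or the radixnode CLI: the function name `classify_health`, the severity labels, and the grouping of codes into "warning" and "critical" are all assumptions chosen for illustration.

```python
import json

# Hypothetical mapping from /health status codes to alert severities.
# The severity names and groupings are assumptions, not part of the node API.
SEVERITY_BY_STATUS = {
    "UP": "ok",
    "BOOTING": "warning",
    "SYNCING": "warning",
    "STALLED": "critical",
    "OUT_OF_SYNC": "critical",
}


def classify_health(body):
    """Return (status, severity) for a raw /health response body."""
    try:
        status = json.loads(body).get("status")
    except (ValueError, TypeError, AttributeError):
        # Unparseable or unexpectedly shaped responses are treated as unknown.
        status = None
    if not isinstance(status, str):
        status = None
    # Any missing or unrecognized status is treated as critical.
    return status, SEVERITY_BY_STATUS.get(status, "critical")
```

Treating unknown or unparseable responses as critical is a deliberately cautious choice: a monitor that cannot read the status should not report the node as healthy.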

The /validation endpoint

Validator nodes must not only sync with the network but also participate in consensus. The most important aspect of this participation is that the node correctly makes "proposals" of transactions when the protocol requires it. The /validation endpoint includes the validation.get_node_info method, which provides information about your validator node and the proposals it has made, so that you can confirm the validator is operating well.

Execute the following command to make a query to this method:

Using radixnode:

radixnode api validation get-node-info

Using curl:

curl -d '{
    "jsonrpc": "2.0",
    "method": "validation.get_node_info",
    "params": [],
    "id": 1
}' -H "Content-Type: application/json" -X POST -k -u admin:nginx-password "https://localhost/validation" | python -m json.tool

This should return a result like the following:

{
  "address": "tv41q26vcd92uy2xt8teyqemcs4qf6s6e88g2n9tgflmdnfucurejpceg3360v6",
  "epochInfo": {
      "current": {
          "owner": "tdx1qspl7mgjqwgwqyjvy2tj8swe8a4lr6mxqdhwmn60cujl6a85mqh69eg37p9ph",
          "uptimePercentage": "99.82",
          "proposalsMissed": 2,
          "stakes": [
              {
                  "amount": "2392400000000000000000000",
                  "delegator": "tdx1qspa7g5qq8g9msqt3ggfuf8jaqjn7sc7nh7fzu7ddu4cd0a8shahmpcd9whty"
              }
          ],
          "validatorFee": "1.0",
          "registered": true,
          "totalStake": "2392400000000000000000000",
          "proposalsCompleted": 742
      },
      "updates": {}
  },
  "allowDelegation": true,
  "name": "My Validator",
  "url": "www.my-validator.com"
}
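For scripted monitoring, the same JSON-RPC call could be issued from Python instead of curl. The sketch below uses only the standard library; the helper names `build_rpc_request` and `send_insecure` are hypothetical, and the URL and credentials are simply the placeholder values from the curl example. Disabling certificate verification mirrors curl's -k flag and is only reasonable where that flag would be.

```python
import base64
import json
import ssl
import urllib.request


def build_rpc_request(url, method, user, password, params=None, request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 POST request with HTTP basic auth."""
    payload = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params if params is not None else [],
        "id": request_id,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send_insecure(request, timeout=10):
    """Send the request without TLS verification, mirroring curl's -k flag."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.load(response)
```

A call such as `send_insecure(build_rpc_request("https://localhost/validation", "validation.get_node_info", "admin", "nginx-password"))` would then return the parsed JSON response as a Python dictionary.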

For health monitoring, pay particular attention to three items, all of which relate to the transaction proposals your node has been required to make during the current epoch:

uptimePercentage

The percent of proposals successfully completed

proposalsMissed

The number of proposals missed (typically due to timing out)

proposalsCompleted

The number of proposals completed

The uptimePercentage reported by the /validation endpoint differs from the one reported by the /archive endpoint’s validators.get_next_epoch_set method. The /validation value covers only the current epoch, whereas /archive provides an uptime percentage averaged over proposals made during approximately the past two weeks, which is the value shown on the Radix Explorer’s validator list.

If your node is frequently missing proposals, or consistently has an uptimePercentage below 100, your node’s potential emissions will begin to be penalized and you should check its operation.
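A monitoring script could pull these three fields out of the validation.get_node_info result and flag the node for review. This is a rough sketch only: the function name `check_proposals` and the default thresholds (`min_uptime`, `max_missed`) are assumptions, and appropriate alerting thresholds depend on your own operation. Note that uptimePercentage arrives as a string in the sample response, so it is converted before comparison.

```python
def check_proposals(node_info, min_uptime=100.0, max_missed=0):
    """Summarise current-epoch proposal figures from validation.get_node_info.

    The thresholds are illustrative assumptions, not protocol values.
    """
    current = node_info["epochInfo"]["current"]
    uptime = float(current["uptimePercentage"])  # reported as a string, e.g. "99.82"
    missed = int(current["proposalsMissed"])
    completed = int(current["proposalsCompleted"])
    return {
        "uptime": uptime,
        "missed": missed,
        "completed": completed,
        # Flag the node if either figure crosses its threshold.
        "needs_attention": uptime < min_uptime or missed > max_missed,
    }
```

Applied to the sample response shown earlier, this would report 2 missed proposals and 742 completed, and flag the node for attention under the default thresholds.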

The /system endpoint

For more detailed information about your node’s network communications, you can use the /system endpoint’s networking.get_data method:

Using radixnode:

radixnode api system networking-get-data

Using curl:

curl -k -u admin:nginx-password -d '{"jsonrpc": "2.0", "method": "networking.get_data", "params": [], "id": 1}' https://localhost/system

This command should return a result similar to the following:

{
    "messages": {
        "inbound": {
            "processed": 95823,
            "discarded": 0,
            "received": 95823
        },
        "outbound": {
            "processed": 60565,
            "aborted": 0,
            "pending": 0,
            "sent": 60565
        }
    },
    "networking": {
        "udp": {
            "droppedMessages": 0
        },
        "tcp": {
            "outOpened": 0,
            "droppedMessages": 0,
            "closed": 0,
            "inOpened": 0
        },
        "receivedBytes": 182339040,
        "sentBytes": 78212652
    }
}

Of particular interest here are the droppedMessages counts and the numbers of messages sent, received, and processed, as these may help identify general issues with network communication.
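These counters can be condensed into a few figures for a monitoring script. The sketch below assumes the response has the shape shown in the sample above; the function name `summarize_network` and the derived "inbound_unprocessed" figure (received minus processed) are illustrative assumptions rather than values the node reports.

```python
def summarize_network(data):
    """Condense a networking.get_data result into a few headline counters."""
    net = data["networking"]
    inbound = data["messages"]["inbound"]
    outbound = data["messages"]["outbound"]
    return {
        # Messages dropped at the transport level, UDP and TCP combined.
        "dropped": net["udp"]["droppedMessages"] + net["tcp"]["droppedMessages"],
        "inbound_discarded": inbound["discarded"],
        # Derived figure (an assumption): messages received but not yet processed.
        "inbound_unprocessed": inbound["received"] - inbound["processed"],
        "outbound_aborted": outbound["aborted"],
        "outbound_pending": outbound["pending"],
    }
```

Because these are cumulative counters, comparing successive summaries over time is likely to be more informative than looking at a single snapshot.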

The /metrics endpoint

The /metrics endpoint provides a wealth of performance and operational data. While it can be queried directly, it is designed for use with monitoring and alerting dashboards, such as Grafana, and so provides its data in the Prometheus data format.
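If you do query /metrics directly, the Prometheus text format is line-oriented and straightforward to read in a script. The following is a simplified sketch of a parser, not a full implementation of the format: it skips comment lines, keeps the label text unparsed, and ignores optional timestamps. The function name `parse_prometheus_text` is hypothetical, and no particular metric names are assumed.

```python
def parse_prometheus_text(text):
    """Parse Prometheus text-format lines into (name, labels_text, value) tuples."""
    samples = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # name{label="value",...} value [timestamp]
            name, _, rest = line.partition("{")
            labels, _, tail = rest.rpartition("}")
        else:
            # name value [timestamp]
            parts = line.split(None, 1)
            if len(parts) < 2:
                continue
            name, tail = parts
            labels = ""
        fields = tail.split()
        if not fields:
            continue
        # The first field after the name/labels is the sample value.
        samples.append((name.strip(), labels, float(fields[0])))
    return samples
```

For dashboards and alerting, a Prometheus server scraping the endpoint remains the intended route; a parser like this is mainly useful for quick ad hoc checks.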