Monitoring Your Node’s Health

The Radix Node provides a System API that you can use to monitor its health.

You can query your own node at its endpoints to get various kinds of data about the operation of the node. The easiest way to call the endpoints is through the radixnode script (see Installing the RadixNode CLI for more information). You can also call the endpoints directly using an HTTP client such as curl (see Making API Calls), or configure your own automation to check these and notify you of problems.

The /health endpoint

The most basic check on your node’s health is ensuring that it is running and syncing with the network. This is what the /health endpoint is used for. Execute the following command to check the status of your node:

Using radixnode:

radixnode api system health

Using curl:

curl -k -u admin:nginx-password "https://localhost/system/health"

The call returns a simple status message that is easy to check and monitor, like this:

{
    "status": "UP"
}

The status message will be one of the following codes:

BOOTING

the node is booting and not ready to accept requests

SYNCING

the node is catching up with the network

UP

the node is in sync with consensus

STALLED

the node is out of sync and is not trying to sync with the network, even though the network is still available

OUT_OF_SYNC

the node is out of sync and is not receiving updates from the network (for example, because the connection to the network is lost)

This status is relevant for all node types - full, validator, or archive - since all three must continually sync with the network to function properly.
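For automated checks, the status codes above can be reduced to a pass/fail result. The following is a minimal sketch rather than part of the Radix tooling: the function names health_status and health_exit_code are hypothetical, the JSON is read with sed on the assumption that the response has the simple shape shown above, and the mapping of BOOTING and SYNCING to a warning rather than a failure is a policy choice you may want to change.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of radixnode.

# Read a /system/health JSON body on stdin and print the status code.
# Assumes the simple {"status": "..."} shape shown above.
health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Map a status code to an exit code in the style of monitoring plugins:
# 0 = healthy, 1 = warning, 2 = critical. This mapping is an assumption.
health_exit_code() {
    case "$1" in
        UP)               return 0 ;;
        BOOTING|SYNCING)  return 1 ;;
        *)                return 2 ;;
    esac
}

# Example use from a scheduled job (not executed here):
#   status=$(curl -s -k -u admin:nginx-password "https://localhost/system/health" | health_status)
#   health_exit_code "$status" || echo "node status: ${status:-no response}"
```

An empty status, which is what you get when the node does not answer at all, falls through to the critical case, so a dead node is not mistaken for a healthy one.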

Checking your validator’s status

As of version 1.1.0, the functionality of the /validation endpoint has moved to a special case of the /entity endpoint on the Core API.

Validator nodes need not only to sync with the network, but also to participate in consensus. The most important aspect of this participation is that the node correctly makes "proposals" of transactions when the protocol requires it.

You can use the /entity endpoint to retrieve information about your validator node and the proposals that it has made, and so confirm that the validator is operating well.

Execute the following command to query this endpoint:

Using radixnode:

radixnode api core entity -v -sy

Using curl:

curl -k -u admin:{nginx-admin-password} -X POST 'https://localhost/entity' \
--header 'Content-Type: application/json' \
--data-raw '{
    "network_identifier": {
        "network": "mainnet"
    },
    "entity_identifier": {
        "address": "<node's validator address>",
        "sub_entity": {
            "address": "system"
        }
    }
}'

This should return a result like the following:

{
    "state_identifier": {
        "state_version": 81000819,
        "transaction_accumulator": "427c57e2ddcffe91f0e4a7ac937f9f41e6d65d4f3355f8b2b7536578c7115950"
    },
    "balances": [
        {
            "value": "23000851284300000000000000",
            "resource_identifier": {
                "rri": "xrd_tr71qy30nghm",
                "type": "Token"
            }
        }
    ],
    "data_objects": [
        {
            "owner": {
                "address": "rdx71qspgznk3etjey20a49hmwfsejjs23xy28l62x8nn8h75ysrc3jvfppghnrasm"
            },
            "registered": true,
            "fee": 0,
            "type": "ValidatorData"
        },
        {
            "proposals_completed": 871,
            "proposals_missed": 0,
            "type": "ValidatorBFTData"
        }
    ]
}

For health monitoring, pay particular attention to three figures describing the transaction proposals your node has been required to make during the current epoch. Two are returned in the ValidatorBFTData object; the third is derived from them:

proposals_completed

The number of proposals completed.

proposals_missed

The number of proposals missed (typically due to timing out).

uptime percentage

The percentage of proposals successfully completed. It is not part of the response shown above, but it can be calculated as 100 × proposals_completed / (proposals_missed + proposals_completed).

If your node is frequently missing proposals, or consistently has an uptime percentage below 100, your node’s potential emissions will begin to be penalized and you should check its operation.
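The uptime calculation described above can be scripted. The sketch below makes several assumptions: the helper names entity_field and uptime_percentage are hypothetical, the proposal counters are read from the ValidatorBFTData object of the /entity response using sed (a proper JSON parser would be more robust), and an epoch with no proposals yet is reported as "n/a" rather than as a percentage.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of radixnode.

# Read an /entity JSON response on stdin and print the integer value of the
# named field, for example proposals_completed. Assumes the field name occurs
# once and holds a plain non-negative integer, as in the sample response.
entity_field() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Print the uptime percentage to two decimals, given the completed and missed
# proposal counts. Prints "n/a" when no proposals have been required yet.
uptime_percentage() {
    awk -v c="$1" -v m="$2" 'BEGIN {
        if (c + m == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * c / (c + m)
    }'
}

# Example use (not executed here), with the response saved in response.json:
#   completed=$(entity_field proposals_completed < response.json)
#   missed=$(entity_field proposals_missed < response.json)
#   uptime_percentage "$completed" "$missed"
```

With the counts from the sample response (871 completed, 0 missed), uptime_percentage prints 100.00.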

The /system/metrics endpoint

For more detailed information about your node’s network communications, you can use the /system/metrics endpoint:

Using radixnode:

radixnode api system metrics

Using curl:

curl -k -u admin:nginx-password "https://localhost/system/metrics"

This command should return a result similar to the following:

{
    "bft": {
        "events_received": 0,
        "committed_vertices": 0,
        "no_votes_sent": 0,
        "vote_quorums": 0,
        "timeout_quorums": 0,
        "pacemaker": {
            "timeouts_sent": 0,
            "round": 0,
            "proposed_transactions": 0,
            "proposals_sent": 0,
            "timed_out_rounds": 0
        },
        "sync": {
            "requests_sent": 0,
            "requests_received": 0,
            "request_timeouts": 0
        },
        "vertex_store": {
            "size": 0,
            "forks": 0,
            "rebuilds": 0,
            "indirect_parents": 0
        }
    },
    "sync": {
        "current_state_version": 80417941,
        "target_state_version": 80417943,
        "valid_responses_received": 80424941,
        "invalid_responses_received": 0,
        "remote_requests_received": 0
    },
    "mempool": {
        "current_size": 0,
        "add_success": 2482,
        "add_failure": 42,
        "relays_sent": 26272
    },
    "networking": {
        "bytes_sent": 24329267672,
        "bytes_received": 53346865335,
        "inbound": {
            "processed": 35681668,
            "discarded": 0,
            "received": 35681668
        },
        "outbound": {
            "processed": 20512191,
            "aborted": 0,
            "pending": 0,
            "sent": 20512191
        }
    }
}
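One figure that can be derived from this output is how far the node is behind its sync target. The sketch below subtracts current_state_version from target_state_version in the sync section. Treating that difference as a measure of sync lag is an interpretation of the field names, not something the response states, and the helper names metric_field and sync_lag are hypothetical.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of radixnode.

# Read JSON on stdin and print the integer value of the named field.
# Assumes the field name occurs once, as in the sample response.
metric_field() {
    sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Read a /system/metrics JSON body on stdin and print
# target_state_version - current_state_version.
# The body runs in a subshell so its variables do not leak into the caller;
# it exits non-zero if either field is missing.
sync_lag() (
    body=$(cat)
    current=$(printf '%s\n' "$body" | metric_field current_state_version)
    target=$(printf '%s\n' "$body" | metric_field target_state_version)
    [ -n "$current" ] && [ -n "$target" ] || exit 1
    echo $((target - current))
)

# Example use (not executed here):
#   curl -s -k -u admin:nginx-password "https://localhost/system/metrics" | sync_lag
```

For the sample response above, sync_lag prints 2. What counts as an acceptable lag depends on your setup, so any alert threshold is yours to choose.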

The /prometheus/metrics endpoint

The /prometheus/metrics endpoint provides a wealth of performance and operational data. While it can be queried directly, it is designed for use with monitoring and alerting dashboards such as Grafana, and so provides its data in the Prometheus data format.