relay-exporter

Production-oriented Prometheus exporter for monitoring a fixed set of Nostr relays.

What it does

Probes each configured relay either on a fixed interval or on demand when /metrics is scraped.
Uses @nostrwatch/nocap for open + read checks.
Optionally performs a low-noise write confirmation check (kind 30078 by default) using deterministic d tags.
Exposes:
- /metrics for Prometheus scraping
- /healthz for process and probe-loop health
Publishes default process metrics via prom-client.
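
The low-noise write check relies on kind 30078 falling in the addressable (parameterized replaceable) range 30000-39999: a relay keeps only the latest event per pubkey, kind, and d tag, so reusing the same d tag overwrites the previous probe event instead of accumulating new ones. The following is a minimal sketch of how a deterministic d tag could be derived per relay. The helper names, the "relay-exporter" prefix, and the SHA-256 scheme are assumptions for illustration, not the exporter's actual implementation.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: derive a stable d tag from the relay URL so that
// every probe of the same relay targets the same addressable-event slot.
function writeCheckDTag(relayUrl: string, prefix = "relay-exporter"): string {
  const digest = createHash("sha256").update(relayUrl).digest("hex");
  return `${prefix}:${digest.slice(0, 16)}`;
}

// Hypothetical unsigned event template. Signing and publishing are out of
// scope for this sketch.
function writeCheckTemplate(relayUrl: string, kind = 30078, nowMs = Date.now()) {
  return {
    kind,
    created_at: Math.floor(nowMs / 1000), // Nostr timestamps are in seconds
    tags: [["d", writeCheckDTag(relayUrl)]],
    content: "",
  };
}
```

Because the tag depends only on the relay URL, restarting the exporter with the same key reuses the same slot on each relay.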

Install

pnpm install

If you are bootstrapping from scratch, these are the exact dependency commands:

pnpm add @nostrwatch/nocap prom-client nostr-tools p-limit
pnpm add -D typescript tsx @types/node

Configuration

Copy and edit:

cp .env.example .env

Example: run probes every 5 minutes instead of on each scrape:

export PROBE_INTERVAL_MS=300000

Environment variables:

RELAYS (required): comma-separated wss:// URLs
PROBE_INTERVAL_MS (default: 0; 0 means run probes on each /metrics scrape)
PROBE_TIMEOUT_MS (default: 10000)
LISTEN_ADDR (default: 0.0.0.0)
PORT (default: 9464)
LOG_LEVEL (default: info; one of debug|info|warn|error)
WRITE_CHECK_ENABLED (default: true)
WRITE_CHECK_KIND (default: 30078)
WRITE_CHECK_PRIVKEY (optional; supports nsec1... or 64-char hex)
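
As an illustration of how these variables and their defaults fit together, here is a minimal parsing sketch. The ExporterConfig shape, the loadConfig and intFromEnv names, and the strict validation rules (rejecting non-wss:// entries, accepting only "true" or "false" for booleans) are assumptions; the real src/config.ts may behave differently.

```typescript
// Hypothetical config shape mirroring the documented variables.
interface ExporterConfig {
  relays: string[];
  probeIntervalMs: number; // 0 = probe on each /metrics scrape
  probeTimeoutMs: number;
  listenAddr: string;
  port: number;
  logLevel: "debug" | "info" | "warn" | "error";
  writeCheckEnabled: boolean;
  writeCheckKind: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, falling back to the documented default.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Env): ExporterConfig {
  const relays = (env.RELAYS ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  if (relays.length === 0) throw new Error("RELAYS is required");
  for (const url of relays) {
    // Assumption: anything other than wss:// is rejected outright.
    if (!url.startsWith("wss://")) throw new Error(`not a wss:// URL: ${url}`);
  }

  const logLevel = env.LOG_LEVEL ?? "info";
  if (!["debug", "info", "warn", "error"].includes(logLevel)) {
    throw new Error(`LOG_LEVEL must be debug|info|warn|error, got "${logLevel}"`);
  }

  // Assumption: only the literal strings "true" and "false" are accepted.
  const writeRaw = (env.WRITE_CHECK_ENABLED ?? "true").toLowerCase();
  if (writeRaw !== "true" && writeRaw !== "false") {
    throw new Error(`WRITE_CHECK_ENABLED must be true or false, got "${writeRaw}"`);
  }

  return {
    relays,
    probeIntervalMs: intFromEnv(env, "PROBE_INTERVAL_MS", 0),
    probeTimeoutMs: intFromEnv(env, "PROBE_TIMEOUT_MS", 10000),
    listenAddr: env.LISTEN_ADDR ?? "0.0.0.0",
    port: intFromEnv(env, "PORT", 9464),
    logLevel: logLevel as ExporterConfig["logLevel"],
    writeCheckEnabled: writeRaw === "true",
    writeCheckKind: intFromEnv(env, "WRITE_CHECK_KIND", 30078),
  };
}
```

Passing the environment in as an argument (for example loadConfig(process.env)) keeps the function easy to test without mutating global state.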

Write confirmation key material

WRITE_CHECK_PRIVKEY may be an nsec1... value or a 64-character hex private key.
WRITE_CHECK_PUBKEY is not needed; the write-check pubkey is always derived from the private key.
If WRITE_CHECK_PRIVKEY is missing or invalid and write checks are enabled, the exporter generates an ephemeral key for the running process and continues write checks.
Private key values are never logged.
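
The fallback rules above can be sketched as a small decision function. This is a shape check only: it does not verify the bech32 checksum or decode the key, and the function and type names are hypothetical. A real implementation would decode nsec1... values with a bech32 decoder such as nip19.decode from nostr-tools and derive the pubkey from the result.

```typescript
// Hypothetical classification of where the write-check key comes from.
type KeySource = "disabled" | "hex" | "nsec" | "ephemeral";

const HEX_KEY = /^[0-9a-f]{64}$/i;
// "nsec1" followed by 58 bech32 characters (52 data + 6 checksum).
// Shape only; the checksum is NOT verified here.
const NSEC_SHAPE = /^nsec1[02-9ac-hj-np-z]{58}$/;

function classifyKeySource(raw: string | undefined, writeCheckEnabled: boolean): KeySource {
  if (!writeCheckEnabled) return "disabled";
  const value = (raw ?? "").trim();
  if (HEX_KEY.test(value)) return "hex";
  if (NSEC_SHAPE.test(value)) return "nsec";
  // Missing or invalid: the exporter falls back to a per-process key.
  return "ephemeral";
}

// Log the source, never the key material itself.
function describeKeySource(source: KeySource): string {
  return `write-check key source: ${source}`;
}
```

Returning only the source label makes it straightforward to log which path was taken without ever touching the secret value.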

Run

Development with auto-reload:

pnpm dev

Production build:

pnpm build
pnpm start

Run tests:

pnpm test

Exposed metrics

Relay-level metrics carry a single relay label unless stated otherwise:

nostr_relay_up (gauge)
nostr_relay_open_ok (gauge)
nostr_relay_read_ok (gauge)
nostr_relay_write_confirm_ok (gauge)
nostr_relay_open_duration_ms (gauge)
nostr_relay_read_duration_ms (gauge)
nostr_relay_write_duration_ms (gauge, -1 when unavailable/disabled)
nostr_relay_last_success_unixtime (gauge)
nostr_relay_probe_errors_total{relay,check} (counter)
nostr_relay_probe_runs_total{relay,result} (counter; result=success|failure)

Also includes all default Node.js process/runtime metrics from prom-client.

Prometheus scrape config example

scrape_configs:
  - job_name: nostr-relay-exporter
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - "relay-exporter.internal:9464"

Example Grafana queries

Relay up status by relay:
- max by (relay) (nostr_relay_up)
Open latency:
- avg_over_time(nostr_relay_open_duration_ms[5m])
Read latency:
- avg_over_time(nostr_relay_read_duration_ms[5m])
Probe run success ratio (15m):
- sum by (relay) (increase(nostr_relay_probe_runs_total{result="success"}[15m])) / sum by (relay) (increase(nostr_relay_probe_runs_total[15m]))
Probe errors by check:
- sum by (relay, check) (increase(nostr_relay_probe_errors_total[15m]))

Health endpoint

GET /healthz returns:
- 200 when process is running and probe data is fresh enough
- 503 when shutting down or probe data is stale/not yet available
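
The status rule above can be expressed as a small pure function. This is a sketch under stated assumptions: the HealthInput fields and the maxAgeMs freshness threshold are hypothetical, and the README does not specify how "fresh enough" is computed or how it applies when probes run on each scrape.

```typescript
// Hypothetical inputs for the health decision.
interface HealthInput {
  shuttingDown: boolean;
  lastProbeCompletedAtMs: number | null; // null = no probe has finished yet
  nowMs: number;
  maxAgeMs: number; // assumed staleness threshold
}

function healthStatusCode(input: HealthInput): 200 | 503 {
  if (input.shuttingDown) return 503;
  if (input.lastProbeCompletedAtMs === null) return 503; // not yet available
  const ageMs = input.nowMs - input.lastProbeCompletedAtMs;
  return ageMs <= input.maxAgeMs ? 200 : 503; // stale data is unhealthy
}
```

Keeping the decision free of I/O means the HTTP handler only has to gather the inputs and write the returned status code.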

Notes

Relay probes are isolated; one relay failure does not block others.
nocap default adapters are explicitly loaded before checks.
Probe concurrency is bounded in code (DEFAULT_PROBE_CONCURRENCY in src/config.ts).
Graceful shutdown handles SIGINT and SIGTERM.
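
To illustrate the isolation and bounded-concurrency notes above, here is a self-contained sketch. The project depends on p-limit; this example swaps in a hand-rolled worker pool so it runs without any packages. The probeAll name, the ProbeOutcome type, and the probe callback are hypothetical.

```typescript
// Hypothetical per-relay outcome: a failure is recorded, never rethrown.
type ProbeOutcome<T> =
  | { relay: string; ok: true; value: T }
  | { relay: string; ok: false; error: string };

async function probeAll<T>(
  relays: string[],
  probe: (relay: string) => Promise<T>,
  concurrency: number,
): Promise<ProbeOutcome<T>[]> {
  const results: ProbeOutcome<T>[] = new Array(relays.length);
  let next = 0; // index of the next relay to claim

  // Each worker claims relays one at a time until none are left, so at most
  // `concurrency` probes are in flight at once.
  async function worker(): Promise<void> {
    while (true) {
      const i = next++;
      if (i >= relays.length) return;
      const relay = relays[i];
      try {
        results[i] = { relay, ok: true, value: await probe(relay) };
      } catch (err) {
        // Isolate the failure so the remaining relays are still probed.
        const error = err instanceof Error ? err.message : String(err);
        results[i] = { relay, ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, relays.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results come back in the same order as the input list, which keeps it simple to map each outcome onto the per-relay gauges.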
