relay-exporter
Production-oriented Prometheus exporter for monitoring a fixed set of Nostr relays.
What it does
- Probes each configured relay either on an interval or on demand when /metrics is scraped.
- Uses @nostrwatch/nocap for open + read checks.
- Optionally performs a low-noise write confirmation check (kind 30078 by default) using deterministic d tags.
- Exposes:
  - /metrics for Prometheus scraping
  - /healthz for process and probe-loop health
- Publishes default process metrics via prom-client.
Install
pnpm install
If you are bootstrapping from scratch, these are the exact dependency commands:
pnpm add @nostrwatch/nocap prom-client nostr-tools p-limit
pnpm add -D typescript tsx @types/node
Configuration
Copy and edit:
cp .env.example .env
Example: run probes every 5 minutes instead of on each scrape:
export PROBE_INTERVAL_MS=300000
Environment variables:
- RELAYS (required): comma-separated wss:// URLs
- PROBE_INTERVAL_MS (default: 0; 0 means run probes on each /metrics scrape)
- PROBE_TIMEOUT_MS (default: 10000)
- LISTEN_ADDR (default: 0.0.0.0)
- PORT (default: 9464)
- LOG_LEVEL (default: info; one of debug|info|warn|error)
- WRITE_CHECK_ENABLED (default: true)
- WRITE_CHECK_KIND (default: 30078)
- WRITE_CHECK_PRIVKEY (optional; supports nsec1... or 64-char hex)
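As an illustration of how the variables and defaults listed above fit together, here is a minimal sketch of a config loader. It is not the exporter's actual code: the names ExporterConfig, loadConfig, and intFromEnv are hypothetical, and the rule that only the literal string "false" disables write checks is an assumption.

```typescript
// Hypothetical sketch; names and validation rules are assumptions, not the exporter's real loader.
const LOG_LEVELS = ["debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];
type Env = Record<string, string | undefined>;

interface ExporterConfig {
  relays: string[];
  probeIntervalMs: number; // 0 means probe on each /metrics scrape
  probeTimeoutMs: number;
  listenAddr: string;
  port: number;
  logLevel: LogLevel;
  writeCheckEnabled: boolean;
  writeCheckKind: number;
}

function isLogLevel(value: string): value is LogLevel {
  return (LOG_LEVELS as readonly string[]).includes(value);
}

// Parse a non-negative integer, falling back to the documented default when unset.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): ExporterConfig {
  const relays = (env["RELAYS"] ?? "")
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);
  if (relays.length === 0) throw new Error("RELAYS is required");
  for (const url of relays) {
    if (!url.startsWith("wss://")) throw new Error(`Relay URL must start with wss://: ${url}`);
  }

  const logLevel = env["LOG_LEVEL"] ?? "info";
  if (!isLogLevel(logLevel)) throw new Error(`Invalid LOG_LEVEL: ${logLevel}`);

  return {
    relays,
    probeIntervalMs: intFromEnv(env, "PROBE_INTERVAL_MS", 0),
    probeTimeoutMs: intFromEnv(env, "PROBE_TIMEOUT_MS", 10000),
    listenAddr: env["LISTEN_ADDR"] ?? "0.0.0.0",
    port: intFromEnv(env, "PORT", 9464),
    logLevel,
    // Assumption: anything other than "false" (case-insensitive) keeps write checks on.
    writeCheckEnabled: (env["WRITE_CHECK_ENABLED"] ?? "true").toLowerCase() !== "false",
    writeCheckKind: intFromEnv(env, "WRITE_CHECK_KIND", 30078),
  };
}
```

In a Node.js process this sketch would be called as loadConfig(process.env); taking the environment as a parameter keeps the function easy to test.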
Write confirmation key material
- WRITE_CHECK_PRIVKEY may be an nsec1... value or a 64-character hex private key.
- WRITE_CHECK_PUBKEY is not needed; the write-check pubkey is always derived from the private key.
- If WRITE_CHECK_PRIVKEY is missing or invalid and write checks are enabled, the exporter generates an ephemeral key for the running process and continues write checks.
- Private key values are never logged.
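The key-handling rules above amount to a small decision: use the configured key if it looks valid, otherwise fall back to an ephemeral one. The sketch below illustrates only the classification step, under stated assumptions: classifyPrivkey and needsEphemeralKey are hypothetical names, the nsec check tests shape only (prefix, length, bech32 alphabet) and does not verify the checksum, and actual decoding and key generation are left to a library such as nostr-tools.

```typescript
// Hypothetical sketch; the exporter's real validation may differ.
type PrivkeyFormat = "hex" | "nsec" | "missing" | "invalid";

// 64 hex characters encode a 32-byte private key.
const HEX_KEY = /^[0-9a-fA-F]{64}$/;
// Shape-only check: "nsec1" followed by 58 characters from the bech32 alphabet.
// This does NOT verify the bech32 checksum.
const NSEC_KEY = /^nsec1[02-9ac-hj-np-z]{58}$/;

function classifyPrivkey(raw: string | undefined): PrivkeyFormat {
  if (raw === undefined || raw.trim() === "") return "missing";
  const value = raw.trim();
  if (HEX_KEY.test(value)) return "hex";
  if (NSEC_KEY.test(value)) return "nsec";
  return "invalid";
}

// Mirrors the documented fallback: a missing or invalid key with write checks
// enabled leads to an ephemeral key rather than a startup failure.
function needsEphemeralKey(raw: string | undefined, writeCheckEnabled: boolean): boolean {
  if (!writeCheckEnabled) return false;
  const format = classifyPrivkey(raw);
  return format === "missing" || format === "invalid";
}
```

Note that the sketch returns only a format label and never echoes the key, consistent with the rule that private key values are never logged.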
Run
Development with auto-reload:
pnpm dev
Production build:
pnpm build
pnpm start
Run tests:
pnpm test
Exposed metrics
Relay-level labels use {relay} unless stated:
- nostr_relay_up (gauge)
- nostr_relay_open_ok (gauge)
- nostr_relay_read_ok (gauge)
- nostr_relay_write_confirm_ok (gauge)
- nostr_relay_open_duration_ms (gauge)
- nostr_relay_read_duration_ms (gauge)
- nostr_relay_write_duration_ms (gauge, -1 when unavailable/disabled)
- nostr_relay_last_success_unixtime (gauge)
- nostr_relay_probe_errors_total{relay,check} (counter)
- nostr_relay_probe_runs_total{relay,result} (counter; result=success|failure)
Also includes all default Node.js process/runtime metrics from prom-client.
Prometheus scrape config example
scrape_configs:
  - job_name: nostr-relay-exporter
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - "relay-exporter.internal:9464"
Example Grafana queries
- Relay up status by relay:
max by (relay) (nostr_relay_up)
- Open latency:
avg_over_time(nostr_relay_open_duration_ms[5m])
- Read latency:
avg_over_time(nostr_relay_read_duration_ms[5m])
- Probe run success ratio (15m):
sum by (relay) (increase(nostr_relay_probe_runs_total{result="success"}[15m])) / sum by (relay) (increase(nostr_relay_probe_runs_total[15m]))
- Probe errors by check:
sum by (relay, check) (increase(nostr_relay_probe_errors_total[15m]))
Health endpoint
GET /healthz returns:
- 200 when the process is running and probe data is fresh enough
- 503 when shutting down or when probe data is stale or not yet available
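The health rules above can be expressed as a small pure function. This is a sketch under assumptions, not the exporter's implementation: healthStatus and its input fields are hypothetical names, and the freshness threshold (maxAgeMs) is a placeholder because the source does not say how "fresh enough" is defined.

```typescript
// Hypothetical sketch of the /healthz decision; names and threshold are assumptions.
interface HealthInput {
  shuttingDown: boolean;
  lastProbeCompletedAtMs: number | null; // null until the first probe run finishes
  nowMs: number;
  maxAgeMs: number; // assumed freshness threshold; not specified by the source
}

function healthStatus(input: HealthInput): 200 | 503 {
  // Shutting down always reports unhealthy.
  if (input.shuttingDown) return 503;
  // No probe data yet counts as unavailable.
  if (input.lastProbeCompletedAtMs === null) return 503;
  // Otherwise health depends on how old the latest probe data is.
  const ageMs = input.nowMs - input.lastProbeCompletedAtMs;
  return ageMs <= input.maxAgeMs ? 200 : 503;
}
```

Passing the current time in as nowMs, rather than reading the clock inside the function, keeps the check deterministic in tests.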
Notes
- Relay probes are isolated; one relay failure does not block others.
- nocap default adapters are explicitly loaded before checks.
- Probe concurrency is bounded in code (DEFAULT_PROBE_CONCURRENCY in src/config.ts).
- Graceful shutdown handles SIGINT and SIGTERM.
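To illustrate what isolated probes with bounded concurrency can look like, here is a minimal dependency-free sketch using a fixed pool of workers. It is not the exporter's code, which lists p-limit among its dependencies; probeAll, ProbeResult, and the probe callback are hypothetical names chosen for this example.

```typescript
// Hypothetical sketch: bounded-concurrency probing where one failure never blocks the rest.
type ProbeResult<T> =
  | { relay: string; ok: true; value: T }
  | { relay: string; ok: false; error: unknown };

async function probeAll<T>(
  relays: string[],
  probe: (relay: string) => Promise<T>,
  concurrency: number,
): Promise<ProbeResult<T>[]> {
  const results: ProbeResult<T>[] = new Array(relays.length);
  let next = 0; // index of the next relay to claim

  // Each worker repeatedly claims the next unprobed relay until none remain.
  async function worker(): Promise<void> {
    while (next < relays.length) {
      const index = next++;
      const relay = relays[index];
      try {
        results[index] = { relay, ok: true, value: await probe(relay) };
      } catch (error) {
        // Isolation: record the failure and keep going with other relays.
        results[index] = { relay, ok: false, error };
      }
    }
  }

  // Never start more workers than there are relays, and always at least one.
  const workerCount = Math.max(1, Math.min(concurrency, relays.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because each probe's error is caught inside its worker, the returned array always has one entry per relay, in input order, and the number of probes in flight never exceeds the concurrency argument.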