relay-exporter
Production-oriented Prometheus exporter for monitoring a fixed set of Nostr relays.
What it does
- Probes each configured relay either on an interval or on demand when /metrics is scraped.
- Uses @nostrwatch/nocap for open + read checks.
- Performs a low-noise write confirmation check (kind 30078 by default) using deterministic d tags.
- By default, requires read-after-write verification.
- Exposes:
  - /metrics for Prometheus scraping
  - /healthz for process and probe-loop health
- Publishes default process metrics via prom-client.
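Why deterministic d tags keep the write check low-noise: kinds in the 30000-39999 range are addressable, so a relay retains only the latest event per pubkey, kind, and d tag, and each new probe event replaces the previous one. The sketch below builds an unsigned event template under that assumption. The `relay-exporter:` prefix, the content payload, and the function names are hypothetical and are not taken from the exporter's source.

```typescript
// Hypothetical sketch: build an unsigned write-check event template.
// The "relay-exporter:" prefix and the content format are illustrative assumptions.

interface EventTemplate {
  kind: number;
  created_at: number; // Unix time in seconds
  tags: string[][];
  content: string;
}

// Derive a stable d tag from the relay URL so repeated probes of the same
// relay reuse one address and replace the earlier event instead of piling up.
function deterministicDTag(relayUrl: string): string {
  const url = new URL(relayUrl);
  const path = url.pathname.replace(/\/+$/, ""); // ignore trailing slashes
  return `relay-exporter:${url.host}${path}`;
}

function buildWriteCheckTemplate(
  relayUrl: string,
  nowMs: number,
  kind: number = 30078,
): EventTemplate {
  return {
    kind,
    created_at: Math.floor(nowMs / 1000),
    tags: [["d", deterministicDTag(relayUrl)]],
    content: `probe ${nowMs}`,
  };
}
```

Signing and publishing are left out; in this project they would presumably go through nostr-tools.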
Install
pnpm install
If you are bootstrapping from scratch, these are the exact dependency commands:
pnpm add @nostrwatch/nocap prom-client nostr-tools p-limit
pnpm add -D typescript tsx @types/node
Configuration
Copy and edit:
cp .env.example .env
Example: run probes every 5 minutes instead of on each scrape:
export PROBE_INTERVAL_SECONDS=300
Environment variables:
- RELAYS (required): comma-separated wss:// URLs
- PROBE_INTERVAL_SECONDS (default: 0; 0 means run probes on each /metrics scrape)
- PROBE_TIMEOUT_SECONDS (default: 10)
- LISTEN_ADDR (default: 0.0.0.0)
- PORT (default: 9464)
- LOG_LEVEL (default: info; one of debug|info|warn|error)
- WRITE_CHECK_ENABLED (default: true)
- WRITE_CHECK_VERIFY_READ (default: true; set false to treat publish OK as sufficient)
- WRITE_CHECK_KIND (default: 30078)
- WRITE_CHECK_PRIVKEY (optional; supports nsec1... or 64-char hex)
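To show how these variables and defaults fit together, here is a minimal parsing sketch. Only the environment variable names and default values come from the list above; the `ExporterConfig` shape, the helper names, and the validation rules (for example, treating any value other than "true" as false) are assumptions, and the real src/config.ts may behave differently.

```typescript
// Hypothetical sketch of environment parsing. Variable names and defaults
// follow the README; types, helpers, and validation rules are assumptions.

type Env = Record<string, string | undefined>;

interface ExporterConfig {
  relays: string[];
  probeIntervalSeconds: number; // 0 = run probes on each /metrics scrape
  probeTimeoutSeconds: number;
  listenAddr: string;
  port: number;
  writeCheckEnabled: boolean;
  writeCheckVerifyRead: boolean;
  writeCheckKind: number;
}

function parseNonNegativeInt(raw: string | undefined, fallback: number, name: string): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Assumption: only the literal string "true" (any case) enables a flag.
function parseBool(raw: string | undefined, fallback: boolean): boolean {
  if (raw === undefined || raw.trim() === "") return fallback;
  return raw.trim().toLowerCase() === "true";
}

function parseConfig(env: Env): ExporterConfig {
  const relays = (env.RELAYS ?? "")
    .split(",")
    .map((relay) => relay.trim())
    .filter((relay) => relay.length > 0);
  if (relays.length === 0) throw new Error("RELAYS is required");
  for (const relay of relays) {
    if (!relay.startsWith("wss://")) {
      throw new Error(`RELAYS entry is not a wss:// URL: ${relay}`);
    }
  }
  return {
    relays,
    probeIntervalSeconds: parseNonNegativeInt(env.PROBE_INTERVAL_SECONDS, 0, "PROBE_INTERVAL_SECONDS"),
    probeTimeoutSeconds: parseNonNegativeInt(env.PROBE_TIMEOUT_SECONDS, 10, "PROBE_TIMEOUT_SECONDS"),
    listenAddr: env.LISTEN_ADDR ?? "0.0.0.0",
    port: parseNonNegativeInt(env.PORT, 9464, "PORT"),
    writeCheckEnabled: parseBool(env.WRITE_CHECK_ENABLED, true),
    writeCheckVerifyRead: parseBool(env.WRITE_CHECK_VERIFY_READ, true),
    writeCheckKind: parseNonNegativeInt(env.WRITE_CHECK_KIND, 30078, "WRITE_CHECK_KIND"),
  };
}
```

Taking the environment as a plain object rather than reading process.env directly keeps the function easy to unit test.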
Write confirmation key material
- WRITE_CHECK_PRIVKEY may be an nsec1... value or a 64-character hex private key.
- WRITE_CHECK_PUBKEY is not needed; the write-check pubkey is always derived from the private key.
- If WRITE_CHECK_PRIVKEY is missing or invalid and write checks are enabled, the exporter generates an ephemeral key for the running process and continues write checks.
- Private key values are never logged.
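The fallback rules above can be sketched as a small resolver. This is an illustration under stated assumptions, not the exporter's code: `resolveWriteCheckKey`, `describeKey`, and the injected `decodeNsec` callback are hypothetical names. The callback stands in for a library bech32 decoder (in this project that would presumably be a nostr-tools call), and the sketch does not verify that the hex value is a valid secp256k1 scalar.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical sketch of write-check key resolution with an ephemeral fallback.

type KeySource = "configured" | "ephemeral";

interface ResolvedKey {
  source: KeySource;
  secretKeyHex: string;
}

// Stand-in for a bech32 nsec decoder; returns 64-char hex or null if invalid.
type NsecDecoder = (nsec: string) => string | null;

const HEX_64 = /^[0-9a-f]{64}$/i;

function resolveWriteCheckKey(
  raw: string | undefined,
  decodeNsec: NsecDecoder = () => null,
): ResolvedKey {
  const value = (raw ?? "").trim();
  if (HEX_64.test(value)) {
    return { source: "configured", secretKeyHex: value.toLowerCase() };
  }
  if (value.startsWith("nsec1")) {
    const decoded = decodeNsec(value);
    if (decoded !== null && HEX_64.test(decoded)) {
      return { source: "configured", secretKeyHex: decoded.toLowerCase() };
    }
  }
  // Missing or invalid: generate a key that lives only for this process.
  // A real implementation would likely use a library key generator instead.
  return { source: "ephemeral", secretKeyHex: randomBytes(32).toString("hex") };
}

// Log-safe description: reports where the key came from, never the key itself.
function describeKey(key: ResolvedKey): string {
  return `write-check key source=${key.source}`;
}
```

Because an ephemeral key changes on every restart, events written under it land at a new address each time; setting a fixed key avoids that.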
Run
Development with auto-reload:
pnpm dev
Production build:
pnpm build
pnpm start
Run tests:
pnpm test
Run the live write-confirm diagnostic for offchain.pub (opt-in):
LIVE_RELAY_TEST_OFFCHAIN=1 \
LIVE_RELAY_TEST_RELAYS="wss://offchain.pub" \
pnpm test test/live.offchain.test.ts
Full example with write verification enabled:
LIVE_RELAY_TEST_OFFCHAIN=1 \
LIVE_RELAY_TEST_RELAYS="wss://offchain.pub" \
LIVE_RELAY_TEST_WRITE_VERIFY_READ=1 \
LIVE_RELAY_TEST_TIMEOUT_SECONDS=8 \
LIVE_RELAY_TEST_EXPECT_NO_FAILURES=1 \
pnpm test test/live.offchain.test.ts
Faster local loop (reduced stability sampling):
LIVE_RELAY_TEST_OFFCHAIN=1 \
LIVE_RELAY_TEST_RELAYS="wss://offchain.pub" \
LIVE_RELAY_TEST_SAMPLES=2 \
LIVE_RELAY_TEST_SCRAPE_EVERY_MS=250 \
pnpm test test/live.offchain.test.ts
Optional knobs (defaults favor faster feedback):
- LIVE_RELAY_TEST_SAMPLES (default 4)
- LIVE_RELAY_TEST_SCRAPE_EVERY_MS (default 500)
- LIVE_RELAY_TEST_TIMEOUT_SECONDS (default 8)
- LIVE_RELAY_TEST_RELAYS (default "wss://offchain.pub"; comma-separated relay list)
- LIVE_RELAY_TEST_WRITE_VERIFY_READ=1 to force read-after-write verification
- LIVE_RELAY_TEST_EXPECT_NO_FAILURES=1 to make the test fail on any write-confirm/probe failures
Exposed metrics
Relay-level metrics carry a {relay} label unless stated otherwise:
- nostr_relay_up (gauge)
- nostr_relay_open_ok (gauge)
- nostr_relay_read_ok (gauge)
- nostr_relay_write_confirm_ok (gauge)
- nostr_relay_open_duration_ms (gauge)
- nostr_relay_read_duration_ms (gauge)
- nostr_relay_write_duration_ms (gauge, -1 when unavailable/disabled)
- nostr_relay_last_success_unixtime (gauge)
- nostr_relay_probe_errors_total{relay,check} (counter)
- nostr_relay_probe_runs_total{relay,result} (counter; result=success|failure)
Also includes all default Node.js process/runtime metrics from prom-client.
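To make the -1 sentinel on nostr_relay_write_duration_ms concrete, the sketch below maps an optional duration to a gauge value and renders one sample line in the Prometheus text exposition format. The exporter itself uses prom-client for this; the helper names here are hypothetical and the rendering is hand-rolled purely for illustration.

```typescript
// Hypothetical sketch: the -1 sentinel and one rendered sample line.

// Escape a label value per the Prometheus text exposition format
// (backslash, double quote, and newline must be escaped).
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Map an optional duration to the gauge value; -1 marks unavailable/disabled.
function writeDurationValue(durationMs: number | undefined): number {
  return durationMs === undefined ? -1 : durationMs;
}

function renderGauge(name: string, relay: string, value: number): string {
  return `${name}{relay="${escapeLabelValue(relay)}"} ${value}`;
}
```

Since -1 is a real sample value, latency queries over the write duration should exclude negative samples before averaging; otherwise disabled or failed checks pull the average down.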
Prometheus scrape config example
scrape_configs:
  - job_name: nostr-relay-exporter
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - "relay-exporter.internal:9464"
Example Grafana queries
- Relay up status by relay:
max by (relay) (nostr_relay_up)
- Open latency:
avg_over_time(nostr_relay_open_duration_ms[5m])
- Read latency:
avg_over_time(nostr_relay_read_duration_ms[5m])
- Write confirmation success ratio (15m):
sum by (relay) (increase(nostr_relay_probe_runs_total{result="success"}[15m])) / sum by (relay) (increase(nostr_relay_probe_runs_total[15m]))
- Probe errors by check:
sum by (relay, check) (increase(nostr_relay_probe_errors_total[15m]))
Health endpoint
GET /healthz returns:
- 200 when the process is running and probe data is fresh enough
- 503 when shutting down or probe data is stale/not yet available
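The status rules above can be expressed as a pure function. This is a sketch under assumptions: the `HealthInput` shape and the `maxAgeMs` staleness threshold are hypothetical, and the README does not say how the exporter defines "fresh enough".

```typescript
// Hypothetical sketch of the /healthz decision.

interface HealthInput {
  shuttingDown: boolean;
  lastProbeCompletedAtMs: number | null; // null until the first probe finishes
  nowMs: number;
  maxAgeMs: number; // assumed staleness threshold, not from the README
}

function healthStatus(input: HealthInput): 200 | 503 {
  if (input.shuttingDown) return 503;
  if (input.lastProbeCompletedAtMs === null) return 503; // no data yet
  const ageMs = input.nowMs - input.lastProbeCompletedAtMs;
  return ageMs <= input.maxAgeMs ? 200 : 503;
}
```

Passing the clock in as `nowMs` keeps the decision deterministic in tests.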
Notes
- Relay probes are isolated; one relay failure does not block others.
- nocap default adapters are explicitly loaded before checks.
- Probe concurrency is bounded in code (DEFAULT_PROBE_CONCURRENCY in src/config.ts).
- Graceful shutdown handles SIGINT and SIGTERM.
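Probe isolation and bounded concurrency, as described in the notes above, can be sketched with a small worker pool. The project lists p-limit as a dependency, so the real code likely uses that; this hand-rolled version is a dependency-free stand-in, and `probeAll` and `Settled` are hypothetical names.

```typescript
// Hypothetical sketch: run probes with bounded concurrency, isolating failures.

type Settled<T> =
  | { relay: string; ok: true; value: T }
  | { relay: string; ok: false; error: string };

async function probeAll<T>(
  relays: string[],
  concurrency: number,
  probe: (relay: string) => Promise<T>,
): Promise<Settled<T>[]> {
  const results: Settled<T>[] = new Array(relays.length);
  let next = 0; // index of the next relay to claim

  // Each worker claims relays one at a time until none remain.
  async function worker(): Promise<void> {
    while (true) {
      const index = next++;
      if (index >= relays.length) return;
      const relay = relays[index];
      try {
        results[index] = { relay, ok: true, value: await probe(relay) };
      } catch (err) {
        // A failure is recorded for this relay only; other probes continue.
        const error = err instanceof Error ? err.message : String(err);
        results[index] = { relay, ok: false, error };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, relays.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results keep the input order, so each entry can be matched back to its relay without extra bookkeeping.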