relay-exporter

Production-oriented Prometheus exporter for monitoring a fixed set of Nostr relays.

What it does

  • Probes each configured relay either on a fixed interval or on demand when /metrics is scraped.
  • Uses @nostrwatch/nocap for open + read checks.
  • Performs a low-noise write confirmation check (kind 30078 by default) using deterministic d tags.
  • By default, requires read-after-write verification: a publish OK alone does not count as a confirmed write.
  • Exposes:
    • /metrics for Prometheus scraping
    • /healthz for process and probe-loop health
  • Publishes default process metrics via prom-client.
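
The low-noise write check can be pictured as follows. Kinds in the 30000-39999 range are addressable in Nostr, so a relay keeps only the latest event per pubkey, kind, and d tag; reusing one deterministic d tag per relay means each probe replaces the previous write instead of accumulating events. This is a minimal sketch, not the exporter's actual code: the helper names, the relay-exporter:write-check: prefix, and the content format are assumptions made for illustration.

```typescript
// Hypothetical sketch: helper names and the tag format are illustrative,
// not taken from the exporter's source.

interface EventTemplate {
  kind: number;
  created_at: number; // unix seconds
  tags: string[][];
  content: string;
}

// Derive a stable d tag from the relay URL so that repeated probes of the
// same relay always address the same replaceable event.
function deterministicDTag(relayUrl: string): string {
  const normalized = relayUrl.trim().replace(/\/+$/, "");
  return `relay-exporter:write-check:${normalized}`;
}

// Build the unsigned event. Signing and publishing are left to a Nostr library.
function buildWriteCheckTemplate(
  relayUrl: string,
  kind = 30078,
  nowMs = Date.now(),
): EventTemplate {
  return {
    kind,
    created_at: Math.floor(nowMs / 1000),
    tags: [["d", deterministicDTag(relayUrl)]],
    content: `write-check ${nowMs}`,
  };
}

// Two probes one minute apart target the same d tag, so the second replaces the first.
const first = buildWriteCheckTemplate("wss://offchain.pub/", 30078, 1_700_000_000_000);
const second = buildWriteCheckTemplate("wss://offchain.pub", 30078, 1_700_000_060_000);
console.log(first.tags[0][1] === second.tags[0][1]);
```

Only created_at and content change between probes in this sketch, which is what keeps the relay's stored footprint at one event per probed relay.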

Install

pnpm install

If you are bootstrapping from scratch, these are the exact dependency commands:

pnpm add @nostrwatch/nocap prom-client nostr-tools p-limit
pnpm add -D typescript tsx @types/node

Configuration

Copy and edit:

cp .env.example .env

Example: run probes every 5 minutes instead of on each scrape:

export PROBE_INTERVAL_SECONDS=300

Environment variables:

  • RELAYS (required): comma-separated wss:// URLs
  • PROBE_INTERVAL_SECONDS (default: 0; 0 means run probes on each /metrics scrape)
  • PROBE_TIMEOUT_SECONDS (default: 10)
  • LISTEN_ADDR (default: 0.0.0.0)
  • PORT (default: 9464)
  • LOG_LEVEL (default: info; one of debug|info|warn|error)
  • WRITE_CHECK_ENABLED (default: true)
  • WRITE_CHECK_VERIFY_READ (default: true; set false to treat publish OK as sufficient)
  • WRITE_CHECK_KIND (default: 30078)
  • WRITE_CHECK_PRIVKEY (optional; supports nsec1... or 64-char hex)
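
To make the defaults above concrete, here is a minimal sketch of how these variables could be parsed and validated. It is not the exporter's real loader: the ExporterConfig shape, the helper names, and the choice to accept only the literal strings true and false for booleans are assumptions. WRITE_CHECK_PRIVKEY is left out because it is handled separately.

```typescript
// Hypothetical sketch of env parsing; the exporter's real loader may differ.

const LOG_LEVELS = ["debug", "info", "warn", "error"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];
type Env = Record<string, string | undefined>;

interface ExporterConfig {
  relays: string[];
  probeIntervalSeconds: number; // 0 = probe on each /metrics scrape
  probeTimeoutSeconds: number;
  listenAddr: string;
  port: number;
  logLevel: LogLevel;
  writeCheckEnabled: boolean;
  writeCheckVerifyRead: boolean;
  writeCheckKind: number;
}

function isLogLevel(v: string): v is LogLevel {
  return (LOG_LEVELS as readonly string[]).includes(v);
}

// Parse a non-negative integer, falling back to the documented default.
function intVar(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 0) throw new Error(`${name} must be a non-negative integer`);
  return n;
}

// Parse a boolean; only "true" and "false" are accepted in this sketch.
function boolVar(env: Env, name: string, fallback: boolean): boolean {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const v = raw.trim().toLowerCase();
  if (v === "true") return true;
  if (v === "false") return false;
  throw new Error(`${name} must be true or false`);
}

function loadConfig(env: Env): ExporterConfig {
  const relays = (env["RELAYS"] ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s !== "");
  if (relays.length === 0) throw new Error("RELAYS is required");
  for (const r of relays) {
    if (!r.startsWith("wss://")) throw new Error(`relay URL must start with wss://: ${r}`);
  }
  const logLevel = env["LOG_LEVEL"] ?? "info";
  if (!isLogLevel(logLevel)) throw new Error("LOG_LEVEL must be one of debug|info|warn|error");
  return {
    relays,
    probeIntervalSeconds: intVar(env, "PROBE_INTERVAL_SECONDS", 0),
    probeTimeoutSeconds: intVar(env, "PROBE_TIMEOUT_SECONDS", 10),
    listenAddr: env["LISTEN_ADDR"] ?? "0.0.0.0",
    port: intVar(env, "PORT", 9464),
    logLevel,
    writeCheckEnabled: boolVar(env, "WRITE_CHECK_ENABLED", true),
    writeCheckVerifyRead: boolVar(env, "WRITE_CHECK_VERIFY_READ", true),
    writeCheckKind: intVar(env, "WRITE_CHECK_KIND", 30078),
  };
}

// wss://relay.example is a placeholder relay URL.
const cfg = loadConfig({
  RELAYS: "wss://offchain.pub, wss://relay.example",
  PROBE_INTERVAL_SECONDS: "300",
});
console.log(cfg.relays, cfg.probeIntervalSeconds, cfg.port);
```

Failing fast on a malformed value, rather than silently falling back to a default, is a design choice of this sketch; the exporter itself may be more lenient.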

Write confirmation key material

  • WRITE_CHECK_PRIVKEY may be an nsec1... value or a 64-character hex private key.
  • WRITE_CHECK_PUBKEY is not needed; the write-check pubkey is always derived from the private key.
  • If WRITE_CHECK_PRIVKEY is missing or invalid and write checks are enabled, the exporter generates an ephemeral key for the running process and continues write checks.
  • Private key values are never logged.
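
The key-handling rules above amount to a small decision: use a hex key, use an nsec key, or fall back to an ephemeral one. The sketch below shows only that decision, under assumptions: classifyPrivkey and describeKeySource are hypothetical names, and the nsec branch checks shape only (prefix, bech32 alphabet, length). Checksum validation, decoding, and ephemeral key generation would be delegated to a Nostr library and are not shown.

```typescript
// Hypothetical sketch: classifies WRITE_CHECK_PRIVKEY without decoding or logging it.

type KeySource =
  | { kind: "hex"; hex: string }
  | { kind: "nsec"; nsec: string }
  | { kind: "ephemeral"; reason: "missing" | "invalid" };

function classifyPrivkey(raw: string | undefined): KeySource {
  const value = (raw ?? "").trim();
  if (value === "") return { kind: "ephemeral", reason: "missing" };
  // 64 hex characters = 32-byte private key.
  if (/^[0-9a-fA-F]{64}$/.test(value)) return { kind: "hex", hex: value.toLowerCase() };
  // Shape check only: "nsec1" followed by 58 bech32 characters (52 data + 6 checksum).
  // A real decoder must still verify the checksum and may reject the value.
  if (/^nsec1[02-9ac-hj-np-z]{58}$/.test(value)) return { kind: "nsec", nsec: value };
  return { kind: "ephemeral", reason: "invalid" };
}

// Log-safe summary: reports where the key came from, never the key itself.
function describeKeySource(src: KeySource): string {
  switch (src.kind) {
    case "hex":
      return "write-check key: configured (hex)";
    case "nsec":
      return "write-check key: configured (nsec)";
    case "ephemeral":
      return `write-check key: ephemeral (configured key ${src.reason})`;
  }
}

console.log(describeKeySource(classifyPrivkey(undefined)));
console.log(describeKeySource(classifyPrivkey("not-a-key")));
```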

Run

Development with auto-reload:

pnpm dev

Production build:

pnpm build
pnpm start

Run tests:

pnpm test

Run the live write-confirm diagnostic for offchain.pub (opt-in):

LIVE_RELAY_TEST_OFFCHAIN=1 \
LIVE_RELAY_TEST_RELAYS="wss://offchain.pub" \
pnpm test test/live.offchain.test.ts

Full example with write verification enabled:

LIVE_RELAY_TEST_OFFCHAIN=1 \
LIVE_RELAY_TEST_RELAYS="wss://offchain.pub" \
LIVE_RELAY_TEST_WRITE_VERIFY_READ=1 \
LIVE_RELAY_TEST_TIMEOUT_SECONDS=8 \
LIVE_RELAY_TEST_EXPECT_NO_FAILURES=1 \
pnpm test test/live.offchain.test.ts

Faster local loop (reduced stability sampling):

LIVE_RELAY_TEST_OFFCHAIN=1 \
LIVE_RELAY_TEST_RELAYS="wss://offchain.pub" \
LIVE_RELAY_TEST_SAMPLES=2 \
LIVE_RELAY_TEST_SCRAPE_EVERY_MS=250 \
pnpm test test/live.offchain.test.ts

Optional knobs (defaults favor faster feedback):

  • LIVE_RELAY_TEST_SAMPLES (default 4)
  • LIVE_RELAY_TEST_SCRAPE_EVERY_MS (default 500)
  • LIVE_RELAY_TEST_TIMEOUT_SECONDS (default 8)
  • LIVE_RELAY_TEST_RELAYS (default "wss://offchain.pub"; comma-separated relay list)
  • LIVE_RELAY_TEST_WRITE_VERIFY_READ=1 to force read-after-write verification
  • LIVE_RELAY_TEST_EXPECT_NO_FAILURES=1 to make the test fail on any write-confirm/probe failures

Exposed metrics

Relay-level metrics carry a single {relay} label unless stated otherwise:

  • nostr_relay_up (gauge)
  • nostr_relay_open_ok (gauge)
  • nostr_relay_read_ok (gauge)
  • nostr_relay_write_confirm_ok (gauge)
  • nostr_relay_open_duration_ms (gauge)
  • nostr_relay_read_duration_ms (gauge)
  • nostr_relay_write_duration_ms (gauge; -1 when the write check is unavailable or disabled)
  • nostr_relay_last_success_unixtime (gauge)
  • nostr_relay_probe_errors_total{relay,check} (counter)
  • nostr_relay_probe_runs_total{relay,result} (counter; result=success|failure)

Also includes all default Node.js process/runtime metrics from prom-client.

Prometheus scrape config example

scrape_configs:
  - job_name: nostr-relay-exporter
    scrape_interval: 15s
    metrics_path: /metrics
    static_configs:
      - targets:
          - "relay-exporter.internal:9464"

Example Grafana queries

  • Relay up status by relay:
    • max by (relay) (nostr_relay_up)
  • Open latency:
    • avg_over_time(nostr_relay_open_duration_ms[5m])
  • Read latency:
    • avg_over_time(nostr_relay_read_duration_ms[5m])
  • Probe run success ratio (15m):
    • sum by (relay) (increase(nostr_relay_probe_runs_total{result="success"}[15m])) / sum by (relay) (increase(nostr_relay_probe_runs_total[15m]))
  • Probe errors by check:
    • sum by (relay, check) (increase(nostr_relay_probe_errors_total[15m]))

Health endpoint

  • GET /healthz returns:
    • 200 when the process is running and probe data is fresh enough
    • 503 when the process is shutting down, or when probe data is stale or not yet available
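
The status rules above reduce to a small pure function. This is a sketch under assumptions: the HealthInput shape is hypothetical, and maxAgeMs stands in for whatever freshness threshold the exporter actually applies, which this README does not specify.

```typescript
// Hypothetical sketch of the /healthz decision; field names and the
// staleness threshold are assumptions, not the exporter's actual values.

interface HealthInput {
  shuttingDown: boolean;
  lastProbeCompletedAtMs: number | null; // null until the first probe round finishes
  nowMs: number;
  maxAgeMs: number; // assumed freshness threshold
}

function healthStatusCode(h: HealthInput): 200 | 503 {
  if (h.shuttingDown) return 503; // draining: report unhealthy
  if (h.lastProbeCompletedAtMs === null) return 503; // no probe data yet
  const ageMs = h.nowMs - h.lastProbeCompletedAtMs;
  return ageMs <= h.maxAgeMs ? 200 : 503; // stale data counts as unhealthy
}

const now = 1_700_000_000_000;
console.log(
  healthStatusCode({ shuttingDown: false, lastProbeCompletedAtMs: now - 30_000, nowMs: now, maxAgeMs: 60_000 }),
  healthStatusCode({ shuttingDown: false, lastProbeCompletedAtMs: null, nowMs: now, maxAgeMs: 60_000 }),
);
```

Keeping the decision free of I/O makes it easy to unit test separately from the HTTP handler.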

Notes

  • Relay probes are isolated; one relay failure does not block others.
  • nocap default adapters are explicitly loaded before checks.
  • Probe concurrency is bounded in code (DEFAULT_PROBE_CONCURRENCY in src/config.ts).
  • Graceful shutdown handles SIGINT and SIGTERM.
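
The first and third notes describe two properties: a failing relay does not affect the others, and only a bounded number of probes run at once. The sketch below illustrates both with a plain worker-pool loop. It is not the exporter's code, which depends on p-limit; probeAll, fakeProbe, and the example.invalid relay URLs are hypothetical.

```typescript
// Hypothetical sketch: bounded, failure-isolated probing with a simple worker pool.

type ProbeOutcome<T> =
  | { relay: string; ok: true; value: T }
  | { relay: string; ok: false; error: string };

async function probeAll<T>(
  relays: string[],
  probe: (relay: string) => Promise<T>,
  concurrency: number,
): Promise<ProbeOutcome<T>[]> {
  const results: ProbeOutcome<T>[] = new Array(relays.length);
  let next = 0; // index of the next relay to claim

  // Each worker claims relays one at a time until none are left.
  async function worker(): Promise<void> {
    while (next < relays.length) {
      const i = next++;
      const relay = relays[i];
      try {
        results[i] = { relay, ok: true, value: await probe(relay) };
      } catch (err) {
        // Isolation: record the failure and keep going with the next relay.
        results[i] = { relay, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, relays.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Demo with a fake probe that tracks how many probes overlap.
let inFlight = 0;
let maxInFlight = 0;
async function fakeProbe(relay: string): Promise<number> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 5));
  inFlight--;
  if (relay.includes("down")) throw new Error("connect failed");
  return 5; // pretend latency in ms
}

const demo = probeAll(
  ["wss://a.example.invalid", "wss://down.example.invalid", "wss://c.example.invalid"],
  fakeProbe,
  2,
);
demo.then((outcomes) => console.log(outcomes, { maxInFlight }));
```

Results come back in input order regardless of completion order, which keeps per-relay metric updates straightforward.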