Health Checks and Probes: Architecture

Overview

Impronto Enterprise implements a three-level health check system (startup, liveness, readiness) for every Moleculer node. A dedicated HTTP server on port 3001 exposes the health endpoints, separate from application traffic on the main port (3000).

┌──────────────────────────────────────────────────────┐
│                    Moleculer Pod                     │
│                                                      │
│  ┌──────────────┐    ┌────────────────────────────┐  │
│  │ Moleculer    │    │ Health HTTP Server         │  │
│  │ Broker       │    │ port 3001                  │  │
│  │ (port 3000)  │◄───│                            │  │
│  │              │    │ /public/health/live        │  │
│  │ health.mixin │    │ /public/health/ready       │  │
│  │              │    │ /public/health/overview    │  │
│  │              │    │ /public/health/detailed    │  │
│  └──────────────┘    └────────────────────────────┘  │
└──────────────────────────────────────────────────────┘
          ▲                          ▲
     App traffic                K8s probes
     (NATS/HTTP)                 (kubelet)

Three-Level Probe System

Startup Probe

Verifies that the service has completed initialization (connections to MongoDB, Redis, and NATS).

startupProbe:
  httpGet:
    path: /public/health/live
    port: 3001
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 30    # 5s * 30 = up to 2.5 minutes to start
  timeoutSeconds: 3

Behavior: until the startup probe succeeds, the liveness and readiness probes are not run. This prevents a slow-starting service from being killed by the liveness probe.
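
The timing implied by these settings can be sketched with a small helper. It is only an approximation, assuming the worst case is the initial delay plus one period per tolerated failure; `probeBudgetSeconds` is a hypothetical name and not part of the project. The defaults mirror the Kubernetes defaults for the three fields.

```javascript
// Approximate time budget of a Kubernetes probe, in seconds.
// Assumption: worst case = initial delay + one period per tolerated failure.
function probeBudgetSeconds({
  initialDelaySeconds = 0,
  periodSeconds = 10,
  failureThreshold = 3,
} = {}) {
  return initialDelaySeconds + periodSeconds * failureThreshold;
}

// Startup probe values from the snippet above: 5 + 5 * 30
console.log(
  probeBudgetSeconds({ initialDelaySeconds: 5, periodSeconds: 5, failureThreshold: 30 })
); // 155
```

With these values the container gets roughly 155 seconds (the 2.5 minutes from the comment plus the 5-second initial delay) before Kubernetes gives up on the startup probe.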

Liveness Probe

Verifies that the process is alive and not stalled (deadlock, blocked event loop).

livenessProbe:
  httpGet:
    path: /public/health/live
    port: 3001
  periodSeconds: 15
  failureThreshold: 3     # 3 consecutive failures = restart
  timeoutSeconds: 5

Behavior: after 3 consecutive failures, Kubernetes restarts the container.

Readiness Probe

Verifies that the service is ready to receive traffic (active DB connections, dependencies available).

readinessProbe:
  httpGet:
    path: /public/health/ready
    port: 3001
  periodSeconds: 10
  failureThreshold: 3     # 3 failures = removed from the Service
  successThreshold: 1
  timeoutSeconds: 5

Behavior: if it fails, the pod is removed from the Kubernetes Service endpoints (it no longer receives traffic) but is NOT restarted. Once it is ready again, it is added back.

Health HTTP Endpoints

The HTTP server on port 3001 exposes four endpoints:

GET /public/health/live

Minimal response for the liveness probe. It only verifies that the Node.js event loop is responsive.

// 200 OK
{ "status": "alive", "timestamp": "2026-03-29T10:00:00.000Z" }

// 503 Service Unavailable (event loop blocked)
{ "status": "dead", "timestamp": "..." }
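
An endpoint of this shape could be served with nothing more than Node's built-in `http` module. The sketch below is not the project's health server: `livePayload` and `createLiveServer` are hypothetical names, and only the 200 response is modelled.

```javascript
const http = require("node:http");

// Build the liveness payload; the shape follows the 200 OK example above.
function livePayload(now = new Date()) {
  return { status: "alive", timestamp: now.toISOString() };
}

// Hypothetical factory: returns an http.Server that is not yet listening.
function createLiveServer() {
  return http.createServer((req, res) => {
    if (req.method === "GET" && req.url === "/public/health/live") {
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(livePayload()));
      return;
    }
    // Anything else is unknown to this minimal sketch.
    res.writeHead(404, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ error: "not_found" }));
  });
}

// Assumed wiring, left commented out so the sketch has no side effects:
// createLiveServer().listen(3001);
```

In this sketch the handler shares the event loop with the application, so a blocked loop would surface as a probe timeout (`timeoutSeconds`) rather than as a response.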

GET /public/health/ready

Verifies that all critical dependencies are connected:

// 200 OK
{
  "status": "ready",
  "checks": {
    "mongodb": { "status": "up", "latency": 3 },
    "redis": { "status": "up", "latency": 1 },
    "nats": { "status": "up", "latency": 0 }
  }
}

// 503 Service Unavailable
{
  "status": "not_ready",
  "checks": {
    "mongodb": { "status": "up", "latency": 3 },
    "redis": { "status": "down", "error": "ECONNREFUSED" },
    "nats": { "status": "up", "latency": 0 }
  }
}
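
One way such a payload could be assembled is to time each dependency check and mark the node ready only when every check is up. This is a sketch under assumptions: `timedCheck` and `readiness` are hypothetical helpers, and each `ping` stands for any async function that rejects when its dependency is unreachable.

```javascript
// Run one dependency check and measure how long it took.
// `ping` is an assumed contract: an async function that rejects on failure.
async function timedCheck(ping) {
  const start = Date.now();
  try {
    await ping();
    return { status: "up", latency: Date.now() - start };
  } catch (err) {
    return { status: "down", error: err.code || err.message };
  }
}

// Aggregate named checks into a readiness body plus the HTTP status to send.
async function readiness(pings) {
  const names = Object.keys(pings);
  const results = await Promise.all(names.map((name) => timedCheck(pings[name])));
  const checks = Object.fromEntries(names.map((name, i) => [name, results[i]]));
  const ready = results.every((result) => result.status === "up");
  return {
    httpStatus: ready ? 200 : 503,
    body: { status: ready ? "ready" : "not_ready", checks },
  };
}
```

Running the checks in parallel keeps the total response time close to the slowest dependency, which matters because the probe itself has a `timeoutSeconds` limit.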

GET /public/health/overview

Concise overview of the node's state:

{
  "nodeID": "core-7b8f9d6c4-x2k9m",
  "nodeType": "core",
  "status": "ready",
  "uptime": 86432,
  "services": ["orders", "products", "categories", "customers"],
  "version": "2.4.1",
  "moleculerVersion": "0.14.34"
}

GET /public/health/detailed

Detailed diagnostic information (protected, internal access only):

{
  "nodeID": "core-7b8f9d6c4-x2k9m",
  "status": "ready",
  "uptime": 86432,
  "memory": {
    "heapUsed": 128456789,
    "heapTotal": 256000000,
    "rss": 312000000,
    "external": 45000000
  },
  "cpu": {
    "user": 1245678,
    "system": 345678
  },
  "checks": {
    "mongodb": {
      "status": "up",
      "latency": 3,
      "poolSize": 10,
      "availableConnections": 7,
      "database": "t_demo"
    },
    "redis": {
      "status": "up",
      "latency": 1,
      "memoryUsed": "45.2MB",
      "connectedClients": 3
    },
    "nats": {
      "status": "up",
      "latency": 0,
      "reconnects": 0
    }
  },
  "customChecks": {
    "printer_connection": { "status": "up" },
    "fiscal_module": { "status": "up", "lastSync": "2026-03-29T09:55:00Z" }
  }
}
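
The `memory` and `cpu` field names match what Node.js reports through `process.memoryUsage()` and `process.cpuUsage()`, so the process-level part of this payload could plausibly be collected as sketched below. That mapping is an assumption, not something the source confirms, and `processDiagnostics` is a hypothetical name.

```javascript
// Collect process-level diagnostics using only Node.js built-ins.
function processDiagnostics() {
  // Memory figures are in bytes.
  const { heapUsed, heapTotal, rss, external } = process.memoryUsage();
  // CPU figures are microseconds of CPU time consumed by this process.
  const { user, system } = process.cpuUsage();
  return {
    uptime: Math.floor(process.uptime()), // whole seconds since process start
    memory: { heapUsed, heapTotal, rss, external },
    cpu: { user, system },
  };
}

console.log(processDiagnostics());
```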

health.mixin.js: Implementation

The health.mixin.js mixin is included in every service to enable custom health checks.

Mixin Structure

// mixins/health.mixin.js
module.exports = {
  settings: {
    healthCheck: {
      port: 3001,
      readiness: {
        mongodb: true,    // Check the MongoDB connection
        redis: true,      // Check the Redis connection
        nats: true        // Check the NATS connection
      }
    }
  },

  methods: {
    /**
     * Registers a custom health check for this service.
     * @param {string} name - Name of the check
     * @param {Function} checkFn - Async function that returns { status, ...details }
     */
    registerHealthCheck(name, checkFn) {
      this._customHealthChecks = this._customHealthChecks || {};
      this._customHealthChecks[name] = checkFn;
    },

    /**
     * Runs all registered checks.
     * @returns {Promise<Object>} Aggregated results, keyed by check name
     */
    async runHealthChecks() {
      const results = {};
      for (const [name, fn] of Object.entries(this._customHealthChecks || {})) {
        try {
          results[name] = await fn.call(this);
        } catch (err) {
          results[name] = { status: "down", error: err.message };
        }
      }
      return results;
    }
  },

  created() {
    this._customHealthChecks = {};
  },

  async started() {
    // The HTTP server is started by the first service that registers on the node
    // and is managed centrally by the broker
  }
};
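
How the central health server gathers the custom checks of every service on a node is not shown here. A minimal sketch of one possible approach follows, with the two mixin methods copied onto a plain object so that it runs without Moleculer; `healthMethods` and `collectCustomChecks` are hypothetical names.

```javascript
// Stand-in for the two mixin methods above (same logic, no Moleculer needed).
const healthMethods = {
  registerHealthCheck(name, checkFn) {
    this._customHealthChecks = this._customHealthChecks || {};
    this._customHealthChecks[name] = checkFn;
  },
  async runHealthChecks() {
    const results = {};
    for (const [name, fn] of Object.entries(this._customHealthChecks || {})) {
      try {
        results[name] = await fn.call(this);
      } catch (err) {
        // A throwing check is reported as "down" instead of failing the run.
        results[name] = { status: "down", error: err.message };
      }
    }
    return results;
  },
};

// Hypothetical aggregator: merge the custom checks of all local services.
async function collectCustomChecks(services) {
  const merged = {};
  for (const service of services) {
    Object.assign(merged, await service.runHealthChecks());
  }
  return merged;
}
```

In this sketch, two services that register a check under the same name overwrite each other in the merged result, so prefixing check names with the service name would be a reasonable convention.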

Adding a Custom Health Check to a New Service

// services/mio-servizio.service.js
const HealthMixin = require("../mixins/health.mixin");

module.exports = {
  name: "mio-servizio",
  mixins: [HealthMixin],

  async started() {
    // Register a custom check
    this.registerHealthCheck("external_api", async () => {
      try {
        const start = Date.now();
        await this.externalClient.ping();
        return {
          status: "up",
          latency: Date.now() - start
        };
      } catch (err) {
        return {
          status: "down",
          error: err.message,
          lastSuccess: this._lastExternalApiSuccess
        };
      }
    });

    // Check for a job queue
    this.registerHealthCheck("job_queue", async () => {
      const queueLength = await this.getQueueLength();
      return {
        status: queueLength < 10000 ? "up" : "degraded",
        queueLength,
        oldestJob: await this.getOldestJobAge()
      };
    });
  }
};
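
A custom check that calls an external system can hang, and a hanging check could hold up the health response past the probe's `timeoutSeconds`. One defensive option, sketched here as an assumption rather than as something the mixin provides, is to wrap each check with a deadline; `withTimeout` is a hypothetical helper.

```javascript
// Wrap a check so that it resolves to "down" if it does not settle in time.
function withTimeout(checkFn, ms) {
  return async function timedOutCheck() {
    let timer;
    const timeout = new Promise((resolve) => {
      timer = setTimeout(
        () => resolve({ status: "down", error: `timeout after ${ms}ms` }),
        ms
      );
    });
    try {
      // Whichever settles first wins; `this` is forwarded to the check.
      return await Promise.race([checkFn.call(this), timeout]);
    } finally {
      clearTimeout(timer); // do not leave the timer pending
    }
  };
}

// Assumed usage inside started():
// this.registerHealthCheck("external_api", withTimeout(async () => { ... }, 2000));
```

The deadline should be shorter than the probe timeout so that the endpoint can still answer with a structured result.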

Complete Kubernetes Configuration

Example Deployment with all probes configured:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pos-core
  namespace: pos-enterprise
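spec:
  selector:
    matchLabels:
      app: pos-core       # assumed label; must match the template labels
  template:
    metadata:
      labels:
        app: pos-core     # assumed label
    spec: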
      containers:
        - name: core
          image: registry.sellogic.cloud/impronto/core:2.4.1
          ports:
            - name: app
              containerPort: 3000
            - name: health
              containerPort: 3001
            - name: metrics
              containerPort: 3030
          startupProbe:
            httpGet:
              path: /public/health/live
              port: health
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 30
            timeoutSeconds: 3
          livenessProbe:
            httpGet:
              path: /public/health/live
              port: health
            periodSeconds: 15
            failureThreshold: 3
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /public/health/ready
              port: health
            periodSeconds: 10
            failureThreshold: 3
            successThreshold: 1
            timeoutSeconds: 5

Diagnostic Commands

# Check the status of a specific pod
kubectl describe pod -n pos-enterprise pos-core-7b8f9d6c4-x2k9m | grep -A5 "Conditions"

# Manually test the health endpoints
kubectl exec -n pos-enterprise pos-core-7b8f9d6c4-x2k9m -- curl -s http://localhost:3001/public/health/ready | jq .

# List probe failure events
kubectl get events -n pos-enterprise --field-selector reason=Unhealthy --sort-by='.lastTimestamp'

# Logs from the previous container instance (after a probe-triggered restart)
kubectl logs -n pos-enterprise pos-core-7b8f9d6c4-x2k9m --previous | tail -50

Probe Troubleshooting

Symptom	Probable Cause	Resolution
Pod in CrashLoopBackOff	Startup probe fails repeatedly	Check logs with `--previous`, increase `failureThreshold`
Pod Running but 0/1 Ready	Readiness probe fails	Check `/public/health/ready`, verify DB connections
Frequent restarts with no errors in the logs	Liveness timeout (event loop blocked)	Increase `timeoutSeconds`, check for synchronous operations
All pods not ready at the same time	External dependency down (MongoDB/Redis)	Verify connectivity to external services
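
When restarts point to a blocked event loop, measuring the loop delay can help confirm it before raising `timeoutSeconds`. The sketch below uses Node's built-in `perf_hooks.monitorEventLoopDelay`; `startLoopMonitor` is a hypothetical wrapper and is not part of the health mixin.

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Start sampling event-loop delay; `resolution` is the sampling interval in ms.
function startLoopMonitor(resolution = 20) {
  const histogram = monitorEventLoopDelay({ resolution });
  histogram.enable();
  return {
    // Snapshot in milliseconds (the histogram reports nanoseconds).
    snapshot() {
      return {
        meanMs: histogram.mean / 1e6,
        maxMs: histogram.max / 1e6,
        p99Ms: histogram.percentile(99) / 1e6,
      };
    },
    stop() {
      histogram.disable();
    },
  };
}

// Assumed usage: log a snapshot periodically and look for spikes in maxMs.
// const monitor = startLoopMonitor();
// setInterval(() => console.log(monitor.snapshot()), 10000).unref();
```

A `maxMs` that approaches the liveness `timeoutSeconds` suggests long synchronous work on the main thread rather than a probe that is configured too tightly.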

Cross-References

Monitoring Stack: Overview - How probes fit into the stack
Alerting System - Alerts for probe failures
Infrastructure Troubleshooting - Diagnosing connectivity problems
Scaling Guide - Interaction between the readiness probe and HPA
