Grafana Dashboards: Complete Guide

Accessing Grafana

Grafana is reachable at the following addresses:

Production: https://grafana.platform.sellogic.cloud
Local (port-forward): http://localhost:3000

# Port-forward for local access
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80

Authentication: SSO through Keycloak (OAuth2). Keycloak roles map to Grafana roles as follows:

grafana-admin → Admin
grafana-editor → Editor
platform-ops → Viewer
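
When a user holds more than one of these Keycloak roles, the mapping has to settle on a single Grafana role. The Python sketch below illustrates one plausible rule: the highest-privilege match wins, and a user with no matching role gets nothing. Both rules are assumptions, and `resolve_grafana_role` is a hypothetical helper, not part of Grafana or Keycloak; in a real deployment the mapping lives in Grafana's OAuth configuration.

```python
from typing import Iterable, Optional

# Mapping taken from the list above. The list order encodes precedence,
# highest privilege first (precedence is an assumption of this sketch).
ROLE_MAP = [
    ("grafana-admin", "Admin"),
    ("grafana-editor", "Editor"),
    ("platform-ops", "Viewer"),
]


def resolve_grafana_role(keycloak_roles: Iterable[str]) -> Optional[str]:
    """Return the highest-privilege Grafana role for a set of Keycloak roles."""
    held = set(keycloak_roles)
    for keycloak_role, grafana_role in ROLE_MAP:
        if keycloak_role in held:
            return grafana_role
    return None  # no matching role: no Grafana role assigned (assumption)


print(resolve_grafana_role(["platform-ops", "grafana-editor"]))  # Editor
```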

Predefined Dashboards (kube-prometheus-stack)

Installing kube-prometheus-stack provides preconfigured dashboards for the Kubernetes cluster:

Dashboard	Description	Primary Use
Kubernetes / Compute Resources / Cluster	Overview of cluster CPU and memory resources	Capacity planning
Kubernetes / Compute Resources / Namespace (Pods)	Resources for the `pos-enterprise` namespace	Identifying pods that consume too many resources
Kubernetes / Networking / Pod	Network traffic per pod	Debugging network problems
Node Exporter / Nodes	OS metrics per node (CPU, disk I/O, RAM)	Infrastructure troubleshooting
CoreDNS	Performance of the cluster's internal DNS	Debugging DNS latency
ETCD	State of the etcd datastore	Control plane health

Namespace Filter

In every Kubernetes dashboard, set the $namespace variable to pos-enterprise to show only the platform's pods.

Custom Moleculer Dashboards

1. Service Health Overview

UID: moleculer-service-health

Main panels:

Panel	PromQL Query	Type
Active services	`count(up{namespace="pos-enterprise"} == 1) by (service)`	Stat
Uptime per service	`avg_over_time(up{namespace="pos-enterprise"}[24h]) * 100`	Table
Pod restart count	`increase(kube_pod_container_status_restarts_total{namespace="pos-enterprise"}[1h])`	Time series
Readiness status	`kube_pod_status_ready{namespace="pos-enterprise", condition="true"}`	Status map
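
The uptime query in the table averages the `up` samples (1 for a successful scrape, 0 for a failed one) over 24 hours and scales the result to a percentage. A minimal Python sketch of that arithmetic follows; the sample values are invented, and `uptime_percent` is a hypothetical name.

```python
from typing import Sequence


def uptime_percent(up_samples: Sequence[int]) -> float:
    """Percentage of scrapes in the window that reported the target as up."""
    if not up_samples:
        raise ValueError("no samples in the window")
    return sum(up_samples) / len(up_samples) * 100


# Three successful scrapes and one failed scrape.
print(uptime_percent([1, 1, 1, 0]))  # 75.0
```

Note that the figure measures scrape success, so a target that is healthy but unreachable from Prometheus also lowers it.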

2. Action Latency & Throughput

UID: moleculer-action-latency

# P95 latency per action (last 5 min)
histogram_quantile(0.95,
  sum by (le, service, action) (
    rate(moleculer_action_duration_seconds_bucket{namespace="pos-enterprise"}[5m])
  )
)

# Throughput (req/s) per service
sum(rate(moleculer_action_requests_total{namespace="pos-enterprise"}[5m])) by (service)

# Error rate per service (percentage)
sum(rate(moleculer_action_errors_total{namespace="pos-enterprise"}[5m])) by (service)
/
sum(rate(moleculer_action_requests_total{namespace="pos-enterprise"}[5m])) by (service)
* 100
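
For readers unfamiliar with how a P95 value is derived from histogram buckets, the Python sketch below reproduces the general idea: locate the bucket that contains the target rank, then interpolate linearly inside it. It is a simplified illustration rather than Prometheus's actual implementation, and the bucket bounds and counts are invented sample data.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least one finite bucket plus the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    # First bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if math.isinf(upper):
        # The quantile lies beyond the last finite bound: report that bound.
        return buckets[-2][0]
    lower = buckets[i - 1][0] if i > 0 else 0.0
    prev_count = buckets[i - 1][1] if i > 0 else 0.0
    # Linear interpolation inside the selected bucket.
    return lower + (upper - lower) * (rank - prev_count) / (count - prev_count)


sample = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]
print(histogram_quantile(0.95, sample))  # about 0.81 seconds
```

Because the estimate is interpolated, its accuracy depends on how finely the bucket boundaries are spaced around the latencies of interest.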

3. Tenant Activity

UID: moleculer-tenant-activity

Dashboard with a $tenant variable that filters every metric by tenant:

# Orders per minute per tenant
sum(rate(pos_orders_total{tenant="$tenant"}[5m])) * 60

# Active sessions per tenant
pos_active_sessions{tenant="$tenant"}

# Mean operation latency per tenant
avg(rate(moleculer_action_duration_seconds_sum{tenant="$tenant"}[5m])
  / rate(moleculer_action_duration_seconds_count{tenant="$tenant"}[5m]))
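
The latency query divides the rate of the `_sum` counter by the rate of the `_count` counter for each series, then averages those per-series means. That gives every series equal weight regardless of its traffic. The Python sketch below contrasts this with a request-weighted mean; the input numbers are invented, `mean_latencies` is a hypothetical helper, and series with no traffic are skipped for simplicity.

```python
from typing import List, Optional, Tuple


def mean_latencies(
    series: List[Tuple[float, float]],
) -> Tuple[Optional[float], Optional[float]]:
    """Compute mean latency two ways from (sum_rate, count_rate) pairs.

    Returns (unweighted_mean, request_weighted_mean) in seconds.
    """
    active = [(s, c) for s, c in series if c > 0]
    if not active:
        return None, None
    # Average of per-series means: what avg(rate(sum) / rate(count)) computes.
    unweighted = sum(s / c for s, c in active) / len(active)
    # Total time divided by total requests: weights each request equally.
    weighted = sum(s for s, _ in active) / sum(c for _, c in active)
    return unweighted, weighted


# A busy fast action (10 req/s at 0.1 s) and a rare slow one (1 req/s at 2 s).
unweighted, weighted = mean_latencies([(1.0, 10.0), (2.0, 1.0)])
print(round(unweighted, 3), round(weighted, 3))
```

If the request-weighted figure is the one wanted on the dashboard, the equivalent PromQL shape would be `sum(rate(..._sum[5m])) / sum(rate(..._count[5m]))`.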

4. Infrastructure Monitor

UID: pos-infrastructure

Panel	Description
MongoDB Connection Pool	Active and available connections per cluster
Redis Memory Usage	Memory used vs limit per instance
NATS Reconnections	Reconnection count per node
Cache Hit Rate	Hit/miss percentage of the Redis cache
Sync Lag	Synchronization delay per device and tenant

Useful PromQL Queries

Performance

# Top 10 slowest actions (P99)
topk(10,
  histogram_quantile(0.99,
    sum by (le, service, action) (
      rate(moleculer_action_duration_seconds_bucket{namespace="pos-enterprise"}[15m])
    )
  )
)

# Actions with an error rate above 1%
(
  sum(rate(moleculer_action_errors_total[5m])) by (service, action)
  / sum(rate(moleculer_action_requests_total[5m])) by (service, action)
) > 0.01

# CPU saturation per pod (usage-to-request ratio)
sum(rate(container_cpu_usage_seconds_total{namespace="pos-enterprise"}[5m])) by (pod)
/ sum(kube_pod_container_resource_requests{namespace="pos-enterprise", resource="cpu"}) by (pod)

Resources

# Memory usage vs limit (percentage)
sum(container_memory_working_set_bytes{namespace="pos-enterprise"}) by (pod)
/ sum(kube_pod_container_resource_limits{namespace="pos-enterprise", resource="memory"}) by (pod)
* 100

# Pods that are neither Running nor Succeeded
kube_pod_status_phase{namespace="pos-enterprise", phase!="Running", phase!="Succeeded"} == 1

# PVC usage (percentage of disk used)
kubelet_volume_stats_used_bytes{namespace="pos-enterprise"}
/ kubelet_volume_stats_capacity_bytes{namespace="pos-enterprise"}
* 100

Business

# Total orders per tenant over the last 24 hours
sum by (tenant) (increase(pos_orders_total[24h]))

# Tenants with the most orders in the last hour (top 5)
topk(5, sum by (tenant) (increase(pos_orders_total[1h])))
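
These business queries rely on `increase()`, which has to cope with counters that restart from zero when a pod restarts. The Python sketch below shows the core idea under simplifying assumptions: any drop in value is treated as a reset, and the boundary extrapolation that Prometheus applies is ignored. The sample values are invented and `counter_increase` is a hypothetical name.

```python
from typing import Sequence


def counter_increase(samples: Sequence[float]) -> float:
    """Total growth of a counter across samples, tolerating resets to zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Counter reset: everything counted since the restart is new.
            total += cur
    return total


# The counter grows to 130, the pod restarts, then it counts up to 25.
print(counter_increase([100, 130, 5, 25]))  # 55.0
```

This is why a plain "last value minus first value" would be wrong here: it would report a negative number of orders for the window above.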

Creating a New Panel

Procedure

1. Open the target dashboard → Edit (pencil icon)
2. Add → Visualization
3. Select the data source: Prometheus for metrics, Loki for logs
4. Enter the PromQL or LogQL query
5. Configure:
   - Panel title: a descriptive name
   - Visualization type: Time series, Stat, Table, Gauge, etc.
   - Thresholds: color thresholds (green/yellow/red)
   - Legend: {{service}} - {{action}} for readable labels
6. Apply, then Save dashboard
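
The settings chosen in the procedure above end up in the panel's JSON definition. As a rough illustration, the Python sketch below assembles such a definition. The field names follow the panel JSON that Grafana typically exports, but they should be checked against an export from your own instance; the data source UID `prometheus`, the threshold values, and the `build_panel` helper are assumptions.

```python
import json
from typing import Any, Dict, List, Tuple


def build_panel(
    title: str,
    expr: str,
    legend: str,
    thresholds: List[Tuple[str, float]],
) -> Dict[str, Any]:
    """Build a time series panel definition with color thresholds."""
    # The base step has no lower bound, hence the null value.
    steps: List[Dict[str, Any]] = [{"color": "green", "value": None}]
    steps += [{"color": color, "value": value} for color, value in thresholds]
    return {
        "title": title,
        "type": "timeseries",
        # Assumed data source UID; replace with the one from your instance.
        "datasource": {"type": "prometheus", "uid": "prometheus"},
        "targets": [{"refId": "A", "expr": expr, "legendFormat": legend}],
        "fieldConfig": {
            "defaults": {"thresholds": {"mode": "absolute", "steps": steps}},
            "overrides": [],
        },
    }


panel = build_panel(
    title="Request rate per action",
    expr='sum by (service, action) '
         '(rate(moleculer_action_requests_total{namespace="$namespace"}[5m]))',
    legend="{{service}} - {{action}}",
    thresholds=[("yellow", 50.0), ("red", 100.0)],  # example values only
)
print(json.dumps(panel, indent=2))
```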

Template Variables

The custom dashboards use these reusable variables:

# $namespace variable (type: Query)
label_values(up, namespace)
# Default: pos-enterprise

# $service variable (type: Query, depends on $namespace)
label_values(up{namespace="$namespace"}, service)

# $tenant variable (type: Query)
label_values(moleculer_action_requests_total{namespace="$namespace"}, tenant)

# $node_type variable (type: Custom)
# Values: gateway,core,auth,commerce,warehouse,media,reports,platform,finance,knowledge,scheduler,cash,simulator
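
To make the dependency between variables concrete, the Python sketch below mimics how a selected value is substituted into a query before it is sent to Prometheus. It only covers the simple single-value `$name` case; Grafana's real interpolation also handles multi-value selections and formatting options. The `interpolate` helper is hypothetical.

```python
from string import Template
from typing import Mapping


def interpolate(query: str, variables: Mapping[str, str]) -> str:
    """Replace $name placeholders; unknown placeholders are left untouched."""
    return Template(query).safe_substitute(variables)


selected = {"namespace": "pos-enterprise"}

# $service depends on $namespace: its query is resolved with the current value.
service_query = interpolate(
    'label_values(up{namespace="$namespace"}, service)', selected
)
print(service_query)
# label_values(up{namespace="pos-enterprise"}, service)

# $tenant has no selected value yet, so it stays in place.
print(interpolate('pos_active_sessions{tenant="$tenant"}', selected))
# pos_active_sessions{tenant="$tenant"}
```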

Provisioning Dashboards via ConfigMap

Custom dashboards are versioned in the repository as ConfigMaps:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-moleculer
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # Label required for auto-discovery
data:
  moleculer-overview.json: |
    {
      "dashboard": { ... },
      "overwrite": true
    }

# Apply new dashboards
kubectl apply -f k8s/monitoring/dashboards/

# Verify that Grafana has loaded the dashboard
kubectl logs -n monitoring deployment/prometheus-grafana -c grafana --tail=20 | grep "dashboard"
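
When dashboards are exported from Grafana as JSON files, wrapping each one in a ConfigMap by hand is error-prone. The Python sketch below generates the manifest shown above as JSON, which kubectl also accepts. The `build_dashboard_configmap` helper and the placeholder payload are hypothetical; the real dashboard content must come from your own export.

```python
import json
from typing import Any, Dict


def build_dashboard_configmap(
    name: str,
    filename: str,
    payload: Dict[str, Any],
    namespace: str = "monitoring",
) -> Dict[str, Any]:
    """Wrap a dashboard JSON payload in a ConfigMap with the discovery label."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Same label as in the manifest above, required for auto-discovery.
            "labels": {"grafana_dashboard": "1"},
        },
        # ConfigMap values must be strings, so the payload is serialized.
        "data": {filename: json.dumps(payload, indent=2)},
    }


manifest = build_dashboard_configmap(
    name="grafana-dashboard-moleculer",
    filename="moleculer-overview.json",
    payload={"title": "placeholder dashboard"},  # placeholder, not real content
)
print(json.dumps(manifest, indent=2))
```

The printed output can be saved under `k8s/monitoring/dashboards/` and applied with the same `kubectl apply` command.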

Best Practices

- Do not modify the predefined dashboards: create copies with the [Custom] prefix
- Use template variables for all filters (namespace, service, tenant)
- Set refresh to 30s for operational dashboards and 5m for capacity dashboards
- Annotations: add annotations for deploys and incidents (kubectl annotate)
- In-dashboard alerting: prefer PrometheusRule over Grafana alerts for consistency

Cross-References

Monitoring Stack: Overview - General architecture of the stack
Alerting System - Configuring alerts based on metrics
Log Analysis with Loki - Log analysis dashboards
Performance Optimization - Metrics for tuning
