Compare commits

...

13 Commits

Author SHA1 Message Date
egutierrez a884ed0dbc chore: remove Gitea/Coolify workflow
Removes the workflow for automatic deployment to Coolify.

The repository does not currently need automatic deployment, so the Gitea workflow configuration is removed to simplify the project structure.

File removed:
- .gitea/workflows/deploy.yml
2026-03-23 20:07:12 +01:00
egutierrez 4df3a11e0c merge: quick/simplify-stack-alloy-prometheus-loki-grafana (simplify the stack to Alloy + Prometheus + Loki + Grafana)
Deploy to Coolify / deploy (push) Has been cancelled
2026-03-23 20:01:31 +01:00
egutierrez 8509a1a4e8 refactor: simplify Alloy configuration for metrics and logs
Cleans up the Alloy configuration by removing unused components:

Metrics (Prometheus):
- Keeps scraping of: Alloy, Prometheus, Grafana, Loki, app_metrics
- Commented out: Tempo, Node Exporter, cAdvisor
- HTTP receiver for external metrics (port 9999)
- Remote write to Prometheus

Logs (Loki):
- Discovery and collection of Docker container logs
- HTTP receiver for external logs (port 3101)
- Syslog receiver (port 1514)
- Forwarding to Loki

Traces:
- Entire Tempo section commented out (OTLP, Jaeger, Zipkin)

The resulting configuration focuses exclusively on metrics and logs, reducing the attack surface and simplifying troubleshooting.
2026-03-23 20:01:06 +01:00
egutierrez 65673636c4 refactor: simplify docker-compose to the essential stack
Simplifies the observability stack, keeping only the essential services:

Active services:
- Alloy (minimal ports: 12345, 9999, 3101)
- Prometheus (metrics)
- Loki (logs)
- Grafana (visualization)

Commented-out services:
- Tempo (distributed traces)
- Promtail (replaced by Alloy)
- Node Exporter (host metrics)
- cAdvisor (container metrics)

Changes:
- Fewer Alloy ports (trace ports removed: OTLP, Jaeger, Zipkin)
- Updated service dependencies (depends_on) of Alloy and Grafana
- Cleanup of unused volumes (tempo_data, promtail_positions)
- Grafana now depends only on Prometheus and Loki

The simplified stack keeps full metrics and logs capability with less operational complexity.
2026-03-23 20:00:57 +01:00
egutierrez 87cddfa23e trying to capture more metrics
Deploy to Coolify / deploy (push) Has been cancelled
2025-11-24 23:27:32 +01:00
egutierrez 6a49bd573c alloy scrapes and stores every second
Deploy to Coolify / deploy (push) Has been cancelled
2025-11-24 23:17:39 +01:00
egutierrez 5b6a0ddbc2 ability to store metrics in prometheus
Deploy to Coolify / deploy (push) Has been cancelled
2025-11-24 22:53:07 +01:00
egutierrez 399613e009 env example added
Deploy to Coolify / deploy (push) Waiting to run
2025-11-16 17:07:22 +01:00
egutierrez e13560202b more docker and node exporter metrics
Deploy to Coolify / deploy (push) Has been cancelled
2025-11-16 17:00:26 +01:00
egutierrez bd781cf7d3 promtail for docker added
Deploy to Coolify / deploy (push) Has been cancelled
2025-11-16 16:29:23 +01:00
egutierrez b7397cdefd documents added
Deploy to Coolify / deploy (push) Has been cancelled
2025-11-16 16:00:00 +01:00
egutierrez ff7d41c9ce Merge remote-tracking branch 'origin/master'
Deploy to Coolify / deploy (push) Has been cancelled
merging divergent branches
2025-11-14 11:57:48 +01:00
egutierrez d99e544ebe restart always
2025-11-14 11:22:04 +01:00
15 changed files with 1118 additions and 132 deletions
+11
View File
@@ -11,6 +11,17 @@ LOKI_PORT=3100
 TEMPO_PORT=3200
 ALLOY_PORT=12345
 ALLOY_UI_PORT=12345
+ALLOY_METRICS_PORT=9999
+ALLOY_LOGS_HTTP_PORT=3101
+ALLOY_SYSLOG_PORT=1514
+ALLOY_OTLP_GRPC_PORT=4317
+ALLOY_OTLP_HTTP_PORT=4318
+ALLOY_JAEGER_GRPC_PORT=14250
+ALLOY_JAEGER_HTTP_PORT=14268
+ALLOY_JAEGER_COMPACT_PORT=6831
+ALLOY_ZIPKIN_PORT=9411
+NODE_EXPORTER_PORT=9100
+CADVISOR_PORT=8081
 
 # Volume configuration
 # Paths where persistent data will be stored
-35
View File
@@ -1,35 +0,0 @@
name: Deploy to Coolify

on:
  push:
    branches:
      - master
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Deploy to Coolify
        run: |
          curl -X POST \
            -H "Authorization: Bearer ${{ secrets.COOLIFY_TOKEN }}" \
            -H "Content-Type: application/json" \
            -d '{
              "deploymentUuid": "${{ secrets.COOLIFY_DEPLOYMENT_UUID }}",
              "gitCommitSha": "${{ github.sha }}",
              "gitBranch": "${{ github.ref_name }}"
            }' \
            ${{ secrets.COOLIFY_WEBHOOK_URL }}

      - name: Notify deployment
        if: always()
        run: |
          if [ "${{ job.status }}" == "success" ]; then
            echo "✅ Deployment successful"
          else
            echo "❌ Deployment failed"
          fi
+233
View File
@@ -0,0 +1,233 @@
from typing import Dict, Iterable, Optional, Tuple, Union

from prometheus_client import (
    CollectorRegistry,
    Counter,
    Gauge,
    Histogram,
    Summary,
    start_http_server,
)


class PrometheusMetric:
    """
    Lightweight helper for exposing Prometheus metrics with a common prefix and labels.
    Starts an HTTP endpoint so that Prometheus can scrape the metrics.
    """

    def __init__(
        self,
        prefix: str = "suite_logs",
        default_labels: Optional[Dict[str, str]] = None,
        port: int = 8000,
        start_server: bool = True,
        registry: Optional[CollectorRegistry] = None,
    ):
        """
        :param prefix: prefix added to every metric (e.g. suite_logs_)
        :param default_labels: labels added to every series (e.g. {"service": "api"})
        :param port: HTTP port where the metrics are exposed
        :param start_server: start the HTTP server automatically
        :param registry: allows injecting a CollectorRegistry (useful for tests)
        """
        self.prefix = prefix.rstrip("_")
        self.default_labels = dict(default_labels or {})
        self.registry = registry or CollectorRegistry()
        self.port = port
        # Cache keyed by (metric class name, full metric name, label names).
        self._metric_cache: Dict[Tuple[str, str, Tuple[str, ...]], object] = {}

        # Pass start_server=False on extra instances so that only one server binds the port.
        if start_server:
            start_http_server(self.port, registry=self.registry)

    def counter(
        self,
        name: str,
        doc: str,
        labels: Optional[Iterable[str]] = None,
        prefix: Optional[str] = None,
    ):
        full_name = self._metric_name(name, prefix)
        metric = Counter(
            full_name,
            doc,
            labelnames=self._label_names(labels),
            registry=self.registry,
        )
        return _CounterHandle(metric, self.default_labels)

    def gauge(
        self,
        name: str,
        doc: str,
        labels: Optional[Iterable[str]] = None,
        prefix: Optional[str] = None,
    ):
        full_name = self._metric_name(name, prefix)
        metric = Gauge(
            full_name,
            doc,
            labelnames=self._label_names(labels),
            registry=self.registry,
        )
        return _GaugeHandle(metric, self.default_labels)

    def histogram(
        self,
        name: str,
        doc: str,
        labels: Optional[Iterable[str]] = None,
        buckets: Optional[Iterable[float]] = None,
        prefix: Optional[str] = None,
    ):
        full_name = self._metric_name(name, prefix)
        # Forward buckets only when provided so the library default applies otherwise.
        kwargs: Dict[str, object] = {}
        if buckets is not None:
            kwargs["buckets"] = buckets
        metric = Histogram(
            full_name,
            doc,
            labelnames=self._label_names(labels),
            registry=self.registry,
            **kwargs,
        )
        return _ObserveHandle(metric, self.default_labels)

    def summary(
        self,
        name: str,
        doc: str,
        labels: Optional[Iterable[str]] = None,
        prefix: Optional[str] = None,
    ):
        full_name = self._metric_name(name, prefix)
        metric = Summary(
            full_name,
            doc,
            labelnames=self._label_names(labels),
            registry=self.registry,
        )
        return _ObserveHandle(metric, self.default_labels)

    # Shortcut methods to emit metrics without keeping the handle
    def counter_value(
        self,
        name: str,
        amount: float = 1.0,
        labels: Optional[Dict[str, str]] = None,
        doc: str = "",
        prefix: Optional[str] = None,
    ):
        metric = self._get_or_create(Counter, name, doc, labels, prefix)
        _CounterHandle(metric, self.default_labels).inc(amount, **(labels or {}))

    def gauge_value(
        self,
        name: str,
        value: float,
        labels: Optional[Dict[str, str]] = None,
        doc: str = "",
        prefix: Optional[str] = None,
    ):
        metric = self._get_or_create(Gauge, name, doc, labels, prefix)
        _GaugeHandle(metric, self.default_labels).set(value, **(labels or {}))

    def histogram_observe(
        self,
        name: str,
        value: float,
        labels: Optional[Dict[str, str]] = None,
        doc: str = "",
        buckets: Optional[Iterable[float]] = None,
        prefix: Optional[str] = None,
    ):
        metric = self._get_or_create(
            Histogram, name, doc, labels, prefix, buckets=buckets
        )
        _ObserveHandle(metric, self.default_labels).observe(value, **(labels or {}))

    def summary_observe(
        self,
        name: str,
        value: float,
        labels: Optional[Dict[str, str]] = None,
        doc: str = "",
        prefix: Optional[str] = None,
    ):
        metric = self._get_or_create(Summary, name, doc, labels, prefix)
        _ObserveHandle(metric, self.default_labels).observe(value, **(labels or {}))

    def _metric_name(self, name: str, prefix: Optional[str]) -> str:
        base = (prefix or self.prefix).rstrip("_")
        return f"{base}_{name}" if base else name

    def _label_names(
        self, labels: Optional[Union[Iterable[str], Dict[str, str]]]
    ) -> Iterable[str]:
        # Default labels come first; extra labels are appended without duplicates.
        names = list(self.default_labels.keys())
        if labels:
            source = labels.keys() if isinstance(labels, dict) else labels
            for label in source:
                if label not in names:
                    names.append(label)
        return names

    def _get_or_create(
        self,
        metric_cls,
        name: str,
        doc: str,
        labels: Optional[Dict[str, str]],
        prefix: Optional[str],
        buckets: Optional[Iterable[float]] = None,
    ):
        full_name = self._metric_name(name, prefix)
        labelnames = self._label_names(labels)
        cache_key = (metric_cls.__name__, full_name, tuple(labelnames))
        if cache_key in self._metric_cache:
            return self._metric_cache[cache_key]

        kwargs: Dict[str, object] = {
            "labelnames": labelnames,
            "registry": self.registry,
        }
        if metric_cls is Histogram and buckets is not None:
            kwargs["buckets"] = buckets

        metric = metric_cls(full_name, doc or full_name, **kwargs)
        self._metric_cache[cache_key] = metric
        return metric


class _BaseHandle:
    def __init__(self, metric, default_labels: Dict[str, str]):
        self.metric = metric
        self.default_labels = default_labels

    def _child(self, labels: Optional[Dict[str, str]]):
        # Metrics declared without labels are used directly.
        if not self.metric._labelnames:
            return self.metric
        # Per-call labels override the defaults; every declared label needs a value.
        merged = {**self.default_labels, **(labels or {})}
        missing = [name for name in self.metric._labelnames if name not in merged]
        if missing:
            raise ValueError(f"Faltan labels obligatorios: {missing}")
        return self.metric.labels(**merged)


class _CounterHandle(_BaseHandle):
    def inc(self, amount: float = 1.0, **labels):
        self._child(labels).inc(amount)


class _GaugeHandle(_BaseHandle):
    def set(self, value: float, **labels):
        self._child(labels).set(value)

    def inc(self, amount: float = 1.0, **labels):
        self._child(labels).inc(amount)

    def dec(self, amount: float = 1.0, **labels):
        self._child(labels).dec(amount)


class _ObserveHandle(_BaseHandle):
    def observe(self, value: float, **labels):
        self._child(labels).observe(value)
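The naming and label-merging rules used by the helper can be tried in isolation, without `prometheus_client`. The following is a minimal stdlib-only sketch; `metric_name` and `merge_labels` are hypothetical names that mirror what `_metric_name` and `_BaseHandle._child` appear to do, and they are not part of the module.

```python
from typing import Dict, Iterable, Optional


def metric_name(name: str, prefix: str) -> str:
    """Join prefix and name with one underscore; an empty prefix leaves the name as-is."""
    base = prefix.rstrip("_")
    return f"{base}_{name}" if base else name


def merge_labels(
    labelnames: Iterable[str],
    default_labels: Dict[str, str],
    labels: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Per-call labels override defaults; every declared label name must get a value."""
    merged = {**default_labels, **(labels or {})}
    missing = [name for name in labelnames if name not in merged]
    if missing:
        raise ValueError(f"missing required labels: {missing}")
    return merged


if __name__ == "__main__":
    print(metric_name("requests_total", "suite_logs_"))
    print(merge_labels(["service", "endpoint"], {"service": "api"}, {"endpoint": "/health"}))
```

A trailing underscore in the prefix is stripped before joining, so `suite_logs` and `suite_logs_` should produce the same series name.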
+4
View File
@@ -0,0 +1,4 @@
from .LokiLogger import LokiLogger
from .PrometheusMetric import PrometheusMetric
__all__ = ["LokiLogger", "PrometheusMetric"]
+44
View File
@@ -8,6 +8,50 @@ This configuration includes a complete monitoring stack with:
## File Structure

## Quick usage in Python (logs and metrics)

- Logs to Loki / Alloy:

```python
from Logger import LokiLogger

logger = LokiLogger(service_name="mi_servicio", min_level="INFO")
logger.info("Aplicación iniciada")
logger.error("Algo falló", add_fields={"detalle": "stacktrace"})
```

- Metrics with a prefix, ready to be scraped by Prometheus:

```python
from Logger import PrometheusMetric

metrics = PrometheusMetric(
    prefix="suite_logs",
    default_labels={"service_name": "mi_servicio", "env": "dev"},
    port=9102,  # starts an HTTP server on this port
)

requests_total = metrics.counter(
    "requests_total", "Solicitudes procesadas", labels=["endpoint"]
)
latency_seconds = metrics.histogram(
    "latency_seconds",
    "Latencia de peticiones",
    labels=["endpoint"],
    buckets=[0.1, 0.5, 1, 2, 5],
)

requests_total.inc(endpoint="/health")
latency_seconds.observe(0.35, endpoint="/health")
```

Add the port (`9102` in the example) as a scrape target in Prometheus/Alloy to see the series with the configured prefix.

Shortcut: to emit without keeping handles, use `counter_value`, `gauge_value`, `histogram_observe` and `summary_observe`; each accepts a per-metric prefix override:

```python
metrics.gauge_value("workers_active", 3, prefix="backend", labels={"queue": "ingest"})
metrics.counter_value("processed_total", 1, prefix="backend", labels={"queue": "ingest"})
```

In this repo, Alloy is already configured to scrape `host.docker.internal:9102` with `job_name="app_metrics"`. If your script exposes metrics on that port (with `PrometheusMetric(port=9102)`), they are stored with no further changes.
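One way to confirm which series a script exposes before relying on the Alloy scrape is to read the `/metrics` text and filter it by prefix. The sketch below is stdlib-only and works on a hand-written sample; `series_with_prefix` and `SAMPLE` are illustrative names, and the parser assumes sample lines carry no trailing timestamp.

```python
from typing import Dict

# Hand-written sample in the Prometheus text exposition format (illustrative values).
SAMPLE = """\
# HELP suite_logs_requests_total Solicitudes procesadas
# TYPE suite_logs_requests_total counter
suite_logs_requests_total{endpoint="/health",env="dev",service_name="mi_servicio"} 3.0
backend_workers_active{env="dev",queue="ingest",service_name="mi_servicio"} 3.0
"""


def series_with_prefix(text: str, prefix: str) -> Dict[str, float]:
    """Return {series: value} for sample lines whose metric name starts with prefix."""
    found: Dict[str, float] = {}
    for line in text.splitlines():
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            found[series] = float(value)
    return found


if __name__ == "__main__":
    for series, value in series_with_prefix(SAMPLE, "suite_logs_").items():
        print(series, value)
```

Against a running script, the text could be fetched with `urllib.request.urlopen("http://localhost:9102/metrics").read().decode()` and passed to the same function.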
## Initial Setup

### 1. Configure environment variables
+88 -59
View File
@@ -42,13 +42,42 @@ prometheus.scrape "loki" {
   job_name = "loki"
 }
 
-// Tempo scraping
-prometheus.scrape "tempo" {
-  targets = [{"__address__" = "tempo:3200"}]
+// // Tempo scraping
+// prometheus.scrape "tempo" {
+//   targets = [{"__address__" = "tempo:3200"}]
+//   forward_to = [prometheus.remote_write.prometheus.receiver]
+//   scrape_interval = "15s"
+//   metrics_path = "/metrics"
+//   job_name = "tempo"
+// }
+
+// // Node Exporter scraping (Docker host metrics)
+// prometheus.scrape "node_exporter" {
+//   targets = [{"__address__" = "node-exporter:9100"}]
+//   forward_to = [prometheus.remote_write.prometheus.receiver]
+//   scrape_interval = "15s"
+//   metrics_path = "/metrics"
+//   job_name = "node_exporter"
+// }
+
+// // cAdvisor scraping (detailed Docker/container metrics)
+// prometheus.scrape "cadvisor" {
+//   targets = [{"__address__" = "cadvisor:8080"}]
+//   forward_to = [prometheus.remote_write.prometheus.receiver]
+//   scrape_interval = "15s"
+//   metrics_path = "/metrics"
+//   job_name = "cadvisor"
+// }
+
+// Fixed scrape for metrics exposed from the host (Python scripts)
+// Defaults to host.docker.internal:9102 for PrometheusMetric
+prometheus.scrape "app_metrics" {
+  targets = [{"__address__" = "host.docker.internal:9102"}]
   forward_to = [prometheus.remote_write.prometheus.receiver]
-  scrape_interval = "15s"
+  scrape_interval = "1s"
+  scrape_timeout = "800ms"
   metrics_path = "/metrics"
-  job_name = "tempo"
+  job_name = "app_metrics"
 }
 
 // Receiver for external metrics
@@ -139,66 +168,66 @@ loki.write "loki" {
 }
 
 //=============================================================================
-// TEMPO - TRACES
+// TEMPO - TRACES (COMMENTED OUT - NOT USED)
 //=============================================================================
 
-// OTLP receiver
-otelcol.receiver.otlp "tempo" {
-  grpc {
-    endpoint = "0.0.0.0:4317"
-  }
-  http {
-    endpoint = "0.0.0.0:4318"
-  }
-  output {
-    traces = [otelcol.processor.batch.tempo.input]
-  }
-}
+// // OTLP receiver
+// otelcol.receiver.otlp "tempo" {
+//   grpc {
+//     endpoint = "0.0.0.0:4317"
+//   }
+//   http {
+//     endpoint = "0.0.0.0:4318"
+//   }
+//   output {
+//     traces = [otelcol.processor.batch.tempo.input]
+//   }
+// }
 
-// Jaeger receiver
-otelcol.receiver.jaeger "jaeger_traces" {
-  protocols {
-    grpc {
-      endpoint = "0.0.0.0:14250"
-    }
-    thrift_http {
-      endpoint = "0.0.0.0:14268"
-    }
-    thrift_compact {
-      endpoint = "0.0.0.0:6831"
-    }
-  }
-  output {
-    traces = [otelcol.processor.batch.tempo.input]
-  }
-}
+// // Jaeger receiver
+// otelcol.receiver.jaeger "jaeger_traces" {
+//   protocols {
+//     grpc {
+//       endpoint = "0.0.0.0:14250"
+//     }
+//     thrift_http {
+//       endpoint = "0.0.0.0:14268"
+//     }
+//     thrift_compact {
+//       endpoint = "0.0.0.0:6831"
+//     }
+//   }
+//   output {
+//     traces = [otelcol.processor.batch.tempo.input]
+//   }
+// }
 
-// Zipkin receiver
-otelcol.receiver.zipkin "zipkin_traces" {
-  endpoint = "0.0.0.0:9411"
-  output {
-    traces = [otelcol.processor.batch.tempo.input]
-  }
-}
+// // Zipkin receiver
+// otelcol.receiver.zipkin "zipkin_traces" {
+//   endpoint = "0.0.0.0:9411"
+//   output {
+//     traces = [otelcol.processor.batch.tempo.input]
+//   }
+// }
 
-// Batch processor
-otelcol.processor.batch "tempo" {
-  send_batch_size = 1024
-  timeout = "1s"
-  output {
-    traces = [otelcol.exporter.otlp.tempo.input]
-  }
-}
+// // Batch processor
+// otelcol.processor.batch "tempo" {
+//   send_batch_size = 1024
+//   timeout = "1s"
+//   output {
+//     traces = [otelcol.exporter.otlp.tempo.input]
+//   }
+// }
 
-// Exporter to Tempo
-otelcol.exporter.otlp "tempo" {
-  client {
-    endpoint = "http://tempo:4317"
-    tls {
-      insecure = true
-    }
-  }
-}
+// // Exporter to Tempo
+// otelcol.exporter.otlp "tempo" {
+//   client {
+//     endpoint = "http://tempo:4317"
+//     tls {
+//       insecure = true
+//     }
+//   }
+// }
 
 //=============================================================================
 // GENERAL CONFIGURATION
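The `app_metrics` job pairs a 1s `scrape_interval` with an 800ms `scrape_timeout`. Prometheus-style scrapers generally expect the timeout not to exceed the interval, which is presumably why the timeout was lowered together with the interval. The following stdlib-only sketch checks such a pair; `parse_duration` and `timeout_fits` are hypothetical helpers that handle only single-unit durations and are not part of Alloy.

```python
import re

# Seconds per supported unit (single-unit durations only, by assumption).
_UNITS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def parse_duration(text: str) -> float:
    """Convert a duration such as "800ms" or "15s" to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def timeout_fits(interval: str, timeout: str) -> bool:
    """True when the scrape timeout does not exceed the scrape interval."""
    return parse_duration(timeout) <= parse_duration(interval)


if __name__ == "__main__":
    print(timeout_fits("1s", "800ms"))  # the values used by the app_metrics job
    print(timeout_fits("1s", "10s"))    # a timeout longer than the interval
```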
View File
@@ -0,0 +1,585 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "opacity",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 4,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": true
},
"mappings": [],
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "100 - (avg by (instance)(irate(node_cpu_seconds_total{job=\"node_exporter\", mode=\"idle\"}[$__rate_interval])) * 100)",
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Docker Host CPU Usage",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "opacity",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 4,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": true
},
"mappings": [],
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 8,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "(1 - (node_memory_MemAvailable_bytes{job=\"node_exporter\"} / node_memory_MemTotal_bytes{job=\"node_exporter\"})) * 100",
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Docker Host Memory Usage",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "opacity",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 4,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": true
},
"mappings": [],
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 16,
"y": 0
},
"id": 3,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "100 * (1 - (node_filesystem_avail_bytes{job=\"node_exporter\", mountpoint=\"/\", fstype!~\"tmpfs|proc\"} / node_filesystem_size_bytes{job=\"node_exporter\", mountpoint=\"/\", fstype!~\"tmpfs|proc\"}))",
"legendFormat": "rootfs - {{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Root Filesystem Usage",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "opacity",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "smooth",
"lineWidth": 2,
"pointSize": 4,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": true
},
"mappings": [],
"unit": "Bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 8
},
"id": 4,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "desc"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "sum by (device) (irate(node_network_receive_bytes_total{job=\"node_exporter\", device!~\"lo|docker.*\"}[$__rate_interval]))",
"legendFormat": "RX {{device}}",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "sum by (device) (irate(node_network_transmit_bytes_total{job=\"node_exporter\", device!~\"lo|docker.*\"}[$__rate_interval]))",
"legendFormat": "TX {{device}}",
"range": true,
"refId": "B"
}
],
"title": "Network Traffic",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "red",
"value": null
},
{
"color": "green",
"value": 0.5
}
]
},
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 8
},
"id": 5,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "prometheus"
},
"editorMode": "code",
"expr": "up{job=~\"prometheus|loki|tempo|grafana|alloy|promtail|node_exporter\"}",
"legendFormat": "{{job}}",
"refId": "A"
}
],
"title": "Stack Services Status",
"type": "stat"
},
{
"datasource": {
"type": "loki",
"uid": "loki"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"lineWidth": 1
},
"mappings": [],
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 16
},
"id": 6,
"options": {
"displayMode": "gradient",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showUnfilled": true,
"text": {}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "loki"
},
"expr": "sum by (container) (rate({compose_service=~\"$compose_service\", detected_level=~\"ERROR|FATAL|CRITICAL\"}[$__rate_interval]))",
"legendFormat": "{{container}}",
"queryType": "range",
"refId": "A"
}
],
"title": "Error Logs per Container",
"transformations": [],
"type": "bargauge"
},
{
"datasource": {
"type": "loki",
"uid": "loki"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"mappings": [],
"unit": "short"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 16
},
"id": 7,
"options": {
"displayMode": "basic",
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"text": {}
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "loki"
},
"expr": "sum by (container) (rate({compose_service=~\"$compose_service\"}[$__rate_interval]))",
"legendFormat": "{{container}}",
"queryType": "range",
"refId": "A"
}
],
"title": "Total Log Volume per Container",
"transformations": [],
"type": "bargauge"
},
{
"datasource": {
"type": "loki",
"uid": "loki"
},
"fieldConfig": {
"defaults": {
"custom": {}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 25
},
"id": 8,
"options": {
"dedupStrategy": "none",
"enableLogDetails": true,
"prettifyLogMessage": true,
"showCommonLabels": true,
"showLabels": true,
"sortOrder": "Descending",
"wrapLogMessage": true
},
"targets": [
{
"datasource": {
"type": "loki",
"uid": "loki"
},
"expr": "{compose_service=~\"$compose_service\", detected_level=~\"ERROR|FATAL|CRITICAL\"}",
"queryType": "range",
"refId": "A"
}
],
"title": "Latest Error Logs",
"type": "logs"
}
],
"refresh": "30s",
"schemaVersion": 38,
"style": "dark",
"tags": [
"docker",
"logs",
"host"
],
"templating": {
"list": [
{
"allValue": ".*",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": {
"type": "loki",
"uid": "loki"
},
"definition": "label_values({compose_service!=\"\"}, compose_service)",
"hide": 0,
"includeAll": true,
"label": "Compose Service",
"multi": false,
"name": "compose_service",
"options": [],
"query": "label_values({compose_service!=\"\"}, compose_service)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Docker & Container Logs",
"uid": "docker-monitoring",
"version": 1,
"weekStart": ""
}
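Several panels in this dashboard filter on `job="node_exporter"`, while the simplified stack comments that scrape job out, so those panels would likely show no data. A minimal stdlib-only sketch for spotting such panels follows. `ACTIVE_JOBS` is an assumption based on the commit message (only `loki` and `app_metrics` appear as `job_name` values in the diff), the two-panel `DASHBOARD` dict is a shortened stand-in for the real file, and the regex only catches equality matchers, not `job=~"..."`.

```python
import re
from typing import Dict, List, Set

# Assumed job names of the scrape jobs that remain active after the simplification.
ACTIVE_JOBS: Set[str] = {"alloy", "prometheus", "grafana", "loki", "app_metrics"}

# Shortened stand-in for the dashboard JSON (same keys, two panels only).
DASHBOARD = {
    "panels": [
        {
            "title": "Docker Host CPU Usage",
            "targets": [
                {"expr": 'irate(node_cpu_seconds_total{job="node_exporter", mode="idle"}[5m])'}
            ],
        },
        {
            "title": "Total Log Volume per Container",
            "targets": [{"expr": 'sum by (container) (rate({compose_service=~".+"}[5m]))'}],
        },
    ]
}

# Matches job="name" only; regex matchers such as job=~"a|b" are ignored.
JOB_MATCHER = re.compile(r'job="([^"]+)"')


def panels_with_inactive_jobs(dashboard: dict, active_jobs: Set[str]) -> Dict[str, List[str]]:
    """Map panel title to the job names it queries that are not in active_jobs."""
    report: Dict[str, List[str]] = {}
    for panel in dashboard.get("panels", []):
        jobs: Set[str] = set()
        for target in panel.get("targets", []):
            jobs.update(JOB_MATCHER.findall(target.get("expr", "")))
        stale = sorted(jobs - active_jobs)
        if stale:
            report[panel["title"]] = stale
    return report


if __name__ == "__main__":
    print(panels_with_inactive_jobs(DASHBOARD, ACTIVE_JOBS))
```

For the real file, the dict could be loaded with `json.load` from wherever the dashboard JSON is stored.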
+45
View File
@@ -0,0 +1,45 @@
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://alloy:3101/loki/api/v1/push
    batchwait: 1s
    batchsize: 1048576

scrape_configs:
  - job_name: docker-containers
    pipeline_stages:
      - docker: {}
      - json:
          expressions:
            timestamp: timestamp
            message: message
            detected_level: detected_level
            detected_value: detected_value
            service_name: service_name
      - labels:
          detected_value:
          detected_level:
          service_name:
      - output:
          source: message
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ["__meta_docker_container_name"]
        target_label: "container"
      - source_labels: ["__meta_docker_container_image"]
        target_label: "image"
      - source_labels: ["__meta_docker_container_label_com_docker_compose_service"]
        target_label: "compose_service"
      - source_labels: ["__meta_docker_container_label_com_docker_compose_service"]
        target_label: "job"
      - source_labels: ["__meta_docker_container_label_com_docker_compose_project"]
        target_label: "compose_project"
      - source_labels: ["__meta_docker_container_id"]
        target_label: "container_id"
+75 -36
View File
@@ -17,7 +17,7 @@ services:
     user: "65534:65534"
     networks:
       - monitoring
-    restart: unless-stopped
+    restart: always
 
   # Loki for logs
   loki:
@@ -31,43 +31,31 @@ services:
     command: -config.file=/etc/loki/local-config.yaml
     networks:
       - monitoring
-    restart: unless-stopped
+    restart: always
 
-  # Tempo for traces - storage only (Alloy handles collection)
-  tempo:
-    image: grafana/tempo:latest
-    container_name: tempo
-    ports:
-      - "${TEMPO_PORT:-3200}:3200" # Query port only, no ingestion endpoints
-    volumes:
-      - ./config/tempo/tempo.yaml:/etc/tempo/tempo.yaml
-      - tempo_data:/var/tempo
-    command:
-      - -config.file=/etc/tempo/tempo.yaml
-    networks:
-      - monitoring
-    restart: unless-stopped
+  # # Tempo for traces - storage only (Alloy handles collection)
+  # tempo:
+  #   image: grafana/tempo:latest
+  #   container_name: tempo
+  #   ports:
+  #     - "${TEMPO_PORT:-3200}:3200" # Query port only, no ingestion endpoints
+  #   volumes:
+  #     - ./config/tempo/tempo.yaml:/etc/tempo/tempo.yaml
+  #     - tempo_data:/var/tempo
+  #   command:
+  #     - -config.file=/etc/tempo/tempo.yaml
+  #   networks:
+  #     - monitoring
+  #   restart: always
 
-  # Grafana Alloy for data collection - SINGLE ENTRY POINT
+  # Grafana Alloy for metrics and logs collection
   alloy:
     image: grafana/alloy:latest
     container_name: alloy
     ports:
       - "${ALLOY_UI_PORT:-12345}:12345" # Alloy UI port
-      # Metrics ports
-      - "9999:9999" # HTTP receiver for external metrics
-      # Log ports
-      - "3101:3101" # HTTP receiver for external logs
-      - "1514:1514" # Syslog TCP receiver
-      # Trace ports - OpenTelemetry
-      - "4317:4317" # OTLP gRPC (already exposed in tempo, redirected here)
-      - "4318:4318" # OTLP HTTP (already exposed in tempo, redirected here)
-      # Trace ports - Jaeger compatibility
-      - "14250:14250" # Jaeger gRPC
-      - "14268:14268" # Jaeger HTTP
-      - "6831:6831/udp" # Jaeger compact thrift
-      # Trace ports - Zipkin compatibility
-      - "9411:9411" # Zipkin HTTP
+      - "${ALLOY_METRICS_PORT:-9999}:9999" # HTTP receiver for external metrics
+      - "${ALLOY_LOGS_HTTP_PORT:-3101}:3101" # HTTP receiver for external logs
     volumes:
       - ./config/alloy/alloy.river:/etc/alloy/config.river
       - /var/run/docker.sock:/var/run/docker.sock:ro
@@ -83,11 +71,12 @@ services:
       - ALLOY_HOSTNAME=${ALLOY_HOSTNAME:-alloy}
     networks:
       - monitoring
-    restart: unless-stopped
+    restart: always
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
     depends_on:
       - prometheus
       - loki
-      - tempo
 
   # Grafana for visualization
   grafana:
@@ -104,15 +93,64 @@ services:
       - GF_PATHS_PROVISIONING=/etc/grafana/provisioning
     volumes:
       - ./config/grafana/provisioning:/etc/grafana/provisioning
+      - ./config/grafana/dashboards:/var/lib/grafana/dashboards
       - grafana_data:/var/lib/grafana
     user: "472:472"
     networks:
       - monitoring
-    restart: unless-stopped
+    restart: always
     depends_on:
       - prometheus
       - loki
-      - tempo
+
+  # # Node Exporter for host metrics
+  # node-exporter:
+  #   image: prom/node-exporter:latest
+  #   container_name: node-exporter
+  #   ports:
+  #     - "${NODE_EXPORTER_PORT:-9100}:9100"
+  #   command:
+  #     - '--path.rootfs=/host'
+  #   volumes:
+  #     - /proc:/host/proc:ro
+  #     - /sys:/host/sys:ro
+  #     - /:/host:ro
+  #   networks:
+  #     - monitoring
+  #   restart: always
+
+  # # cAdvisor for detailed Docker metrics
+  # cadvisor:
+  #   image: gcr.io/cadvisor/cadvisor:latest
+  #   container_name: cadvisor
+  #   privileged: true
+  #   ports:
+  #     - "${CADVISOR_PORT:-8081}:8080"
+  #   volumes:
+  #     - /:/rootfs:ro
+  #     - /var/run/docker.sock:/var/run/docker.sock:ro
+  #     - /sys:/sys:ro
+  #     - /var/lib/docker/:/var/lib/docker:ro
+  #     - /dev/disk/:/dev/disk:ro
+  #   networks:
+  #     - monitoring
+  #   restart: always
+
+  # # Promtail to ship logs from all containers to Alloy
+  # promtail:
+  #   image: grafana/promtail:latest
+  #   container_name: promtail
+  #   command: -config.file=/etc/promtail/promtail-config.yml
+  #   volumes:
+  #     - ./config/promtail/promtail-config.yml:/etc/promtail/promtail-config.yml:ro
+  #     - promtail_positions:/var/lib/promtail
+  #     - /var/run/docker.sock:/var/run/docker.sock:ro
+  #     - /var/lib/docker/containers:/var/lib/docker/containers:ro
+  #   networks:
+  #     - monitoring
+  #   restart: always
+  #   depends_on:
+  #     - alloy
 
 networks:
   monitoring:
@@ -122,6 +160,7 @@ networks:
 volumes:
   prometheus_data:
   loki_data:
-  tempo_data:
   alloy_data:
   grafana_data:
+  # tempo_data:
+  # promtail_positions:
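The port mappings now use the `${VAR:-default}` form, so a value from the environment or `.env` wins and the literal after `:-` applies when the variable is unset or empty. A rough stdlib-only sketch of that substitution rule follows; `interpolate` is a hypothetical helper that covers only the `${VAR}` and `${VAR:-default}` forms, not the full Compose interpolation syntax.

```python
import re
from typing import Mapping

# Matches ${NAME} and ${NAME:-default}; other Compose forms are not handled here.
_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Expand variables; the default applies when the variable is unset or empty."""

    def replace(match):
        name, default = match.group(1), match.group(2)
        current = env.get(name, "")
        if current:
            return current
        return default if default is not None else ""

    return _PATTERN.sub(replace, value)


if __name__ == "__main__":
    mapping = "${ALLOY_METRICS_PORT:-9999}:9999"
    print(interpolate(mapping, {}))
    print(interpolate(mapping, {"ALLOY_METRICS_PORT": "19999"}))
```

Only the host side of the mapping changes with the variable; the container port after the colon stays fixed at the value Alloy listens on.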
+31
View File
@@ -1,5 +1,7 @@
 from Logger.LokiLogger import LokiLogger
+from Logger import PrometheusMetric
+import time
 
 
 def prueba_log():
@@ -31,5 +33,34 @@ def prueba_log():
         logger.exception(e)
 
 
+def prueba_metricas(duracion_segundos: int = 180):
+    """
+    Emits test metrics for a while so that Alloy can scrape them.
+    """
+    metrics = PrometheusMetric(prefix="pruebas", port=9102)
+
+    requests_total = metrics.counter(
+        "requests_total", "Solicitudes de ejemplo", labels=["endpoint"]
+    )
+    latency_seconds = metrics.histogram(
+        "latency_seconds",
+        "Latencia simulada",
+        labels=["endpoint"],
+        buckets=[0.1, 0.5, 1, 2, 5],
+    )
+    workers_active = metrics.gauge("workers_active", "Workers activos", labels=["queue"])
+
+    start = time.time()
+    i = 0
+    while time.time() - start < duracion_segundos:
+        requests_total.inc(endpoint="/demo")
+        latency_seconds.observe(0.2 + (i % 5) * 0.1, endpoint="/demo")
+        workers_active.set(3 + (i % 4), queue="default")
+        i += 1
+        time.sleep(1)
+
+
 if __name__ == "__main__":
     prueba_log()
+    # Keeps the metrics server alive for at least 3 min to allow scrapes
+    prueba_metricas()
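To see what the `latency_seconds` histogram should report for this test loop, the cumulative bucket counts can be worked out by hand. The sketch below is stdlib-only and assumes the usual Prometheus convention that each bucket counts observations less than or equal to its upper bound. `cumulative_buckets` is a hypothetical helper, and the values are rounded to one decimal for readability, which the test loop itself does not do.

```python
from typing import Dict, Iterable, List


def cumulative_buckets(values: List[float], bounds: Iterable[float]) -> Dict[str, int]:
    """Count observations <= each upper bound, plus the catch-all +Inf bucket."""
    counts: Dict[str, int] = {}
    for bound in sorted(bounds):
        counts[str(float(bound))] = sum(1 for value in values if value <= bound)
    counts["+Inf"] = len(values)
    return counts


# Ten iterations of the simulated latency: 0.2, 0.3, 0.4, 0.5, 0.6, repeated twice.
values = [round(0.2 + (i % 5) * 0.1, 1) for i in range(10)]

if __name__ == "__main__":
    print(cumulative_buckets(values, [0.1, 0.5, 1, 2, 5]))
```

With these inputs nothing falls at or below 0.1, eight of the ten observations fall at or below 0.5, and every bucket from 1 upward holds all ten, so most of the useful resolution sits in a single bucket.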