feat: media_analytics — ETL PC+VPS → ClickHouse + Grafana

Two ETLs run every 5 min and upload snapshots (Jellyfin, *arr, Prowlarr, gnula, popelis users/mylist/events) to ClickHouse on the VPS, visualized in Grafana (grafana.datardos.com). The PC ingests through an SSH tunnel; popelis is ingested by a local ETL on the VPS. Uses clickhouse_insert_rows_py_infra.
2026-05-30 14:55:48 +02:00
commit 52999ecb86
9 changed files with 853 additions and 0 deletions
@@ -0,0 +1,3 @@
+.env
+*.log
+__pycache__/
@@ -0,0 +1,65 @@
+---
+name: media_analytics
+lang: py
+domain: infra
+version: 0.1.0
+description: "Media stack analytics: 2 ETLs run every 5 min and upload snapshots (Jellyfin, *arr, Prowlarr, gnula, popelis users/mylist/events) to ClickHouse on the VPS, visualized in Grafana. PC ingestion goes through an SSH tunnel; popelis is ingested by a local ETL on the VPS."
+tags: [analytics, clickhouse, grafana, etl, media, popelis, jellyfin, service]
+uses_functions:
+  - clickhouse_insert_rows_py_infra
+uses_types: []
+framework: ""
+entry_point: "etl_pc.py"
+dir_path: "apps/media_analytics"
+repo_url: ""
+---
+
+## Architecture
+
+```
+PC (Docker Desktop)                         VPS datardos (coolify net)
+─────────────────────                       ──────────────────────────
+Jellyfin :8096  ─┐                           ClickHouse (internal + 127.0.0.1:8123)
+Radarr/Sonarr  ─┤ etl_pc.py (5min) ──ssh──►  analytics.* (11 tables: 10 snapshot + 1 events)
+Prowlarr       ─┤   SSH tunnel 18123→8123         ▲
+gnula_catalog ──┘                                 │ etl_vps.py (5min, root)
+                                            popelis-db (Postgres) ── users/mylist/events
+                                                  │
+                                            Grafana :3000 ──► grafana.datardos.com (Traefik+LE)
+```
+
+## Components
+
+| Component | Where | What it does |
+|---|---|---|
+| `etl_pc.py` | PC, systemd user timer `media-analytics-etl.timer` (5min) | Extracts Jellyfin (items/users/user_items/sessions), Radarr/Sonarr (history/queue), Prowlarr (indexers) and the gnula SQLite catalog, then pushes to CH through the SSH tunnel. Uses `clickhouse_insert_rows_py_infra`. |
+| `etl_vps.py` | VPS, systemd `media-analytics-vps.timer` (5min, root) | Reads popelis-db (users and mylist as snapshots; events incrementally by id) and writes to the local CH HTTP endpoint. Standalone (the VPS has no registry). |
+| `deploy/docker-compose.yml` | VPS `/opt/analytics` | ClickHouse (internal on coolify + 127.0.0.1:8123) + Grafana (Traefik, grafana.datardos.com). |
+| `deploy/clickhouse/schema.sql` | VPS | 11 tables: jellyfin_{items,users,user_items,sessions}, arr_{history,queue}, prowlarr_indexers, gnula_movies, popelis_{users,mylist,events}. |
+| `deploy/grafana/provisioning/` | VPS | ClickHouse datasource (uid `clickhouse`) + dashboard `Media Stack Analytics` (12 panels). |
+
+## Secrets
+- `pass datardos-vps/clickhouse` (user analytics) · `pass datardos-vps/grafana` (admin).
+- PC: `~/.config/popelis/analytics.env` (chmod 600; CH password + JF/arr keys; the timer does not use GPG).
+- VPS: `/opt/analytics/.env` (chmod 600; CH_PASSWORD, GF_PASSWORD).
+
+## Running manually
+```bash
+# PC ETL
+/home/lucas/fn_registry/python/.venv/bin/python3 apps/media_analytics/etl_pc.py        # real
+/home/lucas/fn_registry/python/.venv/bin/python3 apps/media_analytics/etl_pc.py --dry  # extract only, no insert
+# VPS ETL
+ssh datardos 'sudo python3 /opt/analytics/etl_vps.py'
+# Redeploy VPS infra
+rsync -az apps/media_analytics/deploy/ datardos:/opt/analytics/ && ssh datardos 'cd /opt/analytics && sudo docker compose up -d'
+```
+
+## Visualization
+https://grafana.datardos.com (admin / `pass datardos-vps/grafana`). Dashboard "Media Stack Analytics".
+
+## Gotchas
+- **"play" events do NOT go through popelis** (playback goes directly to Jellyfin `/jf`): they are captured on the Jellyfin side (`jellyfin_sessions` + `jellyfin_user_items.play_count`). `popelis_events` covers login/logout/mylist/user_created (instrumented in popelis-api).
+- ClickHouse HTTP listens **only on 127.0.0.1 of the VPS** (not public). The PC connects through an ephemeral SSH tunnel (`ssh -N -L 18123:127.0.0.1:8123`). Grafana uses the native protocol on :9000 over the coolify network.
+- Snapshots are **append-only with snapshot_ts** → temporal analysis of state. Events are facts (event_ts) deduplicated by `event_id`: the table is `ReplacingMergeTree(ingested_at)` with `ORDER BY (event_id)`, so the most recently ingested row per event wins once parts merge.
+- ClickHouse Int64 values come back as **strings** in JSON (a gotcha for `clickhouse_query`/Grafana).
+- The PC timer needs `ssh datardos` to work without a passphrase prompt (a key without a passphrase or a loaded agent).
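
The Int64 gotcha can be handled on the client side. The following is a minimal sketch, assuming the result arrives in ClickHouse's `JSON` output format (which includes a `meta` array with column types) and that 64-bit integers are quoted, which is governed by the `output_format_json_quote_64bit_integers` setting. The helper `coerce_rows` and the sample payload are hypothetical and not part of this commit.

```python
import json

# Hypothetical response body: note that the Int64/UInt64 values are quoted strings.
SAMPLE = """
{
  "meta": [
    {"name": "user_id", "type": "Int64"},
    {"name": "username", "type": "String"},
    {"name": "n", "type": "UInt64"}
  ],
  "data": [
    {"user_id": "42", "username": "alice", "n": "7"}
  ],
  "rows": 1
}
"""

# Column types that are serialized as strings when 64-bit quoting is on.
QUOTED_INT_TYPES = ("Int64", "UInt64")


def coerce_rows(body: str) -> list[dict]:
    """Parse a ClickHouse JSON-format response and turn quoted 64-bit ints into int."""
    doc = json.loads(body)
    int_cols = {m["name"] for m in doc["meta"] if m["type"] in QUOTED_INT_TYPES}
    return [
        {k: int(v) if k in int_cols and isinstance(v, str) else v for k, v in row.items()}
        for row in doc["data"]
    ]


if __name__ == "__main__":
    print(coerce_rows(SAMPLE))
```

Driving the conversion from `meta` rather than guessing from the value avoids converting String columns that merely look numeric.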
@@ -0,0 +1,173 @@
+-- Analytics schema for the media stack. Every snapshot table carries snapshot_ts
+-- (the ETL capture time, every 5min) → enables temporal analysis of state.
+-- Event tables carry event_ts (the actual instant of the event).
+-- Snapshot tables: MergeTree engine, monthly partitions, ordered by (snapshot_ts, key).
+-- popelis_events: ReplacingMergeTree, monthly partitions, ordered by event_id.
+
+CREATE DATABASE IF NOT EXISTS analytics;
+
+-- ============ JELLYFIN ============
+-- Catalog: movies/series/episodes visible in the library.
+CREATE TABLE IF NOT EXISTS analytics.jellyfin_items (
+    snapshot_ts      DateTime,
+    item_id          String,
+    type             LowCardinality(String),   -- Movie | Series | Episode
+    name             String,
+    production_year  Int32,
+    runtime_min      Float32,
+    genres           Array(String),
+    community_rating Float32,
+    official_rating  String,
+    series_name      String,
+    library          String,
+    path             String,
+    date_created     DateTime DEFAULT toDateTime(0)
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, type, item_id);
+
+-- Jellyfin users (mirrors of popelis users).
+CREATE TABLE IF NOT EXISTS analytics.jellyfin_users (
+    snapshot_ts    DateTime,
+    user_id        String,
+    name           String,
+    last_login     DateTime DEFAULT toDateTime(0),
+    last_activity  DateTime DEFAULT toDateTime(0),
+    is_admin       UInt8
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, user_id);
+
+-- Playback state per user+item (play count, played flag, last played).
+CREATE TABLE IF NOT EXISTS analytics.jellyfin_user_items (
+    snapshot_ts   DateTime,
+    user_id       String,
+    user_name     String,
+    item_id       String,
+    item_name     String,
+    type          LowCardinality(String),
+    played        UInt8,
+    play_count    Int32,
+    playback_pct  Float32,
+    last_played   DateTime DEFAULT toDateTime(0)
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, user_id, item_id);
+
+-- Active sessions (what is being watched at snapshot time).
+CREATE TABLE IF NOT EXISTS analytics.jellyfin_sessions (
+    snapshot_ts   DateTime,
+    user_name     String,
+    item_name     String,
+    item_type     LowCardinality(String),
+    client        String,
+    device        String,
+    play_method   String,
+    is_paused     UInt8,
+    position_pct  Float32
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, user_name);
+
+-- ============ TORRENT SCRAPERS (*arr) ============
+-- Radarr/Sonarr history: grabs, imports, failures.
+CREATE TABLE IF NOT EXISTS analytics.arr_history (
+    snapshot_ts     DateTime,
+    app             LowCardinality(String),   -- radarr | sonarr
+    history_id      Int64,
+    event_type      LowCardinality(String),   -- grabbed | downloadFolderImported | ...
+    title           String,
+    source_title    String,
+    indexer         String,
+    download_client String,
+    quality         String,
+    languages       Array(String),
+    event_date      DateTime DEFAULT toDateTime(0)
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, app, history_id);
+
+-- Active download queue.
+CREATE TABLE IF NOT EXISTS analytics.arr_queue (
+    snapshot_ts   DateTime,
+    app           LowCardinality(String),
+    title         String,
+    status        String,
+    tracked_status String,
+    size_bytes    Int64,
+    sizeleft_bytes Int64,
+    timeleft      String,
+    indexer       String,
+    download_client String
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, app, title);
+
+-- Prowlarr indexers: status + grab/query counters.
+CREATE TABLE IF NOT EXISTS analytics.prowlarr_indexers (
+    snapshot_ts   DateTime,
+    indexer_id    Int32,
+    name          String,
+    enable        UInt8,
+    protocol      String,
+    privacy       String,
+    num_grabs     Int64,
+    num_queries   Int64,
+    num_grab_fail Int64,
+    num_query_fail Int64
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, indexer_id);
+
+-- ============ GNULA SCRAPER ============
+-- Catalog of detected Spanish-language movies (gnula_catalog.db).
+CREATE TABLE IF NOT EXISTS analytics.gnula_movies (
+    snapshot_ts   DateTime,
+    href          String,
+    title         String,
+    year          Int32,
+    flags         String,
+    lang_es       UInt8,
+    status        LowCardinality(String),   -- pending | downloaded | failed | have
+    in_library    UInt8,
+    detected_at   String,
+    downloaded_at String
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, href);
+
+-- ============ POPELIS ============
+-- Users (state).
+CREATE TABLE IF NOT EXISTS analytics.popelis_users (
+    snapshot_ts   DateTime,
+    user_id       Int64,
+    username      String,
+    jf_user_id    String,
+    created_at    DateTime DEFAULT toDateTime(0)
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, user_id);
+
+-- My list per user (state).
+CREATE TABLE IF NOT EXISTS analytics.popelis_mylist (
+    snapshot_ts   DateTime,
+    user_id       Int64,
+    item_id       String,
+    added_at      DateTime DEFAULT toDateTime(0)
+) ENGINE = MergeTree
+PARTITION BY toYYYYMM(snapshot_ts)
+ORDER BY (snapshot_ts, user_id, item_id);
+
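+-- Events (login, logout, mylist add/remove, user_created), instrumented in popelis-api.
+-- Play events are not recorded here; they are captured on the Jellyfin side.
+-- Fact table: deduplicated by event_id (the ORDER BY key) with ReplacingMergeTree,
+-- using ingested_at as the version column.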
+CREATE TABLE IF NOT EXISTS analytics.popelis_events (
+    event_id      Int64,
+    event_ts      DateTime,
+    user_id       Int64,
+    username      String,
+    event_type    LowCardinality(String),   -- login | logout | mylist_add | mylist_remove | user_created
+    item_id       String,
+    meta          String,
+    ingested_at   DateTime DEFAULT now()
+) ENGINE = ReplacingMergeTree(ingested_at)
+PARTITION BY toYYYYMM(event_ts)
+ORDER BY (event_id);
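
The dedup behavior of `popelis_events` can be modeled in a few lines. This is a sketch of the documented ReplacingMergeTree semantics, not of ClickHouse internals: among rows that share the sorting key (`event_id`), a merge keeps the one with the highest version (`ingested_at`), and on a tie the last inserted row. It assumes all rows for one event fall in the same monthly partition, which holds as long as an event's `event_ts` never changes between ingestions. The function name `replacing_merge` is hypothetical.

```python
def replacing_merge(rows, key="event_id", version="ingested_at"):
    """Model a ReplacingMergeTree merge: keep one row per key, highest version wins."""
    latest = {}
    for row in rows:  # rows are visited in insertion order
        k = row[key]
        # '>=' means that on equal versions the later-inserted row replaces the earlier one
        if k not in latest or row[version] >= latest[k][version]:
            latest[k] = row
    return sorted(latest.values(), key=lambda r: r[key])


if __name__ == "__main__":
    ingested = [
        {"event_id": 1, "event_type": "login", "ingested_at": "2026-05-30 12:00:00"},
        {"event_id": 2, "event_type": "mylist_add", "ingested_at": "2026-05-30 12:00:00"},
        # the same event re-ingested by a later ETL run
        {"event_id": 1, "event_type": "login", "ingested_at": "2026-05-30 12:05:00"},
    ]
    for row in replacing_merge(ingested):
        print(row)
```

Because merges happen in the background at an unspecified time, a query may still see both copies of an event until then; ClickHouse offers the `FINAL` modifier for queries that need deduplicated results immediately.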
@@ -0,0 +1,60 @@
+services:
+  clickhouse:
+    image: clickhouse/clickhouse-server:24.8-alpine
+    container_name: clickhouse
+    restart: always
+    environment:
+      CLICKHOUSE_DB: analytics
+      CLICKHOUSE_USER: analytics
+      CLICKHOUSE_PASSWORD: ${CH_PASSWORD}
+      CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: "1"
+    ulimits:
+      nofile:
+        soft: 262144
+        hard: 262144
+    volumes:
+      - clickhouse_data:/var/lib/clickhouse
+      - ./clickhouse/schema.sql:/docker-entrypoint-initdb.d/01_schema.sql:ro
+    networks:
+      - coolify
+    ports:
+      # HTTP only on the VPS localhost (not public). The PC ingests through an SSH tunnel.
+      # Grafana uses the native port 9000 over the coolify network (not exposed).
+      - "127.0.0.1:8123:8123"
+    deploy:
+      resources:
+        limits:
+          memory: 2g
+
+  grafana:
+    image: grafana/grafana:11.2.0
+    container_name: grafana
+    restart: always
+    environment:
+      GF_SECURITY_ADMIN_USER: admin
+      GF_SECURITY_ADMIN_PASSWORD: ${GF_PASSWORD}
+      GF_INSTALL_PLUGINS: grafana-clickhouse-datasource
+      GF_SERVER_ROOT_URL: https://grafana.datardos.com
+      GF_USERS_ALLOW_SIGN_UP: "false"
+      CH_PASSWORD: ${CH_PASSWORD}
+    volumes:
+      - grafana_data:/var/lib/grafana
+      - ./grafana/provisioning:/etc/grafana/provisioning:ro
+    networks:
+      - coolify
+    labels:
+      traefik.enable: "true"
+      traefik.docker.network: coolify
+      traefik.http.routers.grafana.entrypoints: https
+      traefik.http.routers.grafana.rule: Host(`grafana.datardos.com`)
+      traefik.http.routers.grafana.tls: "true"
+      traefik.http.routers.grafana.tls.certresolver: letsencrypt
+      traefik.http.services.grafana.loadbalancer.server.port: "3000"
+
+volumes:
+  clickhouse_data:
+  grafana_data:
+
+networks:
+  coolify:
+    external: true
@@ -0,0 +1,13 @@
+apiVersion: 1
+
+providers:
+  - name: media-stack
+    orgId: 1
+    folder: Media Stack
+    type: file
+    disableDeletion: false
+    updateIntervalSeconds: 30
+    allowUiUpdates: true
+    options:
+      path: /etc/grafana/provisioning/dashboards
+      foldersFromFilesStructure: false
@@ -0,0 +1,88 @@
+{
+  "uid": "media-stack",
+  "title": "Media Stack Analytics",
+  "tags": ["media", "popelis"],
+  "timezone": "browser",
+  "schemaVersion": 39,
+  "version": 1,
+  "refresh": "5m",
+  "time": { "from": "now-7d", "to": "now" },
+  "templating": { "list": [] },
+  "annotations": { "list": [] },
+  "panels": [
+    {
+      "id": 1, "type": "stat", "title": "Jellyfin · items (último)",
+      "gridPos": { "h": 4, "w": 4, "x": 0, "y": 0 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT count() AS items FROM analytics.jellyfin_items WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.jellyfin_items)" } ]
+    },
+    {
+      "id": 2, "type": "stat", "title": "Popelis · usuarios",
+      "gridPos": { "h": 4, "w": 4, "x": 4, "y": 0 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT count() AS users FROM analytics.popelis_users WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.popelis_users)" } ]
+    },
+    {
+      "id": 3, "type": "stat", "title": "gnula · pendientes",
+      "gridPos": { "h": 4, "w": 4, "x": 8, "y": 0 },
+      "fieldConfig": { "defaults": { "color": { "mode": "fixed", "fixedColor": "orange" } }, "overrides": [] },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT countIf(status='pending') AS pending FROM analytics.gnula_movies WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.gnula_movies)" } ]
+    },
+    {
+      "id": 4, "type": "stat", "title": "gnula · descargadas",
+      "gridPos": { "h": 4, "w": 4, "x": 12, "y": 0 },
+      "fieldConfig": { "defaults": { "color": { "mode": "fixed", "fixedColor": "green" } }, "overrides": [] },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT countIf(status='downloaded') AS downloaded FROM analytics.gnula_movies WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.gnula_movies)" } ]
+    },
+    {
+      "id": 5, "type": "stat", "title": "*arr · grabs (total)",
+      "gridPos": { "h": 4, "w": 4, "x": 16, "y": 0 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT countIf(event_type='grabbed') AS grabs FROM analytics.arr_history WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.arr_history)" } ]
+    },
+    {
+      "id": 6, "type": "stat", "title": "Jellyfin · sesiones activas (último)",
+      "gridPos": { "h": 4, "w": 4, "x": 20, "y": 0 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT count() AS sesiones FROM analytics.jellyfin_sessions WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.jellyfin_sessions)" } ]
+    },
+    {
+      "id": 10, "type": "timeseries", "title": "gnula · catálogo en el tiempo",
+      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 4 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "timeseries", "format": 0, "rawSql": "SELECT snapshot_ts AS time, countIf(status='pending') AS pendientes, countIf(status='downloaded') AS descargadas, countIf(in_library=1) AS en_biblioteca FROM analytics.gnula_movies GROUP BY time ORDER BY time" } ]
+    },
+    {
+      "id": 11, "type": "timeseries", "title": "Jellyfin · tamaño biblioteca en el tiempo",
+      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 4 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "timeseries", "format": 0, "rawSql": "SELECT snapshot_ts AS time, countIf(type='Movie') AS peliculas, countIf(type='Series') AS series, countIf(type='Episode') AS episodios FROM analytics.jellyfin_items GROUP BY time ORDER BY time" } ]
+    },
+    {
+      "id": 20, "type": "table", "title": "*arr · grabs recientes",
+      "gridPos": { "h": 9, "w": 12, "x": 0, "y": 12 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT event_date, app, title, indexer, quality, arrayStringConcat(languages, ',') AS idiomas FROM analytics.arr_history WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.arr_history) AND event_type='grabbed' ORDER BY event_date DESC LIMIT 30" } ]
+    },
+    {
+      "id": 21, "type": "table", "title": "Prowlarr · indexers",
+      "gridPos": { "h": 9, "w": 12, "x": 12, "y": 12 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT name, enable, protocol, num_grabs, num_queries, num_grab_fail, num_query_fail FROM analytics.prowlarr_indexers WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.prowlarr_indexers) ORDER BY num_grabs DESC" } ]
+    },
+    {
+      "id": 30, "type": "table", "title": "Popelis · eventos recientes",
+      "gridPos": { "h": 9, "w": 12, "x": 0, "y": 21 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT event_ts, username, event_type, item_id FROM analytics.popelis_events ORDER BY event_ts DESC LIMIT 50" } ]
+    },
+    {
+      "id": 31, "type": "timeseries", "title": "Popelis · eventos por tipo (por día)",
+      "gridPos": { "h": 9, "w": 12, "x": 12, "y": 21 },
+      "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
+      "targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "timeseries", "format": 0, "rawSql": "SELECT toStartOfDay(event_ts) AS time, countIf(event_type='login') AS logins, countIf(event_type='mylist_add') AS mylist_add, countIf(event_type='user_created') AS altas FROM analytics.popelis_events GROUP BY time ORDER BY time" } ]
+    }
+  ]
+}
@@ -0,0 +1,16 @@
+apiVersion: 1
+
+datasources:
+  - name: ClickHouse
+    uid: clickhouse
+    type: grafana-clickhouse-datasource
+    access: proxy
+    isDefault: true
+    jsonData:
+      host: clickhouse
+      port: 9000
+      protocol: native
+      username: analytics
+      defaultDatabase: analytics
+    secureJsonData:
+      password: ${CH_PASSWORD}
@@ -0,0 +1,296 @@
+#!/usr/bin/env python3
+"""ETL PC → ClickHouse (every 5min). Extracts the sources that live on this PC:
+Jellyfin (catalog, users, playback, sessions), Radarr/Sonarr (history+queue),
+Prowlarr (indexers), and the gnula catalog (SQLite). Pushes snapshots with snapshot_ts to
+ClickHouse on the VPS through an SSH tunnel (CH HTTP listens only on 127.0.0.1 of the VPS).
+
+Reuses registry functions: clickhouse_insert_rows_py_infra.
+Secrets in ~/.config/popelis/analytics.env (chmod 600; the timer cannot use pass/GPG).
+
+Usage: python etl_pc.py [--dry]
+"""
+import json
+import os
+import re
+import sqlite3
+import subprocess
+import sys
+import time
+import urllib.request
+import urllib.parse
+import urllib.error
+from datetime import datetime, timezone
+
+# --- registry ---
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "python", "functions"))
+try:
+    from infra import clickhouse_insert_rows
+    if not callable(clickhouse_insert_rows):
+        raise ImportError
+except ImportError:
+    from infra.clickhouse_insert_rows import clickhouse_insert_rows  # noqa: E402
+
+ENV_PATH = os.path.expanduser("~/.config/popelis/analytics.env")
+
+
+def load_env(path):
+    cfg = {}
+    with open(path) as f:
+        for line in f:
+            line = line.strip()
+            if line and not line.startswith("#") and "=" in line:
+                k, v = line.split("=", 1)
+                cfg[k] = v
+    return cfg
+
+
+CFG = load_env(ENV_PATH)
+DRY = "--dry" in sys.argv
+LOCAL_PORT = 18123
+SNAP = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
+
+
+def http_json(url, headers=None, data=None, method="GET", timeout=20):
+    req = urllib.request.Request(url, data=data, headers=headers or {}, method=method)
+    with urllib.request.urlopen(req, timeout=timeout) as r:
+        return json.loads(r.read())
+
+
+def ticks_to_min(t):
+    try:
+        return round(float(t) / 600_000_000.0, 2)  # 1 tick = 100ns
+    except Exception:
+        return 0.0
+
+
+def fmt_dt(s):
+    """Jellyfin/ISO timestamp → 'YYYY-MM-DD HH:MM:SS', or None if empty or too short."""
+    if not s:
+        return None
+    s = str(s).replace("Z", "").split(".")[0].replace("T", " ")
+    return s[:19] if len(s) >= 19 else None
+
+
+# ---------- SSH TUNNEL ----------
+class Tunnel:
+    def __init__(self, host, lport, rhost="127.0.0.1", rport=8123):
+        self.host, self.lport, self.rhost, self.rport = host, lport, rhost, rport
+        self.proc = None
+
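+    def __enter__(self):
+        self.proc = subprocess.Popen(
+            ["ssh", "-N", "-o", "ExitOnForwardFailure=yes", "-o", "ConnectTimeout=10",
+             "-L", f"{self.lport}:{self.rhost}:{self.rport}", self.host],
+            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
+        # Wait until the forwarded port answers ClickHouse's /ping.
+        for _ in range(30):
+            if self.proc.poll() is not None:
+                break  # ssh exited early (auth failure, local port in use, ...)
+            try:
+                urllib.request.urlopen(f"http://127.0.0.1:{self.lport}/ping", timeout=2).read()
+                return self
+            except Exception:
+                time.sleep(0.3)
+        # __exit__ is not called when __enter__ raises, so stop ssh here.
+        self.__exit__()
+        raise RuntimeError("tunel SSH no abrio puerto local a tiempo")
+
+    def __exit__(self, *a):
+        if self.proc:
+            self.proc.terminate()
+            self.proc.wait()  # reap the child so no zombie ssh process remains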
+
+
+# ---------- EXTRACTORS ----------
+def jf_auth():
+    body = json.dumps({"Username": CFG["JF_USER"], "Pw": CFG["JF_PASS"]}).encode()
+    hdr = {"Content-Type": "application/json",
+           "X-Emby-Authorization": 'MediaBrowser Client="etl", Device="pc", DeviceId="etl-pc", Version="1.0"'}
+    d = http_json(f'{CFG["JF_URL"]}/Users/AuthenticateByName', hdr, body, "POST")
+    return d["AccessToken"]
+
+
+def jf_get(token, path, params=""):
+    url = f'{CFG["JF_URL"]}{path}'
+    if params:
+        url += ("&" if "?" in path else "?") + params
+    return http_json(url, {"X-Emby-Token": token}, timeout=40)
+
+
+def extract_jellyfin():
+    out = {"jellyfin_items": [], "jellyfin_users": [], "jellyfin_user_items": [],
+           "jellyfin_sessions": []}
+    token = jf_auth()
+    # items
+    fields = "Genres,Path,RunTimeTicks,ProductionYear,CommunityRating,OfficialRating,DateCreated,SeriesName"
+    d = jf_get(token, "/Items", f"Recursive=true&IncludeItemTypes=Movie,Series,Episode&Fields={fields}")
+    for it in d.get("Items", []):
+        out["jellyfin_items"].append({
+            "snapshot_ts": SNAP, "item_id": it.get("Id", ""), "type": it.get("Type", ""),
+            "name": it.get("Name", ""), "production_year": it.get("ProductionYear") or 0,
+            "runtime_min": ticks_to_min(it.get("RunTimeTicks", 0)),
+            "genres": it.get("Genres", []) or [],
+            "community_rating": it.get("CommunityRating") or 0.0,
+            "official_rating": it.get("OfficialRating", "") or "",
+            "series_name": it.get("SeriesName", "") or "", "library": "",
+            "path": it.get("Path", "") or "",
+            "date_created": fmt_dt(it.get("DateCreated")) or "1970-01-01 00:00:00",
+        })
+    # users
+    users = jf_get(token, "/Users")
+    for u in users:
+        out["jellyfin_users"].append({
+            "snapshot_ts": SNAP, "user_id": u.get("Id", ""), "name": u.get("Name", ""),
+            "last_login": fmt_dt(u.get("LastLoginDate")) or "1970-01-01 00:00:00",
+            "last_activity": fmt_dt(u.get("LastActivityDate")) or "1970-01-01 00:00:00",
+            "is_admin": 1 if u.get("Policy", {}).get("IsAdministrator") else 0,
+        })
+        # played items per user (watched Movie+Episode)
+        try:
+            pi = jf_get(token, f'/Users/{u["Id"]}/Items',
+                        "Recursive=true&IncludeItemTypes=Movie,Episode&IsPlayed=true&Fields=UserData")
+            for it in pi.get("Items", []):
+                ud = it.get("UserData", {}) or {}
+                out["jellyfin_user_items"].append({
+                    "snapshot_ts": SNAP, "user_id": u.get("Id", ""), "user_name": u.get("Name", ""),
+                    "item_id": it.get("Id", ""), "item_name": it.get("Name", ""),
+                    "type": it.get("Type", ""), "played": 1 if ud.get("Played") else 0,
+                    "play_count": ud.get("PlayCount", 0) or 0,
+                    "playback_pct": round(ud.get("PlayedPercentage", 0.0) or 0.0, 2),
+                    "last_played": fmt_dt(ud.get("LastPlayedDate")) or "1970-01-01 00:00:00",
+                })
+        except Exception as e:
+            print(f"[jf] user_items {u.get('Name')}: {e}", file=sys.stderr)
+    # active sessions
+    try:
+        for s in jf_get(token, "/Sessions"):
+            np = s.get("NowPlayingItem")
+            if not np:
+                continue
+            ps = s.get("PlayState", {}) or {}
+            pos = ticks_to_min(ps.get("PositionTicks", 0))
+            dur = ticks_to_min(np.get("RunTimeTicks", 0)) or 1
+            out["jellyfin_sessions"].append({
+                "snapshot_ts": SNAP, "user_name": s.get("UserName", ""),
+                "item_name": np.get("Name", ""), "item_type": np.get("Type", ""),
+                "client": s.get("Client", ""), "device": s.get("DeviceName", ""),
+                "play_method": ps.get("PlayMethod", ""), "is_paused": 1 if ps.get("IsPaused") else 0,
+                "position_pct": round(100.0 * pos / dur, 2),
+            })
+    except Exception as e:
+        print(f"[jf] sessions: {e}", file=sys.stderr)
+    return out
+
+
+def arr_get(base, key, path, ver="v3"):
+    sep = "&" if "?" in path else "?"
+    return http_json(f"{base}/api/{ver}/{path}{sep}apikey={key}", timeout=30)
+
+
+def extract_arr():
+    out = {"arr_history": [], "arr_queue": []}
+    for app, base, key in [("radarr", CFG["RADARR_URL"], CFG["RADARR_KEY"]),
+                           ("sonarr", CFG["SONARR_URL"], CFG["SONARR_KEY"])]:
+        try:
+            h = arr_get(base, key, "history?page=1&pageSize=200&sortKey=date&sortDirection=descending")
+            for r in h.get("records", []):
+                out["arr_history"].append({
+                    "snapshot_ts": SNAP, "app": app, "history_id": r.get("id", 0),
+                    "event_type": r.get("eventType", ""),
+                    "title": (r.get("movie", {}) or r.get("series", {}) or {}).get("title", "") or "",
+                    "source_title": r.get("sourceTitle", "") or "",
+                    "indexer": (r.get("data", {}) or {}).get("indexer", "") or "",
+                    "download_client": (r.get("data", {}) or {}).get("downloadClient", "") or "",
+                    "quality": (r.get("quality", {}) or {}).get("quality", {}).get("name", "") or "",
+                    "languages": [l.get("name", "") for l in (r.get("languages", []) or [])],
+                    "event_date": fmt_dt(r.get("date")) or "1970-01-01 00:00:00",
+                })
+        except Exception as e:
+            print(f"[arr] {app} history: {e}", file=sys.stderr)
+        try:
+            q = arr_get(base, key, "queue?page=1&pageSize=100")
+            for r in q.get("records", []):
+                out["arr_queue"].append({
+                    "snapshot_ts": SNAP, "app": app, "title": r.get("title", "") or "",
+                    "status": r.get("status", "") or "",
+                    "tracked_status": r.get("trackedDownloadState", "") or "",
+                    "size_bytes": int(r.get("size", 0) or 0),
+                    "sizeleft_bytes": int(r.get("sizeleft", 0) or 0),
+                    "timeleft": r.get("timeleft", "") or "",
+                    "indexer": r.get("indexer", "") or "",
+                    "download_client": r.get("downloadClient", "") or "",
+                })
+        except Exception as e:
+            print(f"[arr] {app} queue: {e}", file=sys.stderr)
+    return out
+
+
+def extract_prowlarr():
+    out = {"prowlarr_indexers": []}
+    try:
+        idx = arr_get(CFG["PROWLARR_URL"], CFG["PROWLARR_KEY"], "indexer", ver="v1")
+        stats = {}
+        try:
+            st = arr_get(CFG["PROWLARR_URL"], CFG["PROWLARR_KEY"], "indexerstats", ver="v1")
+            for s in st.get("indexers", []):
+                stats[s.get("indexerId")] = s
+        except Exception:
+            pass
+        for i in idx:
+            s = stats.get(i.get("id"), {})
+            out["prowlarr_indexers"].append({
+                "snapshot_ts": SNAP, "indexer_id": i.get("id", 0), "name": i.get("name", "") or "",
+                "enable": 1 if i.get("enable") else 0, "protocol": i.get("protocol", "") or "",
+                "privacy": i.get("privacy", "") or "",
+                "num_grabs": s.get("numberOfGrabs", 0) or 0,
+                "num_queries": s.get("numberOfQueries", 0) or 0,
+                "num_grab_fail": s.get("numberOfFailedGrabs", 0) or 0,
+                "num_query_fail": s.get("numberOfFailedQueries", 0) or 0,
+            })
+    except Exception as e:
+        print(f"[prowlarr] {e}", file=sys.stderr)
+    return out
+
+
+def extract_gnula():
+    out = {"gnula_movies": []}
+    db = CFG.get("GNULA_DB", "")
+    if not (db and os.path.exists(db)):
+        return out
+    c = sqlite3.connect(db)
+    c.row_factory = sqlite3.Row
+    for r in c.execute("SELECT href,title,year,flags,lang_es,status,in_library,detected_at,downloaded_at FROM movies"):
+        out["gnula_movies"].append({
+            "snapshot_ts": SNAP, "href": r["href"] or "", "title": r["title"] or "",
+            "year": r["year"] or 0, "flags": r["flags"] or "", "lang_es": r["lang_es"] or 0,
+            "status": r["status"] or "", "in_library": r["in_library"] or 0,
+            "detected_at": r["detected_at"] or "", "downloaded_at": r["downloaded_at"] or "",
+        })
+    c.close()
+    return out
+
+
+def main():
+    data = {}
+    for fn in (extract_jellyfin, extract_arr, extract_prowlarr, extract_gnula):
+        try:
+            data.update(fn())
+        except Exception as e:
+            print(f"[etl] {fn.__name__} FAILED: {e}", file=sys.stderr)
+    counts = {t: len(rows) for t, rows in data.items()}
+    print(f"[etl] snapshot {SNAP} extracted: {json.dumps(counts)}")
+    if DRY:
+        print("[etl] --dry: skipping insert"); return
+    base = f"http://127.0.0.1:{LOCAL_PORT}"
+    total = 0
+    with Tunnel(CFG["SSH_HOST"], LOCAL_PORT):
+        for table, rows in data.items():
+            if not rows:
+                continue
+            try:
+                n = clickhouse_insert_rows(base, f'{CFG["CH_DB"]}.{table}', rows,
+                                           user=CFG["CH_USER"], password=CFG["CH_PASSWORD"],
+                                           database=CFG["CH_DB"])
+                total += n
+            except Exception as e:
+                print(f"[etl] insert {table} FAILED: {e}", file=sys.stderr)
+    print(json.dumps({"snapshot_ts": SNAP, "inserted": total, "tables": counts}))
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,139 @@
+#!/usr/bin/env python3
+"""ETL VPS → ClickHouse (every 5 min). Runs ON the datardos VPS (not on the PC) because
+popelis-db (Postgres) is only reachable on the coolify network. Reads popelis-db via
+`docker exec popelis-db psql` and pushes to ClickHouse over its local HTTP interface (127.0.0.1:8123).
+
+Standalone on purpose: the VPS has no checkout of the fn_registry registry, so it
+does not import clickhouse_insert_rows_py_infra; it replicates a minimal JSONEachRow POST.
+
+- users, mylist: full snapshot on every run (snapshot_ts).
+- events: incremental by id (> max(event_id) already in CH; ReplacingMergeTree deduplicates).
+
+Reads credentials from /opt/analytics/.env (CH_PASSWORD). Intended for a systemd timer on the VPS.
+Usage: sudo python3 etl_vps.py
+"""
+import json
+import subprocess
+import sys
+import urllib.request
+import urllib.parse
+import urllib.error
+from datetime import datetime, timezone
+
+ENV = "/opt/analytics/.env"
+CH_URL = "http://127.0.0.1:8123"
+CH_USER = "analytics"
+CH_DB = "analytics"
+PG_CONTAINER = "popelis-db"
+PG_USER = "popelis"
+PG_DB = "popelis"
+SNAP = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
+
+
+def ch_password():
+    with open(ENV) as f:
+        for line in f:
+            if line.startswith("CH_PASSWORD="):
+                return line.strip().split("=", 1)[1]
+    raise RuntimeError("CH_PASSWORD not found in " + ENV)
+
+
+CH_PASS = ch_password()
+
+
+def pg_json(sql):
+    """Run SQL against popelis-db and return a list[dict] (via json_agg)."""
+    wrapped = f"SELECT COALESCE(json_agg(t), '[]') FROM ({sql}) t"
+    out = subprocess.check_output(
+        ["docker", "exec", PG_CONTAINER, "psql", "-U", PG_USER, "-d", PG_DB,
+         "-tAc", wrapped], text=True)
+    return json.loads(out.strip() or "[]")
+
+
+def ch_query(sql):
+    url = f"{CH_URL}/?{urllib.parse.urlencode({'database': CH_DB, 'default_format': 'JSONEachRow'})}"
+    req = urllib.request.Request(url, data=sql.encode(),
+                                 headers={"X-ClickHouse-User": CH_USER, "X-ClickHouse-Key": CH_PASS})
+    with urllib.request.urlopen(req, timeout=30) as r:
+        body = r.read().decode()
+    return [json.loads(l) for l in body.splitlines() if l.strip()]
+
+
+def ch_insert(table, rows):
+    if not rows:
+        return 0
+    body = "\n".join(json.dumps(r, separators=(",", ":")) for r in rows).encode()
+    q = f"INSERT INTO {CH_DB}.{table} FORMAT JSONEachRow"
+    url = f"{CH_URL}/?{urllib.parse.urlencode({'database': CH_DB, 'query': q})}"
+    req = urllib.request.Request(url, data=body,
+                                 headers={"X-ClickHouse-User": CH_USER, "X-ClickHouse-Key": CH_PASS,
+                                          "Content-Type": "text/plain"})
+    try:
+        urllib.request.urlopen(req, timeout=30).read()
+    except urllib.error.HTTPError as e:
+        raise ValueError(f"CH {e.code}: {e.read()[:300].decode(errors='replace')}") from e
+    return len(rows)
+
+
+def fmt_ts(s):
+    if not s:
+        return "1970-01-01 00:00:00"
+    return str(s).replace("T", " ").split(".")[0].split("+")[0][:19]
+
+
+def main():
+    total = 0
+    # users (snapshot)
+    users = pg_json("SELECT id AS user_id, username, jf_user_id, "
+                    "to_char(COALESCE(created_at, now()),'YYYY-MM-DD HH24:MI:SS') AS created_at "
+                    "FROM users") if has_col("users", "created_at") else \
+        pg_json("SELECT id AS user_id, username, jf_user_id FROM users")
+    for u in users:
+        u["snapshot_ts"] = SNAP
+        u.setdefault("created_at", "1970-01-01 00:00:00")
+        u["created_at"] = fmt_ts(u["created_at"])
+    total += ch_insert("popelis_users", users)
+
+    # mylist (snapshot)
+    ml = pg_json("SELECT user_id, item_id, "
+                 "to_char(COALESCE(added_at, now()),'YYYY-MM-DD HH24:MI:SS') AS added_at FROM mylist") \
+        if has_col("mylist", "added_at") else \
+        pg_json("SELECT user_id, item_id FROM mylist")
+    for m in ml:
+        m["snapshot_ts"] = SNAP
+        m.setdefault("added_at", "1970-01-01 00:00:00")
+        m["added_at"] = fmt_ts(m["added_at"])
+    total += ch_insert("popelis_mylist", ml)
+
+    # events (incremental by id)
+    last = 0
+    try:
+        r = ch_query("SELECT max(event_id) AS m FROM popelis_events")
+        last = int(r[0]["m"]) if r and r[0].get("m") not in (None, "") else 0
+    except Exception as e:
+        print(f"[etl] max(event_id) lookup failed, reloading events from id 0: {e}", file=sys.stderr)
+    ev = pg_json(f"SELECT id AS event_id, "
+                 f"to_char(ts,'YYYY-MM-DD HH24:MI:SS') AS event_ts, "
+                 f"COALESCE(user_id,0) AS user_id, username, event_type, item_id, meta "
+                 f"FROM events WHERE id > {last} ORDER BY id")
+    for e in ev:
+        e["event_ts"] = fmt_ts(e["event_ts"])
+    total += ch_insert("popelis_events", ev)
+
+    print(json.dumps({"snapshot_ts": SNAP, "users": len(users), "mylist": len(ml),
+                      "events_new": len(ev), "inserted": total}))
+
+
+def has_col(table, col):
+    try:
+        out = subprocess.check_output(
+            ["docker", "exec", PG_CONTAINER, "psql", "-U", PG_USER, "-d", PG_DB, "-tAc",
+             f"SELECT 1 FROM information_schema.columns WHERE table_name='{table}' AND column_name='{col}' LIMIT 1"],
+            text=True).strip()
+        return out == "1"
+    except Exception:
+        return False
+
+
+if __name__ == "__main__":
+    main()
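Review note: the insert path of `etl_vps.py` reduces to two pure steps that can be checked without Postgres or ClickHouse: normalizing timestamps to `YYYY-MM-DD HH:MM:SS`, and serializing rows as newline-delimited JSON for `FORMAT JSONEachRow`. The sketch below mirrors `fmt_ts` and the body construction in `ch_insert`. The names `normalize_ts` and `to_json_each_row` are hypothetical and not part of this commit, and the sample rows are made up for illustration.

```python
import json


def normalize_ts(value):
    """Mirror of fmt_ts: trim ISO-like timestamps to 'YYYY-MM-DD HH:MM:SS'."""
    if not value:
        # Same fallback as the commit: missing timestamps become the Unix epoch.
        return "1970-01-01 00:00:00"
    text = str(value).replace("T", " ")
    # Drop fractional seconds and a '+HH:MM' offset, then cap at 19 characters
    # so a trailing '-HH:MM' offset or 'Z' is cut off as well.
    return text.split(".")[0].split("+")[0][:19]


def to_json_each_row(rows):
    """Mirror of the ch_insert body: one compact JSON object per line."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in rows).encode()


if __name__ == "__main__":
    events = [
        {"event_id": 1, "event_ts": "2026-05-30T12:55:48.123+00:00"},
        {"event_id": 2, "event_ts": None},
    ]
    for e in events:
        e["event_ts"] = normalize_ts(e["event_ts"])
    print(to_json_each_row(events).decode())
```

Note that the offset is discarded rather than converted, so this only yields correct UTC values when the source already emits UTC, which is what the `to_char` calls in the queries appear to assume.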