feat: media_analytics — ETL PC+VPS → ClickHouse + Grafana
2 ETLs cada 5min suben snapshots (Jellyfin, *arr, Prowlarr, gnula, popelis users/mylist/events) a ClickHouse en el VPS, visualizado en Grafana (grafana.datardos.com). Ingesta PC via tunel SSH; popelis via ETL local en el VPS. Usa clickhouse_insert_rows_py_infra.
This commit is contained in:
@@ -0,0 +1,3 @@
|
|||||||
|
.env
|
||||||
|
*.log
|
||||||
|
__pycache__/
|
||||||
@@ -0,0 +1,65 @@
|
|||||||
|
---
|
||||||
|
name: media_analytics
|
||||||
|
lang: py
|
||||||
|
domain: infra
|
||||||
|
version: 0.1.0
|
||||||
|
description: "Analitica del media stack: 2 ETLs cada 5min suben snapshots (Jellyfin, *arr, Prowlarr, gnula, popelis users/mylist/events) a ClickHouse en el VPS, visualizado en Grafana. Ingesta PC via tunel SSH; popelis via ETL local en VPS."
|
||||||
|
tags: [analytics, clickhouse, grafana, etl, media, popelis, jellyfin, service]
|
||||||
|
uses_functions:
|
||||||
|
- clickhouse_insert_rows_py_infra
|
||||||
|
uses_types: []
|
||||||
|
framework: ""
|
||||||
|
entry_point: "etl_pc.py"
|
||||||
|
dir_path: "apps/media_analytics"
|
||||||
|
repo_url: ""
|
||||||
|
---
|
||||||
|
|
||||||
|
## Arquitectura
|
||||||
|
|
||||||
|
```
|
||||||
|
PC (Docker Desktop) VPS datardos (coolify net)
|
||||||
|
───────────────────── ──────────────────────────
|
||||||
|
Jellyfin :8096 ─┐ ClickHouse (interno + 127.0.0.1:8123)
|
||||||
|
Radarr/Sonarr ─┤ etl_pc.py (5min) ──ssh──► analytics.* (11 tablas snapshot)
|
||||||
|
Prowlarr ─┤ tunel SSH 18123→8123 ▲
|
||||||
|
gnula_catalog ──┘ │ etl_vps.py (5min, root)
|
||||||
|
popelis-db (Postgres) ── users/mylist/events
|
||||||
|
│
|
||||||
|
Grafana :3000 ──► grafana.datardos.com (Traefik+LE)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Componentes
|
||||||
|
|
||||||
|
| Pieza | Dónde | Qué hace |
|
||||||
|
|---|---|---|
|
||||||
|
| `etl_pc.py` | PC, systemd-user `media-analytics-etl.timer` (5min) | extrae Jellyfin (items/users/user_items/sessions), Radarr/Sonarr (history/queue), Prowlarr (indexers), gnula SQLite → push a CH via túnel SSH. Usa `clickhouse_insert_rows_py_infra`. |
|
||||||
|
| `etl_vps.py` | VPS, systemd `media-analytics-vps.timer` (5min, root) | lee popelis-db (users, mylist snapshot; events incremental por id) → CH HTTP local. Standalone (VPS sin registry). |
|
||||||
|
| `deploy/docker-compose.yml` | VPS `/opt/analytics` | ClickHouse (interno coolify + 127.0.0.1:8123) + Grafana (Traefik grafana.datardos.com). |
|
||||||
|
| `deploy/clickhouse/schema.sql` | VPS | 11 tablas: jellyfin_{items,users,user_items,sessions}, arr_{history,queue}, prowlarr_indexers, gnula_movies, popelis_{users,mylist,events}. |
|
||||||
|
| `deploy/grafana/provisioning/` | VPS | datasource ClickHouse (uid `clickhouse`) + dashboard `Media Stack Analytics` (12 paneles). |
|
||||||
|
|
||||||
|
## Secretos
|
||||||
|
- `pass datardos-vps/clickhouse` (user analytics) · `pass datardos-vps/grafana` (admin).
|
||||||
|
- PC: `~/.config/popelis/analytics.env` (chmod600; CH pass + JF/arr keys — el timer no usa GPG).
|
||||||
|
- VPS: `/opt/analytics/.env` (chmod600; CH_PASSWORD, GF_PASSWORD).
|
||||||
|
|
||||||
|
## Ejecutar manual
|
||||||
|
```bash
|
||||||
|
# PC ETL
|
||||||
|
/home/lucas/fn_registry/python/.venv/bin/python3 apps/media_analytics/etl_pc.py # real
|
||||||
|
/home/lucas/fn_registry/python/.venv/bin/python3 apps/media_analytics/etl_pc.py --dry # solo extrae
|
||||||
|
# VPS ETL
|
||||||
|
ssh datardos 'sudo python3 /opt/analytics/etl_vps.py'
|
||||||
|
# Redeploy infra VPS
|
||||||
|
rsync -az apps/media_analytics/deploy/ datardos:/opt/analytics/ && ssh datardos 'cd /opt/analytics && sudo docker compose up -d'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Visualización
|
||||||
|
https://grafana.datardos.com (admin / `pass datardos-vps/grafana`). Dashboard "Media Stack Analytics".
|
||||||
|
|
||||||
|
## Gotchas
|
||||||
|
- **Eventos "play" NO van por popelis** (la reproducción es directa a Jellyfin `/jf`): se capturan del lado Jellyfin (`jellyfin_sessions` + `jellyfin_user_items.play_count`). `popelis_events` cubre login/logout/mylist/user_created (instrumentado en popelis-api).
|
||||||
|
- ClickHouse HTTP escucha **solo 127.0.0.1 del VPS** (no público). El PC entra por túnel SSH efímero (`ssh -N -L 18123:127.0.0.1:8123`). Grafana usa el nativo :9000 por la red coolify.
|
||||||
|
- Snapshots son **append con snapshot_ts** → análisis temporal del estado. Eventos son hechos (event_ts) con dedup `ReplacingMergeTree(event_id)`.
|
||||||
|
- Int64 de ClickHouse vuelve como **string** en JSON (gotcha de `clickhouse_query`/Grafana).
|
||||||
|
- El timer del PC necesita `ssh datardos` sin passphrase (key sin passphrase o agente cargado).
|
||||||
@@ -0,0 +1,173 @@
|
|||||||
|
-- Esquema analitico media stack. Todas las tablas de snapshot llevan snapshot_ts
|
||||||
|
-- (momento de la captura del ETL, cada 5min) → permite analisis temporal del estado.
|
||||||
|
-- Las tablas de eventos llevan event_ts (instante real del evento).
|
||||||
|
-- Engine MergeTree, particion mensual, orden por (snapshot_ts, clave).
|
||||||
|
|
||||||
|
CREATE DATABASE IF NOT EXISTS analytics;
|
||||||
|
|
||||||
|
-- ============ JELLYFIN ============
|
||||||
|
-- Catalogo: peliculas/series/episodios visibles en la biblioteca.
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.jellyfin_items (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
item_id String,
|
||||||
|
type LowCardinality(String), -- Movie | Series | Episode
|
||||||
|
name String,
|
||||||
|
production_year Int32,
|
||||||
|
runtime_min Float32,
|
||||||
|
genres Array(String),
|
||||||
|
community_rating Float32,
|
||||||
|
official_rating String,
|
||||||
|
series_name String,
|
||||||
|
library String,
|
||||||
|
path String,
|
||||||
|
date_created DateTime DEFAULT toDateTime(0)
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, type, item_id);
|
||||||
|
|
||||||
|
-- Usuarios Jellyfin (espejos popelis).
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.jellyfin_users (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
user_id String,
|
||||||
|
name String,
|
||||||
|
last_login DateTime DEFAULT toDateTime(0),
|
||||||
|
last_activity DateTime DEFAULT toDateTime(0),
|
||||||
|
is_admin UInt8
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, user_id);
|
||||||
|
|
||||||
|
-- Estado de reproduccion por usuario+item (playcount, visto, ultima vez).
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.jellyfin_user_items (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
user_id String,
|
||||||
|
user_name String,
|
||||||
|
item_id String,
|
||||||
|
item_name String,
|
||||||
|
type LowCardinality(String),
|
||||||
|
played UInt8,
|
||||||
|
play_count Int32,
|
||||||
|
playback_pct Float32,
|
||||||
|
last_played DateTime DEFAULT toDateTime(0)
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, user_id, item_id);
|
||||||
|
|
||||||
|
-- Sesiones activas (lo que se esta viendo en el momento del snapshot).
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.jellyfin_sessions (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
user_name String,
|
||||||
|
item_name String,
|
||||||
|
item_type LowCardinality(String),
|
||||||
|
client String,
|
||||||
|
device String,
|
||||||
|
play_method String,
|
||||||
|
is_paused UInt8,
|
||||||
|
position_pct Float32
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, user_name);
|
||||||
|
|
||||||
|
-- ============ SCRAPERS TORRENTS (*arr) ============
|
||||||
|
-- Historial Radarr/Sonarr: grabs, imports, fallos.
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.arr_history (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
app LowCardinality(String), -- radarr | sonarr
|
||||||
|
history_id Int64,
|
||||||
|
event_type LowCardinality(String), -- grabbed | downloadFolderImported | ...
|
||||||
|
title String,
|
||||||
|
source_title String,
|
||||||
|
indexer String,
|
||||||
|
download_client String,
|
||||||
|
quality String,
|
||||||
|
languages Array(String),
|
||||||
|
event_date DateTime DEFAULT toDateTime(0)
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, app, history_id);
|
||||||
|
|
||||||
|
-- Cola activa de descargas.
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.arr_queue (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
app LowCardinality(String),
|
||||||
|
title String,
|
||||||
|
status String,
|
||||||
|
tracked_status String,
|
||||||
|
size_bytes Int64,
|
||||||
|
sizeleft_bytes Int64,
|
||||||
|
timeleft String,
|
||||||
|
indexer String,
|
||||||
|
download_client String
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, app, title);
|
||||||
|
|
||||||
|
-- Indexers Prowlarr: estado + contadores grab/query.
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.prowlarr_indexers (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
indexer_id Int32,
|
||||||
|
name String,
|
||||||
|
enable UInt8,
|
||||||
|
protocol String,
|
||||||
|
privacy String,
|
||||||
|
num_grabs Int64,
|
||||||
|
num_queries Int64,
|
||||||
|
num_grab_fail Int64,
|
||||||
|
num_query_fail Int64
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, indexer_id);
|
||||||
|
|
||||||
|
-- ============ GNULA SCRAPPER ============
|
||||||
|
-- Catalogo de pelis en castellano detectadas (gnula_catalog.db).
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.gnula_movies (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
href String,
|
||||||
|
title String,
|
||||||
|
year Int32,
|
||||||
|
flags String,
|
||||||
|
lang_es UInt8,
|
||||||
|
status LowCardinality(String), -- pending | downloaded | failed | have
|
||||||
|
in_library UInt8,
|
||||||
|
detected_at String,
|
||||||
|
downloaded_at String
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, href);
|
||||||
|
|
||||||
|
-- ============ POPELIS ============
|
||||||
|
-- Usuarios (estado).
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.popelis_users (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
user_id Int64,
|
||||||
|
username String,
|
||||||
|
jf_user_id String,
|
||||||
|
created_at DateTime DEFAULT toDateTime(0)
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, user_id);
|
||||||
|
|
||||||
|
-- Mi lista por usuario (estado).
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.popelis_mylist (
|
||||||
|
snapshot_ts DateTime,
|
||||||
|
user_id Int64,
|
||||||
|
item_id String,
|
||||||
|
added_at DateTime DEFAULT toDateTime(0)
|
||||||
|
) ENGINE = MergeTree
|
||||||
|
PARTITION BY toYYYYMM(snapshot_ts)
|
||||||
|
ORDER BY (snapshot_ts, user_id, item_id);
|
||||||
|
|
||||||
|
-- Eventos (logins, plays, mylist add/remove) — instrumentados en popelis-api.
|
||||||
|
-- Tabla de hechos: dedup por event_id con ReplacingMergeTree.
|
||||||
|
CREATE TABLE IF NOT EXISTS analytics.popelis_events (
|
||||||
|
event_id Int64,
|
||||||
|
event_ts DateTime,
|
||||||
|
user_id Int64,
|
||||||
|
username String,
|
||||||
|
event_type LowCardinality(String), -- login | logout | play | mylist_add | mylist_remove
|
||||||
|
item_id String,
|
||||||
|
meta String,
|
||||||
|
ingested_at DateTime DEFAULT now()
|
||||||
|
) ENGINE = ReplacingMergeTree(ingested_at)
|
||||||
|
PARTITION BY toYYYYMM(event_ts)
|
||||||
|
ORDER BY (event_id);
|
||||||
@@ -0,0 +1,60 @@
|
|||||||
|
services:
|
||||||
|
clickhouse:
|
||||||
|
image: clickhouse/clickhouse-server:24.8-alpine
|
||||||
|
container_name: clickhouse
|
||||||
|
restart: always
|
||||||
|
environment:
|
||||||
|
CLICKHOUSE_DB: analytics
|
||||||
|
CLICKHOUSE_USER: analytics
|
||||||
|
CLICKHOUSE_PASSWORD: ${CH_PASSWORD}
|
||||||
|
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: "1"
|
||||||
|
ulimits:
|
||||||
|
nofile:
|
||||||
|
soft: 262144
|
||||||
|
hard: 262144
|
||||||
|
volumes:
|
||||||
|
- clickhouse_data:/var/lib/clickhouse
|
||||||
|
- ./clickhouse/schema.sql:/docker-entrypoint-initdb.d/01_schema.sql:ro
|
||||||
|
networks:
|
||||||
|
- coolify
|
||||||
|
ports:
|
||||||
|
# HTTP solo en localhost del VPS (no publico). Ingesta del PC via tunel SSH.
|
||||||
|
# Grafana usa el nativo 9000 por la red coolify (no expuesto).
|
||||||
|
- "127.0.0.1:8123:8123"
|
||||||
|
deploy:
|
||||||
|
resources:
|
||||||
|
limits:
|
||||||
|
memory: 2g
|
||||||
|
|
||||||
|
grafana:
|
||||||
|
image: grafana/grafana:11.2.0
|
||||||
|
container_name: grafana
|
||||||
|
restart: always
|
||||||
|
environment:
|
||||||
|
GF_SECURITY_ADMIN_USER: admin
|
||||||
|
GF_SECURITY_ADMIN_PASSWORD: ${GF_PASSWORD}
|
||||||
|
GF_INSTALL_PLUGINS: grafana-clickhouse-datasource
|
||||||
|
GF_SERVER_ROOT_URL: https://grafana.datardos.com
|
||||||
|
GF_USERS_ALLOW_SIGN_UP: "false"
|
||||||
|
CH_PASSWORD: ${CH_PASSWORD}
|
||||||
|
volumes:
|
||||||
|
- grafana_data:/var/lib/grafana
|
||||||
|
- ./grafana/provisioning:/etc/grafana/provisioning:ro
|
||||||
|
networks:
|
||||||
|
- coolify
|
||||||
|
labels:
|
||||||
|
traefik.enable: "true"
|
||||||
|
traefik.docker.network: coolify
|
||||||
|
traefik.http.routers.grafana.entrypoints: https
|
||||||
|
traefik.http.routers.grafana.rule: Host(`grafana.datardos.com`)
|
||||||
|
traefik.http.routers.grafana.tls: "true"
|
||||||
|
traefik.http.routers.grafana.tls.certresolver: letsencrypt
|
||||||
|
traefik.http.services.grafana.loadbalancer.server.port: "3000"
|
||||||
|
|
||||||
|
volumes:
|
||||||
|
clickhouse_data:
|
||||||
|
grafana_data:
|
||||||
|
|
||||||
|
networks:
|
||||||
|
coolify:
|
||||||
|
external: true
|
||||||
@@ -0,0 +1,13 @@
|
|||||||
|
apiVersion: 1
|
||||||
|
|
||||||
|
providers:
|
||||||
|
- name: media-stack
|
||||||
|
orgId: 1
|
||||||
|
folder: Media Stack
|
||||||
|
type: file
|
||||||
|
disableDeletion: false
|
||||||
|
updateIntervalSeconds: 30
|
||||||
|
allowUiUpdates: true
|
||||||
|
options:
|
||||||
|
path: /etc/grafana/provisioning/dashboards
|
||||||
|
foldersFromFilesStructure: false
|
||||||
@@ -0,0 +1,88 @@
|
|||||||
|
{
|
||||||
|
"uid": "media-stack",
|
||||||
|
"title": "Media Stack Analytics",
|
||||||
|
"tags": ["media", "popelis"],
|
||||||
|
"timezone": "browser",
|
||||||
|
"schemaVersion": 39,
|
||||||
|
"version": 1,
|
||||||
|
"refresh": "5m",
|
||||||
|
"time": { "from": "now-7d", "to": "now" },
|
||||||
|
"templating": { "list": [] },
|
||||||
|
"annotations": { "list": [] },
|
||||||
|
"panels": [
|
||||||
|
{
|
||||||
|
"id": 1, "type": "stat", "title": "Jellyfin · items (último)",
|
||||||
|
"gridPos": { "h": 4, "w": 4, "x": 0, "y": 0 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT count() AS items FROM analytics.jellyfin_items WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.jellyfin_items)" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 2, "type": "stat", "title": "Popelis · usuarios",
|
||||||
|
"gridPos": { "h": 4, "w": 4, "x": 4, "y": 0 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT count() AS users FROM analytics.popelis_users WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.popelis_users)" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 3, "type": "stat", "title": "gnula · pendientes",
|
||||||
|
"gridPos": { "h": 4, "w": 4, "x": 8, "y": 0 },
|
||||||
|
"fieldConfig": { "defaults": { "color": { "mode": "fixed", "fixedColor": "orange" } }, "overrides": [] },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT countIf(status='pending') AS pending FROM analytics.gnula_movies WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.gnula_movies)" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 4, "type": "stat", "title": "gnula · descargadas",
|
||||||
|
"gridPos": { "h": 4, "w": 4, "x": 12, "y": 0 },
|
||||||
|
"fieldConfig": { "defaults": { "color": { "mode": "fixed", "fixedColor": "green" } }, "overrides": [] },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT countIf(status='downloaded') AS downloaded FROM analytics.gnula_movies WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.gnula_movies)" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 5, "type": "stat", "title": "*arr · grabs (total)",
|
||||||
|
"gridPos": { "h": 4, "w": 4, "x": 16, "y": 0 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT countIf(event_type='grabbed') AS grabs FROM analytics.arr_history WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.arr_history)" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 6, "type": "stat", "title": "Jellyfin · sesiones activas (último)",
|
||||||
|
"gridPos": { "h": 4, "w": 4, "x": 20, "y": 0 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT count() AS sesiones FROM analytics.jellyfin_sessions WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.jellyfin_sessions)" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 10, "type": "timeseries", "title": "gnula · catálogo en el tiempo",
|
||||||
|
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 4 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "timeseries", "format": 0, "rawSql": "SELECT snapshot_ts AS time, countIf(status='pending') AS pendientes, countIf(status='downloaded') AS descargadas, countIf(in_library=1) AS en_biblioteca FROM analytics.gnula_movies GROUP BY time ORDER BY time" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 11, "type": "timeseries", "title": "Jellyfin · tamaño biblioteca en el tiempo",
|
||||||
|
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 4 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "timeseries", "format": 0, "rawSql": "SELECT snapshot_ts AS time, countIf(type='Movie') AS peliculas, countIf(type='Series') AS series, countIf(type='Episode') AS episodios FROM analytics.jellyfin_items GROUP BY time ORDER BY time" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 20, "type": "table", "title": "*arr · grabs recientes",
|
||||||
|
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 12 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT event_date, app, title, indexer, quality, arrayStringConcat(languages, ',') AS idiomas FROM analytics.arr_history WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.arr_history) AND event_type='grabbed' ORDER BY event_date DESC LIMIT 30" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 21, "type": "table", "title": "Prowlarr · indexers",
|
||||||
|
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 12 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT name, enable, protocol, num_grabs, num_queries, num_grab_fail, num_query_fail FROM analytics.prowlarr_indexers WHERE snapshot_ts = (SELECT max(snapshot_ts) FROM analytics.prowlarr_indexers) ORDER BY num_grabs DESC" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 30, "type": "table", "title": "Popelis · eventos recientes",
|
||||||
|
"gridPos": { "h": 9, "w": 12, "x": 0, "y": 21 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "table", "format": 1, "rawSql": "SELECT event_ts, username, event_type, item_id FROM analytics.popelis_events ORDER BY event_ts DESC LIMIT 50" } ]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 31, "type": "timeseries", "title": "Popelis · eventos por tipo (por día)",
|
||||||
|
"gridPos": { "h": 9, "w": 12, "x": 12, "y": 21 },
|
||||||
|
"datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" },
|
||||||
|
"targets": [ { "refId": "A", "datasource": { "type": "grafana-clickhouse-datasource", "uid": "clickhouse" }, "editorType": "sql", "queryType": "timeseries", "format": 0, "rawSql": "SELECT toStartOfDay(event_ts) AS time, countIf(event_type='login') AS logins, countIf(event_type='mylist_add') AS mylist_add, countIf(event_type='user_created') AS altas FROM analytics.popelis_events GROUP BY time ORDER BY time" } ]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
@@ -0,0 +1,16 @@
|
|||||||
|
apiVersion: 1
|
||||||
|
|
||||||
|
datasources:
|
||||||
|
- name: ClickHouse
|
||||||
|
uid: clickhouse
|
||||||
|
type: grafana-clickhouse-datasource
|
||||||
|
access: proxy
|
||||||
|
isDefault: true
|
||||||
|
jsonData:
|
||||||
|
host: clickhouse
|
||||||
|
port: 9000
|
||||||
|
protocol: native
|
||||||
|
username: analytics
|
||||||
|
defaultDatabase: analytics
|
||||||
|
secureJsonData:
|
||||||
|
password: ${CH_PASSWORD}
|
||||||
@@ -0,0 +1,296 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""ETL PC → ClickHouse (cada 5min). Extrae las fuentes que viven en este PC:
|
||||||
|
Jellyfin (catalogo, usuarios, reproduccion, sesiones), Radarr/Sonarr (history+queue),
|
||||||
|
Prowlarr (indexers), y el catalogo gnula (SQLite). Empuja snapshots con snapshot_ts a
|
||||||
|
ClickHouse del VPS a traves de un tunel SSH (CH HTTP escucha solo en 127.0.0.1 del VPS).
|
||||||
|
|
||||||
|
Reusa funciones del registry: clickhouse_insert_rows_py_infra.
|
||||||
|
Secrets en ~/.config/popelis/analytics.env (chmod600; el timer no puede usar pass/GPG).
|
||||||
|
|
||||||
|
Uso: python etl_pc.py [--once] [--dry]
|
||||||
|
"""
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import sqlite3
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import time
|
||||||
|
import urllib.request
|
||||||
|
import urllib.parse
|
||||||
|
import urllib.error
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
|
||||||
|
# --- registry ---
|
||||||
|
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "python", "functions"))
|
||||||
|
try:
|
||||||
|
from infra import clickhouse_insert_rows
|
||||||
|
if not callable(clickhouse_insert_rows):
|
||||||
|
raise ImportError
|
||||||
|
except ImportError:
|
||||||
|
from infra.clickhouse_insert_rows import clickhouse_insert_rows # noqa: E402
|
||||||
|
|
||||||
|
ENV_PATH = os.path.expanduser("~/.config/popelis/analytics.env")
|
||||||
|
|
||||||
|
|
||||||
|
def load_env(path):
|
||||||
|
cfg = {}
|
||||||
|
with open(path) as f:
|
||||||
|
for line in f:
|
||||||
|
line = line.strip()
|
||||||
|
if line and not line.startswith("#") and "=" in line:
|
||||||
|
k, v = line.split("=", 1)
|
||||||
|
cfg[k] = v
|
||||||
|
return cfg
|
||||||
|
|
||||||
|
|
||||||
|
CFG = load_env(ENV_PATH)
|
||||||
|
DRY = "--dry" in sys.argv
|
||||||
|
LOCAL_PORT = 18123
|
||||||
|
SNAP = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
|
||||||
|
|
||||||
|
|
||||||
|
def http_json(url, headers=None, data=None, method="GET", timeout=20):
|
||||||
|
req = urllib.request.Request(url, data=data, headers=headers or {}, method=method)
|
||||||
|
with urllib.request.urlopen(req, timeout=timeout) as r:
|
||||||
|
return json.loads(r.read())
|
||||||
|
|
||||||
|
|
||||||
|
def ticks_to_min(t):
|
||||||
|
try:
|
||||||
|
return round(float(t) / 600_000_000.0, 2) # 1 tick = 100ns
|
||||||
|
except Exception:
|
||||||
|
return 0.0
|
||||||
|
|
||||||
|
|
||||||
|
def fmt_dt(s):
|
||||||
|
"""Jellyfin/ISO → 'YYYY-MM-DD HH:MM:SS' o '' ."""
|
||||||
|
if not s:
|
||||||
|
return None
|
||||||
|
s = str(s).replace("Z", "").split(".")[0].replace("T", " ")
|
||||||
|
return s[:19] if len(s) >= 19 else None
|
||||||
|
|
||||||
|
|
||||||
|
# ---------- TUNEL SSH ----------
|
||||||
|
class Tunnel:
|
||||||
|
def __init__(self, host, lport, rhost="127.0.0.1", rport=8123):
|
||||||
|
self.host, self.lport, self.rhost, self.rport = host, lport, rhost, rport
|
||||||
|
self.proc = None
|
||||||
|
|
||||||
|
def __enter__(self):
|
||||||
|
self.proc = subprocess.Popen(
|
||||||
|
["ssh", "-N", "-o", "ExitOnForwardFailure=yes", "-o", "ConnectTimeout=10",
|
||||||
|
"-L", f"{self.lport}:{self.rhost}:{self.rport}", self.host],
|
||||||
|
stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
|
||||||
|
# esperar puerto
|
||||||
|
for _ in range(30):
|
||||||
|
try:
|
||||||
|
urllib.request.urlopen(f"http://127.0.0.1:{self.lport}/ping", timeout=2).read()
|
||||||
|
return self
|
||||||
|
except Exception:
|
||||||
|
time.sleep(0.3)
|
||||||
|
raise RuntimeError("tunel SSH no abrio puerto local a tiempo")
|
||||||
|
|
||||||
|
def __exit__(self, *a):
|
||||||
|
if self.proc:
|
||||||
|
self.proc.terminate()
|
||||||
|
|
||||||
|
|
||||||
|
# ---------- EXTRACTORES ----------
|
||||||
|
def jf_auth():
|
||||||
|
body = json.dumps({"Username": CFG["JF_USER"], "Pw": CFG["JF_PASS"]}).encode()
|
||||||
|
hdr = {"Content-Type": "application/json",
|
||||||
|
"X-Emby-Authorization": 'MediaBrowser Client="etl", Device="pc", DeviceId="etl-pc", Version="1.0"'}
|
||||||
|
d = http_json(f'{CFG["JF_URL"]}/Users/AuthenticateByName', hdr, body, "POST")
|
||||||
|
return d["AccessToken"]
|
||||||
|
|
||||||
|
|
||||||
|
def jf_get(token, path, params=""):
|
||||||
|
url = f'{CFG["JF_URL"]}{path}'
|
||||||
|
if params:
|
||||||
|
url += ("&" if "?" in path else "?") + params
|
||||||
|
return http_json(url, {"X-Emby-Token": token}, timeout=40)
|
||||||
|
|
||||||
|
|
||||||
|
def extract_jellyfin():
|
||||||
|
out = {"jellyfin_items": [], "jellyfin_users": [], "jellyfin_user_items": [],
|
||||||
|
"jellyfin_sessions": []}
|
||||||
|
token = jf_auth()
|
||||||
|
# items
|
||||||
|
fields = "Genres,Path,RunTimeTicks,ProductionYear,CommunityRating,OfficialRating,DateCreated,SeriesName"
|
||||||
|
d = jf_get(token, "/Items", f"Recursive=true&IncludeItemTypes=Movie,Series,Episode&Fields={fields}")
|
||||||
|
for it in d.get("Items", []):
|
||||||
|
out["jellyfin_items"].append({
|
||||||
|
"snapshot_ts": SNAP, "item_id": it.get("Id", ""), "type": it.get("Type", ""),
|
||||||
|
"name": it.get("Name", ""), "production_year": it.get("ProductionYear") or 0,
|
||||||
|
"runtime_min": ticks_to_min(it.get("RunTimeTicks", 0)),
|
||||||
|
"genres": it.get("Genres", []) or [],
|
||||||
|
"community_rating": it.get("CommunityRating") or 0.0,
|
||||||
|
"official_rating": it.get("OfficialRating", "") or "",
|
||||||
|
"series_name": it.get("SeriesName", "") or "", "library": "",
|
||||||
|
"path": it.get("Path", "") or "",
|
||||||
|
"date_created": fmt_dt(it.get("DateCreated")) or "1970-01-01 00:00:00",
|
||||||
|
})
|
||||||
|
# users
|
||||||
|
users = jf_get(token, "/Users")
|
||||||
|
for u in users:
|
||||||
|
out["jellyfin_users"].append({
|
||||||
|
"snapshot_ts": SNAP, "user_id": u.get("Id", ""), "name": u.get("Name", ""),
|
||||||
|
"last_login": fmt_dt(u.get("LastLoginDate")) or "1970-01-01 00:00:00",
|
||||||
|
"last_activity": fmt_dt(u.get("LastActivityDate")) or "1970-01-01 00:00:00",
|
||||||
|
"is_admin": 1 if u.get("Policy", {}).get("IsAdministrator") else 0,
|
||||||
|
})
|
||||||
|
# played items por usuario (Movie+Episode vistos)
|
||||||
|
try:
|
||||||
|
pi = jf_get(token, f'/Users/{u["Id"]}/Items',
|
||||||
|
"Recursive=true&IncludeItemTypes=Movie,Episode&IsPlayed=true&Fields=UserData")
|
||||||
|
for it in pi.get("Items", []):
|
||||||
|
ud = it.get("UserData", {}) or {}
|
||||||
|
out["jellyfin_user_items"].append({
|
||||||
|
"snapshot_ts": SNAP, "user_id": u.get("Id", ""), "user_name": u.get("Name", ""),
|
||||||
|
"item_id": it.get("Id", ""), "item_name": it.get("Name", ""),
|
||||||
|
"type": it.get("Type", ""), "played": 1 if ud.get("Played") else 0,
|
||||||
|
"play_count": ud.get("PlayCount", 0) or 0,
|
||||||
|
"playback_pct": round(ud.get("PlayedPercentage", 0.0) or 0.0, 2),
|
||||||
|
"last_played": fmt_dt(ud.get("LastPlayedDate")) or "1970-01-01 00:00:00",
|
||||||
|
})
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[jf] user_items {u.get('Name')}: {e}", file=sys.stderr)
|
||||||
|
# sesiones activas
|
||||||
|
try:
|
||||||
|
for s in jf_get(token, "/Sessions"):
|
||||||
|
np = s.get("NowPlayingItem")
|
||||||
|
if not np:
|
||||||
|
continue
|
||||||
|
ps = s.get("PlayState", {}) or {}
|
||||||
|
pos = ticks_to_min(ps.get("PositionTicks", 0))
|
||||||
|
dur = ticks_to_min(np.get("RunTimeTicks", 0)) or 1
|
||||||
|
out["jellyfin_sessions"].append({
|
||||||
|
"snapshot_ts": SNAP, "user_name": s.get("UserName", ""),
|
||||||
|
"item_name": np.get("Name", ""), "item_type": np.get("Type", ""),
|
||||||
|
"client": s.get("Client", ""), "device": s.get("DeviceName", ""),
|
||||||
|
"play_method": ps.get("PlayMethod", ""), "is_paused": 1 if ps.get("IsPaused") else 0,
|
||||||
|
"position_pct": round(100.0 * pos / dur, 2),
|
||||||
|
})
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[jf] sessions: {e}", file=sys.stderr)
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
def arr_get(base, key, path, ver="v3"):
|
||||||
|
sep = "&" if "?" in path else "?"
|
||||||
|
return http_json(f"{base}/api/{ver}/{path}{sep}apikey={key}", timeout=30)
|
||||||
|
|
||||||
|
|
||||||
|
def extract_arr():
|
||||||
|
out = {"arr_history": [], "arr_queue": []}
|
||||||
|
for app, base, key in [("radarr", CFG["RADARR_URL"], CFG["RADARR_KEY"]),
|
||||||
|
("sonarr", CFG["SONARR_URL"], CFG["SONARR_KEY"])]:
|
||||||
|
try:
|
||||||
|
h = arr_get(base, key, "history?page=1&pageSize=200&sortKey=date&sortDirection=descending")
|
||||||
|
for r in h.get("records", []):
|
||||||
|
out["arr_history"].append({
|
||||||
|
"snapshot_ts": SNAP, "app": app, "history_id": r.get("id", 0),
|
||||||
|
"event_type": r.get("eventType", ""),
|
||||||
|
"title": (r.get("movie", {}) or r.get("series", {}) or {}).get("title", "") or "",
|
||||||
|
"source_title": r.get("sourceTitle", "") or "",
|
||||||
|
"indexer": (r.get("data", {}) or {}).get("indexer", "") or "",
|
||||||
|
"download_client": (r.get("data", {}) or {}).get("downloadClient", "") or "",
|
||||||
|
"quality": (r.get("quality", {}) or {}).get("quality", {}).get("name", "") or "",
|
||||||
|
"languages": [l.get("name", "") for l in (r.get("languages", []) or [])],
|
||||||
|
"event_date": fmt_dt(r.get("date")) or "1970-01-01 00:00:00",
|
||||||
|
})
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[arr] {app} history: {e}", file=sys.stderr)
|
||||||
|
try:
|
||||||
|
q = arr_get(base, key, "queue?page=1&pageSize=100")
|
||||||
|
for r in q.get("records", []):
|
||||||
|
out["arr_queue"].append({
|
||||||
|
"snapshot_ts": SNAP, "app": app, "title": r.get("title", "") or "",
|
||||||
|
"status": r.get("status", "") or "",
|
||||||
|
"tracked_status": r.get("trackedDownloadState", "") or "",
|
||||||
|
"size_bytes": int(r.get("size", 0) or 0),
|
||||||
|
"sizeleft_bytes": int(r.get("sizeleft", 0) or 0),
|
||||||
|
"timeleft": r.get("timeleft", "") or "",
|
||||||
|
"indexer": r.get("indexer", "") or "",
|
||||||
|
"download_client": r.get("downloadClient", "") or "",
|
||||||
|
})
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[arr] {app} queue: {e}", file=sys.stderr)
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
def extract_prowlarr():
|
||||||
|
out = {"prowlarr_indexers": []}
|
||||||
|
try:
|
||||||
|
idx = arr_get(CFG["PROWLARR_URL"], CFG["PROWLARR_KEY"], "indexer", ver="v1")
|
||||||
|
stats = {}
|
||||||
|
try:
|
||||||
|
st = arr_get(CFG["PROWLARR_URL"], CFG["PROWLARR_KEY"], "indexerstats", ver="v1")
|
||||||
|
for s in st.get("indexers", []):
|
||||||
|
stats[s.get("indexerId")] = s
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
for i in idx:
|
||||||
|
s = stats.get(i.get("id"), {})
|
||||||
|
out["prowlarr_indexers"].append({
|
||||||
|
"snapshot_ts": SNAP, "indexer_id": i.get("id", 0), "name": i.get("name", "") or "",
|
||||||
|
"enable": 1 if i.get("enable") else 0, "protocol": i.get("protocol", "") or "",
|
||||||
|
"privacy": i.get("privacy", "") or "",
|
||||||
|
"num_grabs": s.get("numberOfGrabs", 0) or 0,
|
||||||
|
"num_queries": s.get("numberOfQueries", 0) or 0,
|
||||||
|
"num_grab_fail": s.get("numberOfFailedGrabs", 0) or 0,
|
||||||
|
"num_query_fail": s.get("numberOfFailedQueries", 0) or 0,
|
||||||
|
})
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[prowlarr] {e}", file=sys.stderr)
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
def extract_gnula():
|
||||||
|
out = {"gnula_movies": []}
|
||||||
|
db = CFG.get("GNULA_DB", "")
|
||||||
|
if not (db and os.path.exists(db)):
|
||||||
|
return out
|
||||||
|
c = sqlite3.connect(db)
|
||||||
|
c.row_factory = sqlite3.Row
|
||||||
|
for r in c.execute("SELECT href,title,year,flags,lang_es,status,in_library,detected_at,downloaded_at FROM movies"):
|
||||||
|
out["gnula_movies"].append({
|
||||||
|
"snapshot_ts": SNAP, "href": r["href"] or "", "title": r["title"] or "",
|
||||||
|
"year": r["year"] or 0, "flags": r["flags"] or "", "lang_es": r["lang_es"] or 0,
|
||||||
|
"status": r["status"] or "", "in_library": r["in_library"] or 0,
|
||||||
|
"detected_at": r["detected_at"] or "", "downloaded_at": r["downloaded_at"] or "",
|
||||||
|
})
|
||||||
|
c.close()
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
data = {}
|
||||||
|
for fn in (extract_jellyfin, extract_arr, extract_prowlarr, extract_gnula):
|
||||||
|
try:
|
||||||
|
data.update(fn())
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[etl] {fn.__name__} FALLO: {e}", file=sys.stderr)
|
||||||
|
counts = {t: len(rows) for t, rows in data.items()}
|
||||||
|
print(f"[etl] snapshot {SNAP} extraido: {json.dumps(counts)}")
|
||||||
|
if DRY:
|
||||||
|
print("[etl] --dry: no se inserta"); return
|
||||||
|
base = f"http://127.0.0.1:{LOCAL_PORT}"
|
||||||
|
total = 0
|
||||||
|
with Tunnel(CFG["SSH_HOST"], LOCAL_PORT):
|
||||||
|
for table, rows in data.items():
|
||||||
|
if not rows:
|
||||||
|
continue
|
||||||
|
try:
|
||||||
|
n = clickhouse_insert_rows(base, f'{CFG["CH_DB"]}.{table}', rows,
|
||||||
|
user=CFG["CH_USER"], password=CFG["CH_PASSWORD"],
|
||||||
|
database=CFG["CH_DB"])
|
||||||
|
total += n
|
||||||
|
except Exception as e:
|
||||||
|
print(f"[etl] insert {table} FALLO: {e}", file=sys.stderr)
|
||||||
|
print(json.dumps({"snapshot_ts": SNAP, "inserted": total, "tables": counts}))
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
+139
@@ -0,0 +1,139 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""ETL VPS → ClickHouse (cada 5min). Corre EN el VPS datardos (no en el PC) porque
|
||||||
|
popelis-db (Postgres) solo es alcanzable en la red coolify. Lee popelis-db via
|
||||||
|
`docker exec popelis-db psql` y empuja a ClickHouse por su HTTP local (127.0.0.1:8123).
|
||||||
|
|
||||||
|
Standalone a proposito: el VPS no tiene el registry fn_registry checkouteado, asi que
|
||||||
|
no importa clickhouse_insert_rows_py_infra — replica el POST JSONEachRow minimal.
|
||||||
|
|
||||||
|
- users, mylist: snapshot completo cada run (snapshot_ts).
|
||||||
|
- events: incremental por id (> max(event_id) ya en CH; ReplacingMergeTree dedup).
|
||||||
|
|
||||||
|
Lee creds de /opt/analytics/.env (CH_PASSWORD). Pensado para systemd timer en el VPS.
|
||||||
|
Uso: sudo python3 etl_vps.py
|
||||||
|
"""
|
||||||
|
import json
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import urllib.request
|
||||||
|
import urllib.parse
|
||||||
|
import urllib.error
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
|
||||||
|
ENV = "/opt/analytics/.env"
|
||||||
|
CH_URL = "http://127.0.0.1:8123"
|
||||||
|
CH_USER = "analytics"
|
||||||
|
CH_DB = "analytics"
|
||||||
|
PG_CONTAINER = "popelis-db"
|
||||||
|
PG_USER = "popelis"
|
||||||
|
PG_DB = "popelis"
|
||||||
|
SNAP = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
|
||||||
|
|
||||||
|
|
||||||
|
def ch_password():
|
||||||
|
with open(ENV) as f:
|
||||||
|
for line in f:
|
||||||
|
if line.startswith("CH_PASSWORD="):
|
||||||
|
return line.strip().split("=", 1)[1]
|
||||||
|
raise RuntimeError("CH_PASSWORD no en " + ENV)
|
||||||
|
|
||||||
|
|
||||||
|
CH_PASS = ch_password()
|
||||||
|
|
||||||
|
|
||||||
|
def pg_json(sql):
|
||||||
|
"""Ejecuta SQL en popelis-db y devuelve list[dict] (via json_agg)."""
|
||||||
|
wrapped = f"SELECT COALESCE(json_agg(t), '[]') FROM ({sql}) t"
|
||||||
|
out = subprocess.check_output(
|
||||||
|
["docker", "exec", PG_CONTAINER, "psql", "-U", PG_USER, "-d", PG_DB,
|
||||||
|
"-tAc", wrapped], text=True)
|
||||||
|
return json.loads(out.strip() or "[]")
|
||||||
|
|
||||||
|
|
||||||
|
def ch_query(sql):
|
||||||
|
url = f"{CH_URL}/?{urllib.parse.urlencode({'database': CH_DB, 'default_format': 'JSONEachRow'})}"
|
||||||
|
req = urllib.request.Request(url, data=sql.encode(),
|
||||||
|
headers={"X-ClickHouse-User": CH_USER, "X-ClickHouse-Key": CH_PASS})
|
||||||
|
with urllib.request.urlopen(req, timeout=30) as r:
|
||||||
|
body = r.read().decode()
|
||||||
|
return [json.loads(l) for l in body.splitlines() if l.strip()]
|
||||||
|
|
||||||
|
|
||||||
|
def ch_insert(table, rows):
|
||||||
|
if not rows:
|
||||||
|
return 0
|
||||||
|
body = "\n".join(json.dumps(r, separators=(",", ":")) for r in rows).encode()
|
||||||
|
q = f"INSERT INTO {CH_DB}.{table} FORMAT JSONEachRow"
|
||||||
|
url = f"{CH_URL}/?{urllib.parse.urlencode({'database': CH_DB, 'query': q})}"
|
||||||
|
req = urllib.request.Request(url, data=body,
|
||||||
|
headers={"X-ClickHouse-User": CH_USER, "X-ClickHouse-Key": CH_PASS,
|
||||||
|
"Content-Type": "text/plain"})
|
||||||
|
try:
|
||||||
|
urllib.request.urlopen(req, timeout=30).read()
|
||||||
|
except urllib.error.HTTPError as e:
|
||||||
|
raise ValueError(f"CH {e.code}: {e.read()[:300]}")
|
||||||
|
return len(rows)
|
||||||
|
|
||||||
|
|
||||||
|
def fmt_ts(s):
|
||||||
|
if not s:
|
||||||
|
return "1970-01-01 00:00:00"
|
||||||
|
return str(s).replace("T", " ").split(".")[0].split("+")[0][:19]
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
total = 0
|
||||||
|
# users (snapshot)
|
||||||
|
users = pg_json("SELECT id AS user_id, username, jf_user_id, "
|
||||||
|
"to_char(COALESCE(created_at, now()),'YYYY-MM-DD HH24:MI:SS') AS created_at "
|
||||||
|
"FROM users") if has_col("users", "created_at") else \
|
||||||
|
pg_json("SELECT id AS user_id, username, jf_user_id FROM users")
|
||||||
|
for u in users:
|
||||||
|
u["snapshot_ts"] = SNAP
|
||||||
|
u.setdefault("created_at", "1970-01-01 00:00:00")
|
||||||
|
u["created_at"] = fmt_ts(u["created_at"])
|
||||||
|
total += ch_insert("popelis_users", users)
|
||||||
|
|
||||||
|
# mylist (snapshot)
|
||||||
|
ml = pg_json("SELECT user_id, item_id, "
|
||||||
|
"to_char(COALESCE(added_at, now()),'YYYY-MM-DD HH24:MI:SS') AS added_at FROM mylist") \
|
||||||
|
if has_col("mylist", "added_at") else \
|
||||||
|
pg_json("SELECT user_id, item_id FROM mylist")
|
||||||
|
for m in ml:
|
||||||
|
m["snapshot_ts"] = SNAP
|
||||||
|
m.setdefault("added_at", "1970-01-01 00:00:00")
|
||||||
|
m["added_at"] = fmt_ts(m["added_at"])
|
||||||
|
total += ch_insert("popelis_mylist", ml)
|
||||||
|
|
||||||
|
# events (incremental por id)
|
||||||
|
last = 0
|
||||||
|
try:
|
||||||
|
r = ch_query("SELECT max(event_id) AS m FROM popelis_events")
|
||||||
|
last = int(r[0]["m"]) if r and r[0].get("m") not in (None, "") else 0
|
||||||
|
except Exception:
|
||||||
|
last = 0
|
||||||
|
ev = pg_json(f"SELECT id AS event_id, "
|
||||||
|
f"to_char(ts,'YYYY-MM-DD HH24:MI:SS') AS event_ts, "
|
||||||
|
f"COALESCE(user_id,0) AS user_id, username, event_type, item_id, meta "
|
||||||
|
f"FROM events WHERE id > {last} ORDER BY id")
|
||||||
|
for e in ev:
|
||||||
|
e["event_ts"] = fmt_ts(e["event_ts"])
|
||||||
|
total += ch_insert("popelis_events", ev)
|
||||||
|
|
||||||
|
print(json.dumps({"snapshot_ts": SNAP, "users": len(users), "mylist": len(ml),
|
||||||
|
"events_new": len(ev), "inserted": total}))
|
||||||
|
|
||||||
|
|
||||||
|
def has_col(table, col):
|
||||||
|
try:
|
||||||
|
out = subprocess.check_output(
|
||||||
|
["docker", "exec", PG_CONTAINER, "psql", "-U", PG_USER, "-d", PG_DB, "-tAc",
|
||||||
|
f"SELECT 1 FROM information_schema.columns WHERE table_name='{table}' AND column_name='{col}'"],
|
||||||
|
text=True).strip()
|
||||||
|
return out == "1"
|
||||||
|
except Exception:
|
||||||
|
return False
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
Reference in New Issue
Block a user