feat: gnula_grabber 2-part pipeline (crawl ES + CDP stream download)

agent
2026-05-30 13:54:57 +02:00
commit 79c2c8a804
6 changed files with 497 additions and 0 deletions
+2
@@ -0,0 +1,2 @@
__pycache__/
*.pyc
+82
@@ -0,0 +1,82 @@
# Recipe: automatic download of Castilian Spanish (castellano) movies from gnula → Jellyfin
Full flow validated on 2026-05-30. Candidate for promotion to a registry pipeline
(`download_gnula_movie_py_pipelines` or similar). A human only steps in to solve a captcha, if one appears.
## 0. Prerequisite infra (one-time)
- NordVPN bridge: `start_nordvpn_socks_bridge_bash_infra --port 8889`
  (credentials in env NORDVPN_SOCKS_USER/PASS, from `pass datardos-vps/nordvpn`). gost converts the SOCKS5 proxy into a local HTTP proxy.
- Jellyfin realtime monitor ON for the `media/movies` and `media/tv` libraries (already set up).
- systemd user timer `popelis-import.timer` (every 3 min) → `popelis_import_media_drop` (manual → media, plus metadata).
- Unpacked extensions on Windows: `C:\Users\lucas\hls-dl-ext` (HLS + AES decryption), `C:\Users\lucas\ubol` (uBlock Origin Lite MV3, kills ads/popunders).
## 1. Launch Chrome (Windows, visible) through NordVPN + CDP + extensions
```
chrome.exe --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 \
--remote-allow-origins='*' \
--user-data-dir='C:\Users\lucas\AppData\Local\fn-chrome-cdp-profile' \
--proxy-server=http://127.0.0.1:8889 \
--disable-extensions-except='C:\Users\lucas\hls-dl-ext,C:\Users\lucas\ubol' \
--load-extension='C:\Users\lucas\hls-dl-ext,C:\Users\lucas\ubol' \
--no-first-run --no-default-browser-check --new-window "https://www.gnularetro.cc/release/2025/"
```
NOTE (Chrome 148): `--load-extension` ALONE does not load the extensions. You must also pass
`--disable-extensions-except` with the same paths. Verify over the browser WebSocket with `Target.getTargets`
(look for a `chrome-extension://` service worker) or at chrome://extensions.
gnula is ISP-blocked (Allot DPI) and sits behind Cloudflare. NordVPN gets past the ISP block, and a real browser gets past CF.
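The verification step above can be scripted. Below is a minimal sketch that assumes the target dicts carry `type` and `url` keys, as both the `GET /json/list` entries and the `targetInfos` of `Target.getTargets` do. It returns the IDs of extensions that have a running service worker. The function name is hypothetical.

```python
from urllib.parse import urlparse


def extension_workers(targets):
    """Return the IDs of extensions that have a running service worker.

    `targets` is a list of dicts with at least "type" and "url" (hypothetical
    helper; e.g. the entries of GET http://127.0.0.1:9222/json/list).
    """
    ids = set()
    for t in targets:
        url = t.get("url", "")
        if t.get("type") == "service_worker" and url.startswith("chrome-extension://"):
            # chrome-extension://<extension-id>/<script>: the ID is the netloc
            ids.add(urlparse(url).netloc)
    return ids
```

You would expect one ID per MV3 extension that was loaded. An empty set matches the Chrome 148 symptom described in the gotchas.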
## 2. Listing: detect movies with the Spain flag + dedup
CDP eval on the release page: for each `a[href*="/ver-pelicula/"]`, inspect its `/flags/` images.
- `es.png` = **Spain (Español/castellano)**, the one we want. (On the LISTING page the alt text is
  inverted: es.png has alt="Latino". IGNORE the alt text and use the image filename es.png. Confirmed on the
  movie page, where the es.png group is literally labeled "Español".)
- `mx.png` = Mexico/Latino. `jp.png` = Subtitled/VOSE.
- Dedup: match titles against Radarr's `GET /api/v3/movie` (key 63fb51c8...) and skip movies that are already present.
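The flag rule above can be restated as a small classifier that only looks at flag filenames and never at the alt text. This is a sketch that mirrors the regex `crawl.py` uses. The `FLAG_LANG` mapping and the function names are illustrative.

```python
import re

# Flag image filename -> track type, per the list above. The alt text is
# deliberately ignored: on listing pages es.png carries alt="Latino".
FLAG_LANG = {"es.png": "castellano", "mx.png": "latino", "jp.png": "subtitulado"}


def card_languages(card_html):
    """Track types offered by one listing card, derived only from flag filenames."""
    flags = re.findall(r"/flags/(\w+\.png)", card_html)
    return sorted({FLAG_LANG[f] for f in flags if f in FLAG_LANG})


def wants(card_html):
    """True if the card offers Castilian Spanish (es.png)."""
    return "castellano" in card_languages(card_html)
```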
## 3. Movie page: pick a Spanish (Español) server
CDP: navigate to the href. The player options are `li[data-nume]` (DooPlay):
- "Español" group (es.png) = low data-nume values; "Subtitulado" group (jp.png) = high data-nume values.
- Servers: streamplay, **luluvid (= luluvdo)**, bigwarp, filemoon, dsvplay, powvideo, streamtape, vidmoly, voe.
- PREFER luluvid/luluvdo (we know how to decrypt its AES-128 HLS). Click: `document.querySelector('li[data-nume="2"]').click()` (luluvid, Español).
- This loads an iframe `lulustream.com/e/<id>`, which resolves to `luluvdo.com/e/<id>`.
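The server choice above (`download.py` makes it in-page, in JS) can be restated in Python. This sketch assumes each player option has been scraped into a hypothetical dict with `nume`, `flag` and `label` keys. It also builds a correctly quoted click expression.

```python
import json


def pick_server(options):
    """Prefer luluvid in the Español (es.png) group, then any Español option,
    then whatever option comes first (None if there are no options)."""
    es = [o for o in options if o["flag"] == "es.png"]
    for o in es:
        if "luluv" in o["label"].lower():
            return o
    return es[0] if es else (options[0] if options else None)


def click_js(nume):
    """JS expression that clicks the player option with the given data-nume."""
    sel = f'li[data-nume="{int(nume)}"]'
    # json.dumps yields a valid JS string literal, escaping the inner quotes
    return f"document.querySelector({json.dumps(sel)}).click()"
```

This `.click()` only selects the server. Starting playback still needs the trusted click described in the next step.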
## 4. Play (CDP): the method that WORKS (no dependence on uBlock)
luluvdo places an **ad click-catcher overlay** (a `div` with `z-index>=1000`) on top of the
player. A JS `.click()` is not a valid user gesture, and a normal click opens a popunder. Validated
solution (run in the luluvdo iframe target):
1. **Remove overlays**: call `.remove()` on every element of `document.querySelectorAll('div,iframe')` with `zIndex>=1000` (removes ~4).
2. **Trusted click** on `.jw-icon-display`/video via `Input.dispatchMouseEvent` (coordinates taken from its bounding rect).
3. `jwplayer().play(true)`.
→ video.paused=false and master.m3u8 gets requested. A real captcha (if one appears) is left to the human.
- Close popunders: `GET /json/list` → close every target that is not gnula/luluvdo (`/json/close/<id>`).
- uBlock: `--load-extension` ALONE failed on Chrome 148 (see gotcha). uBlock is NOT needed for playback
  (the overlay is removed in-page). It is only useful for cutting down popunders in general.
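Steps 2 and the popunder cleanup above come down to a few CDP messages. The sketch below builds the press/release pair for `Input.dispatchMouseEvent` at the center of a DOMRect-like dict. It also selects the page targets to close, assuming `/json/list` entries with `id`, `type` and `url`. All names are hypothetical, and sending the messages over the WebSocket is left out.

```python
import json


def rect_center(rect):
    """Rounded center of a DOMRect-like dict (left, top, width, height)."""
    return round(rect["left"] + rect["width"] / 2), round(rect["top"] + rect["height"] / 2)


def trusted_click(msg_id, x, y):
    """Press + release messages for a left click at viewport coordinates (x, y).

    CDP input events are dispatched as real (trusted) input, unlike el.click().
    """
    return [json.dumps({
        "id": msg_id + i,
        "method": "Input.dispatchMouseEvent",
        "params": {"type": kind, "x": x, "y": y, "button": "left", "clickCount": 1},
    }) for i, kind in enumerate(("mousePressed", "mouseReleased"))]


def popunder_ids(targets, keep=("gnula", "luluvdo")):
    """IDs of page targets to close via GET /json/close/<id>."""
    return [t["id"] for t in targets
            if t.get("type") == "page" and not any(k in t.get("url", "") for k in keep)]
```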
## 5. Streaming download + decryption (NO blob → Chrome does not crash)
`temp/grab_stream.py <basename>`:
- Connects over CDP to the player iframe.
- Reads master.m3u8 (from the performance entries) → variant → media playlist.
- AES-128: `#EXT-X-KEY` (key URL + IV). Without an IV, the IV is the segment's media sequence number as 16 bytes, big-endian.
- Per segment: in-page `fetch(seg)→arrayBuffer→base64` (the browser session gets past the CF 522) → CDP →
  LOCAL: base64-decode + `AES.new(key, AES.MODE_CBC, iv).decrypt` + PKCS7 unpad + append to `/tmp/grab/<n>.ts`.
  (Only 1 segment in RAM at a time → handles large movies.)
- Remux with `ffmpeg -c copy` .ts→.mkv straight into `F:\POPELIS\manual\movies\<basename>.mkv`. Deletes the .ts.
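The IV rule and the unpad step above can be shown without pycryptodome. This stdlib-only sketch derives the per-segment IV and strips PKCS#7 padding. The AES-CBC decryption itself still needs an AES implementation, such as `AES.new(key, AES.MODE_CBC, iv)` from pycryptodome as used in `grab_stream.py`. Both helper names are hypothetical.

```python
def segment_iv(index, media_sequence, iv_hex=None):
    """AES-128 IV for the segment at position `index` of a media playlist.

    If #EXT-X-KEY carries IV=0x..., that IV applies to every segment. Otherwise
    the IV is the segment's media sequence number (first number + index)
    encoded as a 16-byte big-endian integer.
    """
    if iv_hex:
        return bytes.fromhex(iv_hex[2:] if iv_hex.lower().startswith("0x") else iv_hex)
    return (media_sequence + index).to_bytes(16, "big")


def pkcs7_unpad(data, block=16):
    """Strip PKCS#7 padding; raise ValueError if it is malformed."""
    if not data or len(data) % block:
        raise ValueError("length is not a multiple of the block size")
    n = data[-1]
    if not 1 <= n <= block or data[-n:] != bytes([n]) * n:
        raise ValueError("bad PKCS#7 padding")
    return data[:-n]
```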
## 6. Import + Jellyfin (automatic)
- The systemd timer (every 3 min) runs `popelis_import_media_drop` → identifies the movie (Radarr/TMDb) + moves it
  from manual/movies → media/movies with its metadata → Jellyfin realtime monitor → the movie shows up.
- (or run the import on demand: `RADARR_KEY=.. SONARR_KEY=.. python popelis_import_media_drop.py --apply`)
## Gotchas
- Blob-in-memory crashed Chrome with a 3.4 GB movie → use grab_stream.py (streaming).
- The master.m3u8 token expires after ~8 h and is tied to the exit IP → download through the SAME proxy/session.
- The CF 522 only affects external clients (curl/ffmpeg/yt-dlp). The live browser session gets a 200.
- Multi-audio urlset: the chosen variant carries 1 audio track (the one for the Español group = Castilian).
- chrome.downloads / a.click() on a blob lands in Downloads and does NOT honor setDownloadBehavior. That is why
  grab_stream.py writes straight to disk via CDP + local processing, bypassing the browser's download.
- **`--load-extension` fails on Chrome 148**: the unpacked extensions (hls-dl-ext, ubol) did NOT
  load (Target.getTargets did not list their service workers, and the ad overlay was still there). Fix to try
  (already included in the step 1 command): add `--disable-extensions-except='<same paths>'` alongside `--load-extension`.
  Even so, it is NOT needed for the flow, because the overlay is neutralized in-page (step 4). uBOL Lite (MV3 DNR)
  also does NOT reliably kill the JS-injected click-catcher (it is limited compared with full uBO, which is MV2 and
  no longer supported by Chrome 148). Conclusion: rely on the in-page step 4, not on uBlock.
+62
@@ -0,0 +1,62 @@
---
name: gnula_grabber
lang: py
domain: infra
version: 0.1.0
description: "2-part pipeline: detects Castilian Spanish movies on gnula (crawler → SQLite catalog) and downloads them via Chrome+CDP+NordVPN (HLS streaming + AES decryption) → manual/movies → Radarr/Jellyfin. Captcha = human."
tags: [gnula, hls, scraping, cdp, nordvpn, media, popelis, navegator]
uses_functions:
- extract_hls_from_cdp_tab_py_pipelines
- start_nordvpn_socks_bridge_bash_infra
- popelis_import_media_drop_py_infra
uses_types: []
framework: ""
entry_point: "crawl.py"
dir_path: "apps/gnula_grabber"
repo_url: ""
---
## Purpose
A 2-part system that populates the Jellyfin library with movies in **Castilian Spanish** from gnula
(gnularetro.cc). It gets past the ISP block (Allot DPI, via NordVPN) and Cloudflare (real browser).
## Part 1: Detector (`crawl.py`)
Crawler via **FlareSolverr** (headless Chrome behind NordVPN that solves the CF challenge). It walks listings/categories,
parses `<article class="item movies">`, keeps the cards with the **es.png** flag (= "Español"
on the movie page; the alt text on the listing is inverted, so use the image filename),
dedups against the catalog + Radarr, and stores the result in SQLite `~/.config/popelis/gnula_catalog.db`
(table `movies`: href, title, year, flags, lang_es, status, in_library).
```bash
python3 apps/gnula_grabber/crawl.py "peliculas/estrenos" 2
python3 apps/gnula_grabber/crawl.py "peliculas/accion" 2   # pagination is broken -> walk categories instead
```
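`crawl.py` dedups with a substring test in both directions against Radarr titles, so a short title such as "It" matches any title containing "it". The sketch below shows a stricter, hypothetical alternative, assuming (title, year) pairs built from the `title`/`year` fields of Radarr's `GET /api/v3/movie`. It compares accent- and punctuation-insensitive titles plus the year.

```python
import re
import unicodedata


def norm_title(title):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    t = unicodedata.normalize("NFKD", title)
    t = "".join(ch for ch in t if not unicodedata.combining(ch))
    t = re.sub(r"[^\w\s]", " ", t.lower())
    return " ".join(t.split())


def in_library(title, year, library):
    """True if (title, year) matches an entry of `library`, a list of
    (title, year) pairs; a missing year on either side matches any year."""
    key = norm_title(title)
    return any(norm_title(t) == key and (year is None or y is None or y == year)
               for t, y in library)
```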
## Part 2: Downloader (`download.py`)
Consumes the catalog (movies with `status=pending`). For each one, it navigates to the movie page in the CDP
Chrome and clicks the **Español** server (luluvid/luluvdo, the `li[data-nume]` of the es.png group). It then removes the
ad overlay (`div z-index>=1000`), calls `jwplayer().play()`, sends a trusted click, and waits for `master.m3u8`
(captcha = human). Finally it delegates to `grab_stream.py` and marks the movie `downloaded` (or `failed`).
```bash
python3 apps/gnula_grabber/download.py 1    # download 1 pending movie
```
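The catalog is the contract between Part 1 and Part 2. This sketch uses an in-memory database and the `movies` schema from `crawl.py`. It shows how `download.py` selects the oldest pending Spanish movies, plus a hypothetical status summary.

```python
import sqlite3

# Same table definition as crawl.py's db_init()
SCHEMA = """CREATE TABLE IF NOT EXISTS movies(
    href TEXT PRIMARY KEY, title TEXT, year INTEGER, flags TEXT,
    lang_es INTEGER, status TEXT DEFAULT 'pending', in_library INTEGER DEFAULT 0,
    detected_at TEXT, downloaded_at TEXT)"""


def next_pending(conn, limit=1):
    """Oldest pending Spanish movies, selected the way download.py does."""
    return conn.execute(
        "SELECT href, title, year FROM movies WHERE lang_es=1 AND status='pending' "
        "ORDER BY detected_at LIMIT ?", (limit,)).fetchall()


def status_counts(conn):
    """{status: count} over the Spanish movies in the catalog (hypothetical helper)."""
    return dict(conn.execute(
        "SELECT status, COUNT(*) FROM movies WHERE lang_es=1 GROUP BY status"))
```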
## Core: `grab_stream.py`
Downloads HLS **streaming to disk** (no blob → Chrome does not crash). In-page, it fetches each raw
segment (the browser session gets past the CF 522) and ships it over CDP. LOCALLY, it decrypts AES-128-CBC
(key + IV from `#EXT-X-KEY`; without an IV, IV = the segment's media sequence number), appends to a `.ts`, and remuxes with `ffmpeg -c copy` → `.mkv`
straight into `F:\POPELIS\manual\movies`. The systemd timer `popelis-import.timer` then imports it
(Radarr metadata) → `media` → Jellyfin (realtime monitor).
```bash
python3 apps/gnula_grabber/grab_stream.py "Titulo (2025)"   # with the luluvdo player already playing
```
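The playlist walk that `grab_stream.py` performs in-page (master → first variant → key, IV, media sequence, segments) can be expressed in Python for offline testing. This sketch is a hypothetical equivalent, not the script's code. It parses the `#EXT-X-KEY` attributes in any order and defaults the media sequence to 0 when `#EXT-X-MEDIA-SEQUENCE` is absent, as RFC 8216 specifies.

```python
import re
from urllib.parse import urljoin


def first_variant(master_text, master_url):
    """URL of the first variant listed after an #EXT-X-STREAM-INF tag (None if absent)."""
    lines = [ln.strip() for ln in master_text.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF") and i + 1 < len(lines):
            return urljoin(master_url, lines[i + 1])
    return None


def media_info(media_text, media_url):
    """Key URL, IV hex, first media sequence number and absolute segment URLs."""
    key_url = iv_hex = None
    m = re.search(r"#EXT-X-KEY:(.*)", media_text)
    if m and "METHOD=AES-128" in m.group(1):
        uri = re.search(r'URI="([^"]+)"', m.group(1))
        iv = re.search(r"IV=0[xX]([0-9A-Fa-f]+)", m.group(1))
        key_url = urljoin(media_url, uri.group(1)) if uri else None
        iv_hex = iv.group(1) if iv else None
    seq = re.search(r"#EXT-X-MEDIA-SEQUENCE:(\d+)", media_text)
    segs = [urljoin(media_url, ln.strip()) for ln in media_text.splitlines()
            if ln.strip() and not ln.strip().startswith("#")]
    return {"keyUrl": key_url, "ivHex": iv_hex,
            "seq": int(seq.group(1)) if seq else 0,  # RFC 8216 default
            "segs": segs}
```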
## Prerequisites
- NordVPN bridge (`start_nordvpn_socks_bridge --port 8889`).
- gluetun + FlareSolverr running (for the crawler).
- Chrome launched with CDP + NordVPN (see README.md for the canonical launch command).
- venv with `pycryptodome` + `websocket-client` (via `uv run --with`).
## Gotchas
- See `README.md` for the full flow + gotchas (ad overlay, `--load-extension` on Chrome 148,
  ephemeral token, 522 only for external clients, broken gnula pagination → crawl categories).
+100
@@ -0,0 +1,100 @@
#!/usr/bin/env python3
"""PART 1: gnula crawler. Detects movies in SPANISH (es.png flag) and catalogs them.
Uses FlareSolverr (headless Chrome behind NordVPN that solves CF) to fetch the listing pages
(gnula is ISP-blocked + behind Cloudflare). Parses the <article class="item movies"> cards,
keeps the ones with the es.png flag (Español/castellano), dedups against the catalog + Radarr,
and stores them in SQLite. It does NOT download anything (that is Part 2: download.py, which
delegates to grab_stream.py).
Usage: python crawl.py [base_path] [max_pages]
  base_path: e.g. 'peliculas/estrenos' (default), 'release/2025', 'peliculas/accion'...
  max_pages: default 8
"""
import json, re, sqlite3, sys, time, urllib.request
FLARE = "http://localhost:8191/v1"
SITE = "https://www.gnularetro.cc"
DB = "/home/lucas/.config/popelis/gnula_catalog.db"
RADARR = ("http://localhost:7878", "63fb51c8c95746e2a327740baac02f5e")
def flare_get(url, timeout=150, retries=2):
    """Fetch `url` through FlareSolverr, retrying failed calls after a 5 s pause."""
    for attempt in range(retries + 1):
        try:
            return _flare_get(url, timeout)
        except Exception:
            if attempt == retries:
                raise
            time.sleep(5)
def _flare_get(url, timeout=150):
    """One FlareSolverr `request.get` call; returns (HTTP status, HTML)."""
    body = json.dumps({"cmd": "request.get", "url": url, "maxTimeout": 60000}).encode()
    req = urllib.request.Request(FLARE, data=body, headers={"Content-Type": "application/json"})
    resp = json.load(urllib.request.urlopen(req, timeout=timeout))
    sol = resp.get("solution", {})
    return sol.get("status"), sol.get("response", "")
def parse_cards(html):
    """Extract the movie cards (href, title, year, flags, lang_es) from a listing page."""
    out = []
    for blk in re.findall(r'<article[^>]*class="[^"]*item movies[^"]*".*?</article>', html, re.S):
        href = re.search(r'href="([^"]*ver-pelicula[^"]*)"', blk)
        title = re.search(r'<h3[^>]*>.*?<a[^>]*>([^<]+)', blk, re.S) or re.search(r'title="([^"]+)"', blk)
        flags = re.findall(r'/flags/(\w+\.png)', blk)
        # year: the <span>YYYY</span> badge, else the first 20xx number in the card
        yr = re.search(r'<span>(\d{4})</span>', blk) or re.search(r'\b(20\d{2})\b', blk)
        if href and title:
            out.append({
                "href": href.group(1),
                "title": title.group(1).strip(),
                "year": int(yr.group(1)) if yr else None,
                "flags": sorted(set(flags)),
                # es.png = Spain/castellano (the alt text is inverted on listings: ignore it)
                "lang_es": 1 if "es.png" in flags else 0,
            })
    return out
def db_init():
    """Open the SQLite catalog, creating the directory and the table if needed."""
    import os
    os.makedirs(os.path.dirname(DB), exist_ok=True)
    c = sqlite3.connect(DB)
    c.execute("""CREATE TABLE IF NOT EXISTS movies(
        href TEXT PRIMARY KEY, title TEXT, year INTEGER, flags TEXT,
        lang_es INTEGER, status TEXT DEFAULT 'pending', in_library INTEGER DEFAULT 0,
        detected_at TEXT, downloaded_at TEXT)""")
    c.commit()
    return c
def radarr_titles():
    """Lower-cased titles already in Radarr (empty set if Radarr is unreachable)."""
    try:
        req = urllib.request.Request(f"{RADARR[0]}/api/v3/movie", headers={"X-Api-Key": RADARR[1]})
        return {m["title"].lower() for m in json.load(urllib.request.urlopen(req, timeout=20))}
    except Exception:
        return set()
def main():
    base = sys.argv[1] if len(sys.argv) > 1 else "peliculas/estrenos"
    max_pages = int(sys.argv[2]) if len(sys.argv) > 2 else 8
    c = db_init()
    have = radarr_titles()
    now = time.strftime("%Y-%m-%dT%H:%M:%S")
    total_es = 0
    for p in range(1, max_pages + 1):
        url = f"{SITE}/{base}/" if p == 1 else f"{SITE}/{base}/page/{p}/"
        st, html = flare_get(url)
        if st != 200:
            print(f"[crawl] page {p}: HTTP {st} -> stop")
            break
        cards = parse_cards(html)
        if not cards:
            print(f"[crawl] page {p}: 0 cards -> stop")
            break
        es = [x for x in cards if x["lang_es"]]
        for x in es:
            t_low = x["title"].lower()
            # fuzzy dedup: substring match in either direction against Radarr titles
            inlib = 1 if any(t in t_low or t_low in t for t in have) else 0
            # on re-crawl, refresh flags/in_library; a 'pending' row that is now in the
            # library becomes 'have' so download.py does not fetch it again
            c.execute("""INSERT INTO movies(href,title,year,flags,lang_es,status,in_library,detected_at)
                VALUES(?,?,?,?,?,?,?,?)
                ON CONFLICT(href) DO UPDATE SET flags=excluded.flags, in_library=excluded.in_library,
                    status=CASE WHEN excluded.in_library=1 AND movies.status='pending'
                                THEN 'have' ELSE movies.status END""",
                      (x["href"], x["title"], x["year"], ",".join(x["flags"]), 1,
                       "have" if inlib else "pending", inlib, now))
        c.commit()
        total_es += len(es)
        print(f"[crawl] page {p}: {len(cards)} movies, {len(es)} in Spanish")
        time.sleep(1)
    # summary
    cur = c.execute("SELECT COUNT(*),SUM(in_library),SUM(status='pending') FROM movies WHERE lang_es=1")
    tot, inlib, pend = cur.fetchone()
    print(json.dumps({"crawled_pages_upto": max_pages, "es_seen": total_es,
                      "catalog_total_es": tot, "in_library": inlib or 0,
                      "pending_download": pend or 0, "db": DB}))
if __name__ == "__main__":
    main()
+119
@@ -0,0 +1,119 @@
#!/usr/bin/env python3
"""PART 2: Downloader. Consumes the catalog (built by crawl.py) and downloads the pending
Spanish movies through the Chrome+CDP+NordVPN session.
For each pending movie: navigates to its page, clicks the Español server (luluvid/luluvdo),
removes the ad overlay + starts playback (jwplayer), waits for master.m3u8, and delegates to
grab_stream.py (streaming + local AES decryption) → manual/movies. Marks the movie downloaded
(or failed) in the catalog.
The captcha (if any) is solved by the human: the script waits for master.m3u8 to appear.
Usage: python download.py [limit]   (default 1 movie; pass N for several)
Requires: Chrome running with --remote-debugging-port=9222 (through the NordVPN proxy).
"""
import json, re, sqlite3, subprocess, sys, time, urllib.request
import websocket
PORT = 9222
DB = "/home/lucas/.config/popelis/gnula_catalog.db"
GRAB = "/home/lucas/fn_registry/apps/gnula_grabber/grab_stream.py"
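_msg_id = 0
def _send(w, method, params):
    """Send a CDP command with a fresh id, so that a late reply to an earlier
    (timed-out) command is never mistaken for this one."""
    global _msg_id
    _msg_id += 1
    w.send(json.dumps({"id": _msg_id, "method": method, "params": params}))
    return _msg_id
def _wait(w, mid, to):
    """Read messages until the reply with id `mid` arrives; None after `to` seconds."""
    s = time.time()
    while time.time() - s < to:
        try:
            m = json.loads(w.recv())
        except Exception:
            continue  # 3 s socket read timeout: keep polling until `to`
        if m.get("id") == mid:
            return m.get("result", {})
    return None
def tabs():
    return json.load(urllib.request.urlopen(f"http://127.0.0.1:{PORT}/json/list", timeout=8))
def conn(t):
    w = websocket.create_connection(t["webSocketDebuggerUrl"], timeout=20, header=["Origin: http://localhost"])
    w.settimeout(3)
    return w
def ev(w, expr, to=15):
    """Runtime.evaluate `expr` in the target; returns the value (None on timeout)."""
    r = _wait(w, _send(w, "Runtime.evaluate", {"expression": expr, "returnByValue": True}), to)
    return (r or {}).get("result", {}).get("value")
def cmd(w, method, params, to=8):
    """Generic CDP command; returns its result dict (None on timeout)."""
    return _wait(w, _send(w, method, params), to)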
def page_target(sub):
    return next((t for t in tabs() if sub in t.get("url", "") and t.get("type") == "page"), None)
def player_target():
    return next((t for t in tabs() if any(h in t.get("url", "") for h in ["luluvdo", "lulustream"])), None)
def safe_name(title, year):
    """Filesystem-safe 'Title (Year)' basename."""
    base = re.split(r"\s*\|", title)[0].strip()             # drop "| extra" suffixes
    base = re.sub(r"\((?:19|20)\d{2}\)", "", base).strip()  # drop a year already in the title
    # sanitize the title BEFORE appending the year, so the parentheses survive
    base = re.sub(r"[^\w .-]", "", base).strip()[:80]
    return f"{base} ({year})" if year else base
def download_one(href, title, year):
    name = safe_name(title, year)
    print(f"[dl] {name} -> {href}", flush=True)
    # 1. navigate to the movie page (reusing the first open page tab)
    g = next((t for t in tabs() if t.get("type") == "page"), None)
    if not g:
        print("[dl] no open Chrome tab")
        return False
    w = conn(g)
    ev(w, f"location.href={json.dumps(href)}")
    w.close()
    time.sleep(7)
    # 2. click a Spanish server (data-nume 2 = luluvid). Prefer luluvid; otherwise the first
    #    option of the es.png group; as a last resort the first option of any group
    pg = page_target("/ver-pelicula/")
    if not pg:
        print("[dl] no movie page")
        return False
    w = conn(pg)
    clicked = ev(w, r'''(()=>{
  const lis=[...document.querySelectorAll('li[data-nume]')];
  // Español group: li whose flag is es.png; prefer luluvid
  const esLis=lis.filter(l=>[...l.querySelectorAll('img')].some(i=>/es\.png/.test(i.src||'')));
  const pick=esLis.find(l=>/luluv/i.test(l.textContent||''))||esLis[0]||lis[0];
  if(!pick)return"no-options"; pick.click(); return"clicked:"+(pick.textContent||'').trim().slice(0,15);
})()''')
    w.close()
    print("[dl] server:", clicked)
    time.sleep(6)
    # 3. play: remove overlays + trusted click + jwplayer().play()
    pl = player_target()
    for _ in range(4):
        if pl:
            break
        time.sleep(3)
        pl = player_target()
    if not pl:
        print("[dl] no player iframe (captcha?)")
        return False
    w = conn(pl)
    # remove the ad click-catcher overlays (z-index >= 1000)
    ev(w, r'''(()=>{let n=0;document.querySelectorAll('div,iframe').forEach(e=>{if(parseInt(getComputedStyle(e).zIndex||0)>=1000){e.remove();n++;}});return n;})()''')
    # center of the play button (or the <video>) in the iframe's viewport
    rect = ev(w, r'''(()=>{const b=document.querySelector('.jw-icon-display,.jw-display-icon-container,video');if(!b)return null;const r=b.getBoundingClientRect();return Math.round(r.left+r.width/2)+","+Math.round(r.top+r.height/2);})()''')
    if rect and "," in rect:
        cx, cy = [int(x) for x in rect.split(",")]
        for _ in range(2):
            # trusted click: CDP input events count as a real user gesture
            cmd(w, "Input.dispatchMouseEvent", {"type": "mousePressed", "x": cx, "y": cy, "button": "left", "clickCount": 1})
            cmd(w, "Input.dispatchMouseEvent", {"type": "mouseReleased", "x": cx, "y": cy, "button": "left", "clickCount": 1})
            time.sleep(2)
    ev(w, r'''(()=>{try{jwplayer().play(true);}catch(e){}})()''')
    # 4. wait ~120 s for master.m3u8 (the human solves the captcha if needed)
    got = False
    for _ in range(40):
        # overlays can be re-injected: remove them again on every poll
        ev(w, r'''(()=>{document.querySelectorAll('div,iframe').forEach(e=>{if(parseInt(getComputedStyle(e).zIndex||0)>=1000)e.remove();});})()''')
        if ev(w, r'''performance.getEntriesByType('resource').some(e=>/master\.m3u8/.test(e.name))'''):
            got = True
            break
        time.sleep(3)
    w.close()
    if not got:
        print("[dl] master never appeared (captcha / no playback). Skip.")
        return False
    # 5. streaming grab (grab_stream.py prints a final JSON line with "status")
    r = subprocess.run(["uv", "run", "--with", "websocket-client", "--with", "pycryptodome",
                        "python", GRAB, name], cwd="/home/lucas/fn_registry/apps/gnula_grabber",
                       capture_output=True, text=True, timeout=3600)
    print(r.stdout[-400:])
    if r.returncode != 0:
        print(r.stderr[-400:])
    return '"status": "ok"' in r.stdout
def main():
    limit = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    c = sqlite3.connect(DB)
    rows = c.execute("SELECT href,title,year FROM movies WHERE lang_es=1 AND status='pending' "
                     "ORDER BY detected_at LIMIT ?", (limit,)).fetchall()
    print(f"[dl] {len(rows)} pending movies to download (limit {limit})")
    done = 0
    for href, title, year in rows:
        try:
            ok = download_one(href, title, year)
        except Exception as e:
            print("[dl] error:", e)
            ok = False
        if ok:
            c.execute("UPDATE movies SET status='downloaded', downloaded_at=? WHERE href=?",
                      (time.strftime("%Y-%m-%dT%H:%M:%S"), href))
            done += 1
        else:
            c.execute("UPDATE movies SET status='failed' WHERE href=?", (href,))
        c.commit()
    print(json.dumps({"attempted": len(rows), "downloaded": done}))
if __name__ == "__main__":
    main()
+132
@@ -0,0 +1,132 @@
#!/usr/bin/env python3
"""Streaming HLS download to disk via CDP (AES-128 decryption done LOCALLY).
It does not build a giant Blob in the browser (that crashed Chrome). Instead:
- in-page, it fetches each RAW segment (the browser session gets past the CF 522) and
  returns it base64-encoded over CDP.
- LOCALLY: base64-decode + AES-128-CBC decrypt (key + IV) + append to the .ts on disk.
So only 1 segment lives in memory at a time, which handles movies of any size.
Usage: python grab_stream.py <out_basename> [url_substring]
"""
import base64, json, os, subprocess, sys, time, urllib.request
import websocket # uv --with websocket-client
from Crypto.Cipher import AES # uv --with pycryptodome
from Crypto.Util.Padding import unpad
PORT = 9222
DEST = "/mnt/f/POPELIS/manual/movies"
PLAYER_HOSTS = ["luluvdo", "luluvid", "filemoon", "streamwish", "dood", "vidhide",
"bigwarp", "voe", "streamtape", "vidmoly", "cdn-tnmr", "/e/", "/embed"]
def targets():
    return json.load(urllib.request.urlopen(f"http://127.0.0.1:{PORT}/json/list", timeout=8))
class CDP:
    """Minimal CDP client over one target's WebSocket."""
    def __init__(self, ws_url):
        self.ws = websocket.create_connection(ws_url, timeout=30, header=["Origin: http://localhost"])
        self.ws.settimeout(60)
        self.id = 0
    def ev(self, expr, await_p=False, to=120):
        """Runtime.evaluate; returns the value, raises on JS exceptions or after `to` seconds."""
        self.id += 1
        mid = self.id
        self.ws.send(json.dumps({"id": mid, "method": "Runtime.evaluate",
                                 "params": {"expression": expr, "returnByValue": True, "awaitPromise": await_p}}))
        s = time.time()
        while time.time() - s < to:
            try:
                m = json.loads(self.ws.recv())
            except Exception:
                continue
            if m.get("id") == mid:
                r = m.get("result", {})
                if "exceptionDetails" in r:
                    raise RuntimeError(json.dumps(r["exceptionDetails"])[:300])
                return r.get("result", {}).get("value")
        raise TimeoutError("CDP eval timeout")
    def close(self):
        try:
            self.ws.close()
        except Exception:
            pass
def find_player():
    """First page/iframe target whose URL matches a known player host."""
    for t in targets():
        if t.get("type") in ("page", "iframe") and t.get("webSocketDebuggerUrl") \
                and any(h in t.get("url", "") for h in PLAYER_HOSTS):
            return t
    return None
# JS: returns the playlist metadata (key URL, IV, first media sequence number, segment URLs)
JS_META = r'''(async()=>{
const abs=(b,u)=>new URL(u,b).href;
const gt=async u=>{const r=await fetch(u,{headers:{Referer:location.origin+"/"}});if(!r.ok)throw new Error("HTTP "+r.status);return r.text();};
const masters=performance.getEntriesByType('resource').map(e=>e.name).filter(n=>/master\.m3u8/.test(n));
if(!masters.length)return JSON.stringify({err:"no master (did you press play?)"});
let m=masters[0],t=await gt(m),pl=m;
// master playlist -> first variant listed
if(/#EXT-X-STREAM-INF/.test(t)){const L=t.split("\n");for(let i=0;i<L.length;i++)if(L[i].startsWith("#EXT-X-STREAM-INF")){pl=abs(m,L[i+1].trim());break;}}
const media=await gt(pl);
// EXT-X-KEY attributes may appear in any order; only METHOD=AES-128 with a single key is handled
const kl=(media.match(/#EXT-X-KEY:([^\r\n]*)/)||[])[1]||"";
const aes=/METHOD=AES-128/.test(kl);
const kuri=aes?((kl.match(/URI="([^"]+)"/)||[])[1]||null):null;
const kiv=aes?((kl.match(/IV=0[xX]([0-9A-Fa-f]+)/)||[])[1]||null):null;
// RFC 8216: without EXT-X-MEDIA-SEQUENCE the first segment's sequence number is 0
let seq=0;const ms=media.match(/#EXT-X-MEDIA-SEQUENCE:(\d+)/);if(ms)seq=parseInt(ms[1]);
const segs=media.split("\n").map(s=>s.trim()).filter(s=>s&&!s.startsWith("#")).map(s=>abs(pl,s));
return JSON.stringify({keyUrl:kuri?abs(pl,kuri):null, ivHex:kiv, seq, segs, master:m});
})()'''
# JS: fetch one resource -> base64 (converted in 32 KiB chunks so large binaries don't overflow fromCharCode.apply)
JS_FETCH_B64 = '''(async()=>{
const r=await fetch(%s,{headers:{Referer:location.origin+"/"}});
if(!r.ok)return "ERR"+r.status;
const b=new Uint8Array(await r.arrayBuffer());let s="";const C=0x8000;
for(let i=0;i<b.length;i+=C)s+=String.fromCharCode.apply(null,b.subarray(i,i+C));
return btoa(s);
})()'''
def main():
    name = sys.argv[1] if len(sys.argv) > 1 else "video"
    sub = sys.argv[2] if len(sys.argv) > 2 else None
    if sub:
        # optional url_substring: use the target whose URL contains it
        pl = next((t for t in targets() if sub in t.get("url", "") and t.get("webSocketDebuggerUrl")), None)
    else:
        pl = find_player()
    if not pl:
        print(json.dumps({"status": "error", "error": "no player iframe (did you press play?)"}))
        return
    print(f"[grab] player: {pl['url'][:70]}", flush=True)
    cdp = CDP(pl["webSocketDebuggerUrl"])
    meta = json.loads(cdp.ev(JS_META, await_p=True, to=60))
    if meta.get("err"):
        print(json.dumps({"status": "error", "error": meta["err"]}))
        cdp.close()
        return
    segs = meta["segs"]
    seq = meta["seq"]
    print(f"[grab] {len(segs)} segments, key={'yes' if meta['keyUrl'] else 'no'}", flush=True)
    key = None
    if meta["keyUrl"]:
        kb = cdp.ev(JS_FETCH_B64 % json.dumps(meta["keyUrl"]), await_p=True, to=30)
        if not isinstance(kb, str) or kb.startswith("ERR"):
            raise RuntimeError(f"key fetch failed: {kb}")
        key = base64.b64decode(kb)
        if len(key) != 16:
            raise RuntimeError(f"unexpected AES-128 key length: {len(key)}")
    os.makedirs("/tmp/grab", exist_ok=True)
    ts_path = f"/tmp/grab/{name}.ts"
    t0 = time.time()
    with open(ts_path, "wb") as f:
        for i, su in enumerate(segs):
            b64 = cdp.ev(JS_FETCH_B64 % json.dumps(su), await_p=True, to=90)
            if not isinstance(b64, str) or b64.startswith("ERR"):
                raise RuntimeError(f"seg {i}: {b64}")
            data = base64.b64decode(b64)
            if key:
                # explicit IV from EXT-X-KEY, else the segment's media sequence number (16-byte big-endian)
                iv = bytes.fromhex(meta["ivHex"]) if meta["ivHex"] else (seq + i).to_bytes(16, "big")
                dec = AES.new(key, AES.MODE_CBC, iv).decrypt(data)
                try:
                    dec = unpad(dec, 16)  # HLS AES-128 segments are PKCS7-padded
                except ValueError:
                    pass  # malformed padding: keep the decrypted bytes as they are
                data = dec
            f.write(data)
            if i % 50 == 0:
                print(f"[grab] {i}/{len(segs)} ({time.time() - t0:.0f}s)", flush=True)
    cdp.close()
    sz = os.path.getsize(ts_path)
    print(f"[grab] .ts ready: {sz/1e9:.2f} GB in {time.time()-t0:.0f}s", flush=True)
    # remux -> mkv straight into manual/movies (first video + first audio stream, no re-encode)
    os.makedirs(DEST, exist_ok=True)
    mkv = os.path.join(DEST, f"{name}.mkv")
    r = subprocess.run(["ffmpeg", "-y", "-fflags", "+genpts", "-i", ts_path,
                        "-map", "0:v:0", "-map", "0:a:0", "-c", "copy", mkv],
                       capture_output=True, text=True)
    if r.returncode != 0:
        print(json.dumps({"status": "error", "error": "ffmpeg: " + r.stderr[-300:]}))
        return
    os.remove(ts_path)
    print(json.dumps({"status": "ok", "mkv": mkv, "size_gb": round(os.path.getsize(mkv)/1e9, 2),
                      "segments": len(segs)}))
if __name__ == "__main__":
    main()