commit 79c2c8a80462a2c21a2dd3a23ca4250c13ebd268
Author: agent
Date:   Sat May 30 13:54:57 2026 +0200

    feat: gnula_grabber 2-part pipeline (crawl ES + CDP stream download)

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..7a60b85
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,2 @@
+__pycache__/
+*.pyc
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..ceaa21a
--- /dev/null
+++ b/README.md
@@ -0,0 +1,82 @@
+# Recipe: automatic Castilian-Spanish download gnula → Jellyfin
+
+Full flow validated 2026-05-30. Meant to be promoted to a registry pipeline
+(`download_gnula_movie_py_pipelines` or similar). The human only steps in for the captcha, if one shows up.
+
+## 0. Prerequisite infra (one-time)
+- NordVPN bridge: `start_nordvpn_socks_bridge_bash_infra --port 8889`
+  (creds in env NORDVPN_SOCKS_USER/PASS, from `pass datardos-vps/nordvpn`). gost: SOCKS5 → local HTTP proxy.
+- Jellyfin realtime monitor ON for the `media/movies` and `media/tv` libraries (already done).
+- systemd user timer `popelis-import.timer` (every 3 min) → `popelis_import_media_drop` (manual → media + metadata).
+- Unpacked extensions on Windows: `C:\Users\lucas\hls-dl-ext` (HLS + AES decrypt), `C:\Users\lucas\ubol` (uBlock Origin Lite MV3, kills ads/popunders).
+
+## 1. Launch Chrome (Windows, visible) through NordVPN + CDP + extensions
+```
+chrome.exe --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 \
+  --remote-allow-origins='*' \
+  --user-data-dir='C:\Users\lucas\AppData\Local\fn-chrome-cdp-profile' \
+  --proxy-server=http://127.0.0.1:8889 \
+  --disable-extensions-except='C:\Users\lucas\hls-dl-ext,C:\Users\lucas\ubol' \
+  --load-extension='C:\Users\lucas\hls-dl-ext,C:\Users\lucas\ubol' \
+  --no-first-run --no-default-browser-check --new-window "https://www.gnularetro.cc/release/2025/"
+```
+NOTE (Chrome 148): `--load-extension` on its own does NOT load the extensions; you also have to pass
+`--disable-extensions-except` with the same paths. Verify through the browser WebSocket (`Target.getTargets`,
+look for `chrome-extension://` service workers) or at chrome://extensions.
+gnula is ISP-blocked (Allot) and behind Cloudflare → NordVPN gets around the ISP block, a real browser gets past CF.
+
+## 2. Listing: detect films with the Spain flag + dedup
+CDP eval on the release page: for each `a[href*="/ver-pelicula/"]`, look at its `/flags/` imgs.
+- `es.png` = **Spain (Español / Castilian)**: the one we want. (On the LISTING the alt text is
+  swapped: es.png has alt="Latino". IGNORE the alt and use the image name es.png. Confirmed on the
+  film page: the es.png group is literally labelled "Español".)
+- `mx.png` = Mexico/Latino. `jp.png` = Subtitulado (subtitled, VOSE).
+- Dedup: match the title against Radarr's `GET /api/v3/movie` (key 63fb51c8...) and skip the ones already present.
+
+## 3. Film page: pick the Español server
+CDP: navigate to the href. The player options are `li[data-nume]` elements (DooPlay theme):
+- "Español" group (es.png) = low dnume values; "Subtitulado" group (jp.png) = high dnume values.
+- Servers: streamplay, **luluvid (=luluvdo)**, bigwarp, filemoon, dsvplay, powvideo, streamtape, vidmoly, voe.
+- PREFER luluvid/luluvdo (we know how to decrypt its HLS AES-128). Click: `document.querySelector('li[data-nume="2"]').click()` (luluvid, Español).
+- This loads an iframe `lulustream.com/e/` that resolves to `luluvdo.com/e/`.
+
+## 4. Play (CDP): the method that WORKS (no uBlock dependency)
+luluvdo puts an **ad click-catcher overlay** (a `div` with `z-index>=1000`) on top of the
+player: a JS `.click()` is not a valid user gesture, and a normal click opens a popunder. Validated
+fix (inside the luluvdo iframe target):
+1. **Remove overlays**: `document.querySelectorAll('div,iframe')` with `zIndex>=1000` → `.remove()` (removes ~4).
+2. **Trusted click** on `.jw-icon-display`/`video` via `Input.dispatchMouseEvent` (coordinates taken from the element's rect).
+3. `jwplayer().play(true)`.
+→ `video.paused` becomes false and master.m3u8 is requested. A real captcha (if one shows up) = human.
+- Close popunders: `GET /json/list` → close every target that is not gnula/luluvdo (`/json/close/<id>`).
+- uBlock: `--load-extension` alone failed on Chrome 148 (see gotcha); it is NOT needed for play
+  (the overlay is removed in-page). Only useful for cutting down general popunders.
+
+## 5. Streaming download + decryption (NO blob → Chrome does not crash)
+`temp/grab_stream.py <name>`:
+- Attaches CDP to the player iframe.
+- Reads master.m3u8 (from the performance entries) → variant → media playlist.
+- AES-128: `#EXT-X-KEY` (key URL + IV; no IV attribute → IV = media sequence number as a 16-byte big-endian integer).
+- Per segment: in-page `fetch(seg)→arrayBuffer→base64` (the browser session gets past the CF 522) → CDP →
+  LOCAL: base64-decode + `AES.new(key, AES.MODE_CBC, iv).decrypt` + PKCS#7 unpad + append to `/tmp/grab/<name>.ts`.
+  (Only 1 segment in RAM at a time → copes with big films.)
+- Remux with `ffmpeg -c copy` .ts → .mkv straight into `F:\POPELIS\manual\movies\<name>.mkv`. Deletes the .ts.
+
+## 6. Import + Jellyfin (automatic)
+- The systemd timer (3 min) runs `popelis_import_media_drop` → identifies the film (Radarr/TMDb) and moves it
+  from manual/movies → media/movies with its metadata → the Jellyfin realtime monitor picks it up → it appears.
+- (or run the import on demand: `RADARR_KEY=.. SONARR_KEY=.. python popelis_import_media_drop.py --apply`)
+
+## Gotchas
+- An in-memory blob crashed Chrome with a 3.4 GB film → use grab_stream.py (streaming).
+- The master.m3u8 token expires after ~8 h and is bound to the exit IP → download through the SAME proxy/session.
+- The CF 522 only affects external clients (curl/ffmpeg/yt-dlp); the live browser session gets 200.
+- Multi-audio urlset: the chosen variant carries 1 audio track (the Español group's, i.e. Castilian).
+- chrome.downloads / a blob `a.click()` lands in Downloads and does NOT honour setDownloadBehavior → that is why
+  grab_stream.py writes straight to disk via CDP + local code, bypassing the browser download.
+- **`--load-extension` fails on Chrome 148**: the unpacked extensions (hls-dl-ext, ubol) did NOT
+  load (Target.getTargets listed no service workers for them, and the ad overlay was still there). Fix to try:
+  add `--disable-extensions-except='<same paths>'` alongside `--load-extension`. But it is NOT
+  needed for the flow: the overlay is neutralised in-page (step 4). uBOL Lite (MV3 DNR)
+  also does NOT reliably kill the JS-injected click-catcher (it is limited compared with full uBO, which is MV2 and
+  no longer supported by Chrome 148). Conclusion: rely on the in-page step 4, not on uBlock.
diff --git a/app.md b/app.md
new file mode 100644
index 0000000..bffd838
--- /dev/null
+++ b/app.md
@@ -0,0 +1,62 @@
+---
+name: gnula_grabber
+lang: py
+domain: infra
+version: 0.1.0
+description: "2-part pipeline: detects Castilian-Spanish films on gnula (crawler → SQLite catalog) and downloads them via Chrome + CDP + NordVPN (HLS streaming + AES decryption) → manual/movies → Radarr/Jellyfin. Captcha = human."
+tags: [gnula, hls, scraping, cdp, nordvpn, media, popelis, navegator]
+uses_functions:
+  - extract_hls_from_cdp_tab_py_pipelines
+  - start_nordvpn_socks_bridge_bash_infra
+  - popelis_import_media_drop_py_infra
+uses_types: []
+framework: ""
+entry_point: "crawl.py"
+dir_path: "apps/gnula_grabber"
+repo_url: ""
+---
+
+## Purpose
+
+A 2-part system that populates the Jellyfin library with films in **Castilian Spanish** from gnula
+(gnularetro.cc), getting around the ISP block (Allot DPI, via NordVPN) and Cloudflare (via a real browser).
+
+## Part 1: Detector (`crawl.py`)
+Crawler via **FlareSolverr** (headless Chrome over NordVPN that gets past CF). Walks listing/category pages,
+parses the `<article class="item movies">` cards, keeps those with the **es.png** flag (= "Español"
+on the film page; the listing alt text is swapped, so use the image name),
+dedups against the catalog + Radarr, and stores them in SQLite `~/.config/popelis/gnula_catalog.db`
+(table `movies`: href, title, year, flags, lang_es, status, in_library).
+```bash
+python3 apps/gnula_grabber/crawl.py "peliculas/estrenos" 2
+python3 apps/gnula_grabber/crawl.py "peliculas/accion" 2   # pagination is broken -> walk categories instead
+```
+
+## Part 2: Downloader (`download.py`)
+Consumes the catalog (films with `status=pending`). For each one: navigates to its page in the CDP Chrome,
+clicks the **Español** server (luluvid/luluvdo, the `li[data-nume]` in the es.png group), removes the
+ad overlay (`div` with `z-index>=1000`), sends a trusted click + `jwplayer().play()`, waits for `master.m3u8`
+(captcha = human), and hands off to `grab_stream.py`. Marks the film `downloaded` (or `failed`).
+```bash
+python3 apps/gnula_grabber/download.py 1   # download 1 pending film
+```
+
+## Core: `grab_stream.py`
+Downloads HLS by **streaming to disk** (no blob → Chrome does not crash): in-page it fetches each raw segment
+(the browser session gets past the CF 522), ships it over CDP, and LOCALLY decrypts AES-128-CBC
+(key + IV from `#EXT-X-KEY`; without an explicit IV, IV = media sequence number), appends to a `.ts`, and remuxes
+with `ffmpeg -c copy` → `.mkv` straight into `F:\POPELIS\manual\movies`. The systemd timer `popelis-import.timer`
+imports it (Radarr metadata) → `media` → Jellyfin (realtime monitor).
+```bash
+python3 apps/gnula_grabber/grab_stream.py "Title (2025)"   # with the luluvdo player already playing
+```
+
+## Prerequisites
+- NordVPN bridge (`start_nordvpn_socks_bridge --port 8889`).
+- gluetun + FlareSolverr up (for the crawler).
+- Chrome launched with CDP + NordVPN (see README.md for the canonical launch).
+- venv with `pycryptodome` + `websocket-client` (via `uv run --with`).
+
+## Gotchas
+- See `README.md` for the full flow and gotchas (ad overlay, `--load-extension` on Chrome 148,
+  short-lived token, 522 only for external clients, broken gnula pagination → categories).
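Part 1's listing rule (keep a card only if it carries the `es.png` flag, reading the flag from the image file name because the listing alt text is swapped) can also be sketched with the stdlib `html.parser` instead of regexes, which tolerates attribute order and line breaks inside a card. This is a minimal sketch, not what `crawl.py` does: it assumes the card markup matches what `crawl.py` targets (`<article class="item movies">`, a `ver-pelicula` link, `/flags/*.png` images), and `CardParser` / `es_cards` are hypothetical names.

```python
from html.parser import HTMLParser


class CardParser(HTMLParser):
    """Hypothetical helper: one dict per <article class="... item movies ..."> card."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self._cur = None  # card being filled, or None while outside an article

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "article" and "item movies" in (a.get("class") or ""):
            self._cur = {"href": None, "flags": []}
        elif self._cur is not None:
            href = a.get("href") or ""
            src = a.get("src") or ""
            if tag == "a" and "ver-pelicula" in href and not self._cur["href"]:
                self._cur["href"] = href
            elif tag == "img" and "/flags/" in src:
                # use the file name (es.png), NOT the alt text, which is swapped on listings
                self._cur["flags"].append(src.rsplit("/", 1)[-1])

    def handle_endtag(self, tag):
        if tag == "article" and self._cur is not None:
            if self._cur["href"]:  # keep only cards that link to a film page
                self.cards.append(self._cur)
            self._cur = None


def es_cards(html):
    """Return only the cards that carry the Spain flag (es.png)."""
    p = CardParser()
    p.feed(html)
    p.close()
    return [c for c in p.cards if "es.png" in c["flags"]]
```

Fed the `response` HTML that `_flare_get` returns, it should yield the same `href` + flag pairs that `crawl.py` stores (title and year are left out to keep the sketch short).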
diff --git a/crawl.py b/crawl.py
new file mode 100644
index 0000000..448dd25
--- /dev/null
+++ b/crawl.py
@@ -0,0 +1,100 @@
+#!/usr/bin/env python3
+"""PART 1: gnula crawler. Detects films in SPANISH (es.png flag) and catalogs them.
+
+Uses FlareSolverr (headless Chrome over NordVPN that gets past CF) to fetch the listing pages
+(gnula is ISP-blocked + behind Cloudflare). Parses the <article class="item movies"> cards,
+keeps those with the es.png flag (Español/Castilian), dedups against the catalog + Radarr,
+and stores them in SQLite. Downloads NOTHING (that is Part 2: download.py + grab_stream.py).
+
+Usage: python crawl.py [base_path] [max_pages]
+  base_path: e.g. 'peliculas/estrenos' (default), 'release/2025', 'peliculas/accion'...
+"""
+import json, os, re, sqlite3, sys, time, urllib.request
+
+FLARE = "http://localhost:8191/v1"
+SITE = "https://www.gnularetro.cc"
+DB = "/home/lucas/.config/popelis/gnula_catalog.db"
+RADARR = ("http://localhost:7878", "63fb51c8c95746e2a327740baac02f5e")
+
+def flare_get(url, timeout=150, retries=2):
+    """GET through FlareSolverr with simple retries; returns (status, html)."""
+    for attempt in range(retries + 1):
+        try:
+            return _flare_get(url, timeout)
+        except Exception:
+            if attempt == retries:
+                raise
+            time.sleep(5)
+
+def _flare_get(url, timeout=150):
+    body = json.dumps({"cmd": "request.get", "url": url, "maxTimeout": 60000}).encode()
+    req = urllib.request.Request(FLARE, data=body, headers={"Content-Type": "application/json"})
+    resp = json.load(urllib.request.urlopen(req, timeout=timeout))
+    sol = resp.get("solution", {})
+    return sol.get("status"), sol.get("response", "")
+
+def parse_cards(html):
+    out = []
+    for blk in re.findall(r'<article[^>]*class="[^"]*item movies[^"]*".*?</article>', html, re.S):
+        href = re.search(r'href="([^"]*ver-pelicula[^"]*)"', blk)
+        title = re.search(r'<h3[^>]*>.*?<a[^>]*>([^<]+)', blk, re.S) or re.search(r'title="([^"]+)"', blk)
+        flags = re.findall(r'/flags/(\w+\.png)', blk)
+        yr = re.search(r'<span>(\d{4})</span>', blk) or re.search(r'\b(20\d{2})\b', blk)
+        if href and title:
+            out.append({
+                "href": href.group(1),
+                "title": title.group(1).strip(),
+                "year": int(yr.group(1)) if yr else None,
+                "flags": sorted(set(flags)),
+                "lang_es": 1 if "es.png" in flags else 0,
+            })
+    return out
+
+def db_init():
+    os.makedirs(os.path.dirname(DB), exist_ok=True)
+    c = sqlite3.connect(DB)
+    c.execute("""CREATE TABLE IF NOT EXISTS movies(
+        href TEXT PRIMARY KEY, title TEXT, year INTEGER, flags TEXT,
+        lang_es INTEGER, status TEXT DEFAULT 'pending', in_library INTEGER DEFAULT 0,
+        detected_at TEXT, downloaded_at TEXT)""")
+    c.commit()
+    return c
+
+def radarr_titles():
+    """Lower-cased titles already in Radarr (empty set if Radarr is unreachable)."""
+    try:
+        req = urllib.request.Request(f"{RADARR[0]}/api/v3/movie", headers={"X-Api-Key": RADARR[1]})
+        return {m["title"].lower() for m in json.load(urllib.request.urlopen(req, timeout=20))}
+    except Exception:
+        return set()
+
+def main():
+    base = sys.argv[1] if len(sys.argv) > 1 else "peliculas/estrenos"
+    max_pages = int(sys.argv[2]) if len(sys.argv) > 2 else 8
+    c = db_init()
+    have = radarr_titles()
+    now = time.strftime("%Y-%m-%dT%H:%M:%S")
+    total_es = 0
+    for p in range(1, max_pages + 1):
+        url = f"{SITE}/{base}/" if p == 1 else f"{SITE}/{base}/page/{p}/"
+        st, html = flare_get(url)
+        if st != 200:
+            print(f"[crawl] page {p}: HTTP {st} -> stop")
+            break
+        cards = parse_cards(html)
+        if not cards:
+            print(f"[crawl] page {p}: 0 cards -> stop")
+            break
+        es = [x for x in cards if x["lang_es"]]
+        for x in es:
+            # crude dedup vs Radarr: substring match in either direction
+            t_low = x["title"].lower()
+            inlib = 1 if any(t in t_low or t_low in t for t in have) else 0
+            c.execute("""INSERT INTO movies(href,title,year,flags,lang_es,status,in_library,detected_at)
+                         VALUES(?,?,?,?,?,?,?,?)
+                         ON CONFLICT(href) DO UPDATE SET flags=excluded.flags, in_library=excluded.in_library""",
+                      (x["href"], x["title"], x["year"], ",".join(x["flags"]), 1,
+                       "have" if inlib else "pending", inlib, now))
+        c.commit()
+        total_es += len(es)
+        print(f"[crawl] page {p}: {len(cards)} films, {len(es)} in Spanish")
+        time.sleep(1)
+    # summary
+    cur = c.execute("SELECT COUNT(*), SUM(in_library), SUM(status='pending') FROM movies WHERE lang_es=1")
+    tot, inlib, pend = cur.fetchone()
+    print(json.dumps({"crawled_pages_upto": max_pages, "es_seen": total_es,
+                      "catalog_total_es": tot, "in_library": inlib or 0,
+                      "pending_download": pend or 0, "db": DB}))
+
+if __name__ == "__main__":
+    main()
diff --git a/download.py b/download.py
new file mode 100644
index 0000000..a7723f5
--- /dev/null
+++ b/download.py
@@ -0,0 +1,119 @@
+#!/usr/bin/env python3
+"""PART 2: Downloader. Consumes the catalog (crawl.py) and downloads the pending
+Spanish films through the Chrome + CDP + NordVPN session.
+
+For each pending film: navigates to its page, clicks the Español server (luluvid/luluvdo),
+removes the ad overlay + starts playback (jwplayer), waits for master.m3u8, and hands off to
+grab_stream (streaming + local AES decryption) → manual/movies. Marks it downloaded in the catalog.
+The captcha (if any) is solved by the human: the script just waits for master to show up.
+
+Usage: python download.py [limit]   (default 1 film; pass N for several)
+Requires: Chrome running with --remote-debugging-port=9222 (through the NordVPN proxy).
+"""
+import itertools, json, re, sqlite3, subprocess, sys, time, urllib.request
+import websocket  # uv --with websocket-client
+
+PORT = 9222
+DB = "/home/lucas/.config/popelis/gnula_catalog.db"
+GRAB = "/home/lucas/fn_registry/apps/gnula_grabber/grab_stream.py"
+_ids = itertools.count(1)  # unique CDP message ids, so a late reply is never mistaken for a newer one
+
+def tabs():
+    return json.load(urllib.request.urlopen(f"http://127.0.0.1:{PORT}/json/list", timeout=8))
+
+def conn(t):
+    w = websocket.create_connection(t["webSocketDebuggerUrl"], timeout=20, header=["Origin: http://localhost"])
+    w.settimeout(3)
+    return w
+
+def cmd(w, method, params, to=8):
+    """Send one CDP command and wait for its reply; returns {} on timeout."""
+    mid = next(_ids)
+    w.send(json.dumps({"id": mid, "method": method, "params": params}))
+    s = time.time()
+    while time.time() - s < to:
+        try:
+            m = json.loads(w.recv())
+        except Exception:
+            continue
+        if m.get("id") == mid:
+            return m.get("result", {})
+    return {}
+
+def ev(w, expr, to=15):
+    """Runtime.evaluate; returns the JS value, or None on timeout."""
+    r = cmd(w, "Runtime.evaluate", {"expression": expr, "returnByValue": True}, to)
+    return r.get("result", {}).get("value")
+
+def page_target(sub):
+    return next((t for t in tabs() if sub in t.get("url", "") and t.get("type") == "page"), None)
+
+def player_target():
+    return next((t for t in tabs() if any(h in t.get("url", "") for h in ["luluvdo", "lulustream"])), None)
+
+def safe_name(title, year):
+    base = re.split(r"\s*\|", title)[0].strip()     # keep only the part before the first '|'
+    base = re.sub(r"\(20\d{2}\)", "", base).strip()  # drop a year already present in the title
+    # strip filesystem-unfriendly characters, keeping the parentheses around the year
+    return re.sub(r"[^\w .()-]", "", f"{base} ({year})" if year else base)[:90]
+
+def download_one(href, title, year):
+    name = safe_name(title, year)
+    print(f"[dl] {name} -> {href}", flush=True)
+    # 1. Navigate to the film page
+    g = next((t for t in tabs() if t.get("type") == "page"), None)
+    if not g:
+        print("[dl] no page target")
+        return False
+    w = conn(g)
+    ev(w, f"location.href={json.dumps(href)}")
+    w.close()
+    time.sleep(7)
+    # 2. Click the Español server (dnume 2 = luluvid).
+    #    Prefer luluvid; otherwise the first dnume of the es.png group.
+    pg = page_target("/ver-pelicula/")
+    if not pg:
+        print("[dl] no movie page")
+        return False
+    w = conn(pg)
+    clicked = ev(w, r'''(()=>{
+      const lis=[...document.querySelectorAll('li[data-nume]')];
+      // Español group: li whose flag is es.png; prefer luluvid
+      const esLis=lis.filter(l=>[...l.querySelectorAll('img')].some(i=>/es\.png/.test(i.src||'')));
+      const pick=esLis.find(l=>/luluv/i.test(l.textContent||''))||esLis[0]||lis[0];
+      if(!pick)return"no-options"; pick.click(); return"clicked:"+(pick.textContent||'').trim().slice(0,15);
+    })()''')
+    w.close()
+    print("[dl] server:", clicked)
+    time.sleep(6)
+    # 3. Play: remove overlays + trusted click + jwplayer.play
+    pl = player_target()
+    for _ in range(4):
+        if pl:
+            break
+        time.sleep(3)
+        pl = player_target()
+    if not pl:
+        print("[dl] no player iframe (captcha?)")
+        return False
+    w = conn(pl)
+    ev(w, r'''(()=>{let n=0;document.querySelectorAll('div,iframe').forEach(e=>{if(parseInt(getComputedStyle(e).zIndex||0)>=1000){e.remove();n++;}});return n;})()''')
+    rect = ev(w, r'''(()=>{const b=document.querySelector('.jw-icon-display,.jw-display-icon-container,video');if(!b)return null;const r=b.getBoundingClientRect();return Math.round(r.left+r.width/2)+","+Math.round(r.top+r.height/2);})()''')
+    if rect and "," in rect:
+        cx, cy = [int(x) for x in rect.split(",")]
+        for _ in range(2):
+            cmd(w, "Input.dispatchMouseEvent", {"type": "mousePressed", "x": cx, "y": cy, "button": "left", "clickCount": 1})
+            cmd(w, "Input.dispatchMouseEvent", {"type": "mouseReleased", "x": cx, "y": cy, "button": "left", "clickCount": 1})
+            time.sleep(2)
+    ev(w, r'''(()=>{try{jwplayer().play(true);}catch(e){}})()''')
+    # 4. Wait for master.m3u8 (the human solves the captcha if needed), up to ~120 s
+    got = False
+    for _ in range(40):
+        ev(w, r'''(()=>{document.querySelectorAll('div,iframe').forEach(e=>{if(parseInt(getComputedStyle(e).zIndex||0)>=1000)e.remove();});})()''')
+        if ev(w, r'''performance.getEntriesByType('resource').some(e=>/master\.m3u8/.test(e.name))'''):
+            got = True
+            break
+        time.sleep(3)
+    w.close()
+    if not got:
+        print("[dl] master never appeared (captcha/no play). Skip.")
+        return False
+    # 5. Streaming grab
+    r = subprocess.run(["uv", "run", "--with", "websocket-client", "--with", "pycryptodome",
+                        "python", GRAB, name], cwd="/home/lucas/fn_registry/apps/gnula_grabber",
+                       capture_output=True, text=True, timeout=3600)
+    print(r.stdout[-400:])
+    return '"status": "ok"' in r.stdout
+
+def main():
+    limit = int(sys.argv[1]) if len(sys.argv) > 1 else 1
+    c = sqlite3.connect(DB)
+    rows = c.execute("SELECT href,title,year FROM movies WHERE lang_es=1 AND status='pending' "
+                     "ORDER BY detected_at LIMIT ?", (limit,)).fetchall()
+    print(f"[dl] {len(rows)} pending films to download (limit {limit})")
+    done = 0
+    for href, title, year in rows:
+        try:
+            ok = download_one(href, title, year)
+        except Exception as e:
+            print("[dl] error:", e)
+            ok = False
+        if ok:
+            c.execute("UPDATE movies SET status='downloaded', downloaded_at=? WHERE href=?",
+                      (time.strftime("%Y-%m-%dT%H:%M:%S"), href))
+            done += 1
+        else:
+            c.execute("UPDATE movies SET status='failed' WHERE href=?", (href,))
+        c.commit()
+    print(json.dumps({"attempted": len(rows), "downloaded": done}))
+
+if __name__ == "__main__":
+    main()
diff --git a/grab_stream.py b/grab_stream.py
new file mode 100644
index 0000000..703e1b6
--- /dev/null
+++ b/grab_stream.py
@@ -0,0 +1,132 @@
+#!/usr/bin/env python3
+"""Streams an HLS download to disk (AES-128 decrypted LOCALLY) via CDP.
+
+No giant Blob in the browser (that crashed Chrome).
+Instead:
+- in-page, it fetches each RAW segment (the browser session gets past the CF 522) and
+  returns it base64-encoded over CDP.
+- LOCALLY: base64-decode + AES-128-CBC decrypt (key + IV) + append to the .ts on disk.
+So only 1 segment lives in memory at a time. Handles films of any size.
+
+Usage: python grab_stream.py <name> [url_substring]
+  (url_substring is currently unused: the first target matching PLAYER_HOSTS is taken)
+"""
+import base64, json, os, subprocess, sys, time, urllib.request
+import websocket                      # uv --with websocket-client
+from Crypto.Cipher import AES         # uv --with pycryptodome
+from Crypto.Util.Padding import unpad
+
+PORT = 9222
+DEST = "/mnt/f/POPELIS/manual/movies"
+PLAYER_HOSTS = ["luluvdo", "luluvid", "filemoon", "streamwish", "dood", "vidhide",
+                "bigwarp", "voe", "streamtape", "vidmoly", "cdn-tnmr", "/e/", "/embed"]
+
+def targets():
+    return json.load(urllib.request.urlopen(f"http://127.0.0.1:{PORT}/json/list", timeout=8))
+
+class CDP:
+    def __init__(self, ws_url):
+        self.ws = websocket.create_connection(ws_url, timeout=30, header=["Origin: http://localhost"])
+        self.ws.settimeout(60)
+        self.id = 0
+
+    def ev(self, expr, await_p=False, to=120):
+        """Runtime.evaluate; raises on JS exceptions and on timeout."""
+        self.id += 1
+        mid = self.id
+        self.ws.send(json.dumps({"id": mid, "method": "Runtime.evaluate",
+                                 "params": {"expression": expr, "returnByValue": True, "awaitPromise": await_p}}))
+        s = time.time()
+        while time.time() - s < to:
+            try:
+                m = json.loads(self.ws.recv())
+            except Exception:
+                continue
+            if m.get("id") == mid:
+                r = m.get("result", {})
+                if "exceptionDetails" in r:
+                    raise RuntimeError(json.dumps(r["exceptionDetails"])[:300])
+                return r.get("result", {}).get("value")
+        raise TimeoutError("CDP eval timeout")
+
+    def close(self):
+        try:
+            self.ws.close()
+        except Exception:
+            pass
+
+def find_player():
+    for t in targets():
+        if t.get("type") in ("page", "iframe") and t.get("webSocketDebuggerUrl") \
+                and any(h in t.get("url", "") for h in PLAYER_HOSTS):
+            return t
+    return None
+
+# JS: returns the playlist metadata (key URL, segments, media sequence, IV).
+# Variant choice: the first #EXT-X-STREAM-INF entry (assumption; select by BANDWIDTH if needed).
+JS_META = r'''(async()=>{
+  const abs=(b,u)=>new URL(u,b).href;
+  const gt=async u=>{const r=await fetch(u,{headers:{Referer:location.origin+"/"}});if(!r.ok)throw new Error("HTTP "+r.status);return r.text();};
+  const masters=performance.getEntriesByType('resource').map(e=>e.name).filter(n=>/master\.m3u8/.test(n));
+  if(!masters.length)return JSON.stringify({err:"no master (did you press play?)"});
+  let m=masters[0],t=await gt(m),pl=m;
+  if(/#EXT-X-STREAM-INF/.test(t)){const L=t.split("\n");for(let i=0;i<L.length;i++){if(L[i].startsWith("#EXT-X-STREAM-INF")){pl=abs(m,L[i+1].trim());break;}}t=await gt(pl);}
+  const km=t.match(/#EXT-X-KEY:METHOD=AES-128,URI="([^"]+)"(?:,IV=0x([0-9a-fA-F]+))?/);
+  const sm=t.match(/#EXT-X-MEDIA-SEQUENCE:(\d+)/);const seq=sm?parseInt(sm[1],10):0;
+  const segs=t.split("\n").map(s=>s.trim()).filter(s=>s&&!s.startsWith("#")).map(s=>abs(pl,s));
+  return JSON.stringify({keyUrl:km?abs(pl,km[1]):null, ivHex:km&&km[2]?km[2]:null, seq, segs, master:m});
+})()'''
+
+# JS: fetch a resource -> base64 (chunked so large binaries do not overflow fromCharCode.apply).
+# "ERR:" cannot occur in base64 output, so it marks failures unambiguously.
+JS_FETCH_B64 = '''(async()=>{
+  const r=await fetch(%s,{headers:{Referer:location.origin+"/"}});
+  if(!r.ok)return "ERR:"+r.status;
+  const b=new Uint8Array(await r.arrayBuffer());let s="";const C=0x8000;
+  for(let i=0;i<b.length;i+=C)s+=String.fromCharCode.apply(null,b.subarray(i,i+C));
+  return btoa(s);
+})()'''
+
+def main():
+    name = sys.argv[1] if len(sys.argv) > 1 else "video"
+    pl = find_player()
+    if not pl:
+        print(json.dumps({"status": "error", "error": "no player iframe (did you press play?)"}))
+        return
+    print(f"[grab] player: {pl['url'][:70]}", flush=True)
+    cdp = CDP(pl["webSocketDebuggerUrl"])
+    meta = json.loads(cdp.ev(JS_META, await_p=True, to=60))
+    if meta.get("err"):
+        print(json.dumps({"status": "error", "error": meta["err"]}))
+        cdp.close()
+        return
+    segs, seq = meta["segs"], meta["seq"]
+    print(f"[grab] {len(segs)} segments, key={'yes' if meta['keyUrl'] else 'no'}", flush=True)
+    key = None
+    if meta["keyUrl"]:
+        kb = cdp.ev(JS_FETCH_B64 % json.dumps(meta["keyUrl"]), await_p=True, to=30)
+        if not isinstance(kb, str) or kb.startswith("ERR:"):
+            raise RuntimeError(f"key fetch failed: {kb}")
+        key = base64.b64decode(kb)
+    os.makedirs("/tmp/grab", exist_ok=True)
+    ts_path = f"/tmp/grab/{name}.ts"
+    t0 = time.time()
+    with open(ts_path, "wb") as f:
+        for i, su in enumerate(segs):
+            b64 = cdp.ev(JS_FETCH_B64 % json.dumps(su), await_p=True, to=90)
+            if not isinstance(b64, str) or b64.startswith("ERR:"):
+                raise RuntimeError(f"seg {i}: {b64}")
+            data = base64.b64decode(b64)
+            if key:
+                # explicit IV from #EXT-X-KEY, else the segment's media sequence number (16-byte big-endian)
+                iv = bytes.fromhex(meta["ivHex"]) if meta["ivHex"] else (seq + i).to_bytes(16, "big")
+                dec = AES.new(key, AES.MODE_CBC, iv).decrypt(data)
+                try:
+                    dec = unpad(dec, 16)
+                except ValueError:
+                    pass  # e.g. a last segment without exact PKCS#7 padding: keep as-is
+                data = dec
+            f.write(data)
+            if i % 50 == 0:
+                print(f"[grab] {i}/{len(segs)} ({time.time() - t0:.0f}s)", flush=True)
+    cdp.close()
+    sz = os.path.getsize(ts_path)
+    print(f"[grab] .ts ready: {sz/1e9:.2f} GB in {time.time()-t0:.0f}s", flush=True)
+    # remux -> mkv straight into manual/movies
+    os.makedirs(DEST, exist_ok=True)
+    mkv = os.path.join(DEST, f"{name}.mkv")
+    r = subprocess.run(["ffmpeg", "-y", "-fflags", "+genpts", "-i", ts_path,
+                        "-map", "0:v:0", "-map", "0:a:0", "-c", "copy", mkv],
+                       capture_output=True, text=True)
+    if r.returncode != 0:
+        print(json.dumps({"status": "error", "error": "ffmpeg: " + r.stderr[-300:]}))
+        return
+    os.remove(ts_path)
+    print(json.dumps({"status": "ok", "mkv": mkv, "size_gb": round(os.path.getsize(mkv)/1e9, 2),
+                      "segments": len(segs)}))
+
+if __name__ == "__main__":
+    main()
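The IV rule used in `grab_stream.py` (an explicit `IV=0x<32 hex digits>` attribute on `#EXT-X-KEY`, otherwise the segment's media sequence number as a 128-bit big-endian integer, as RFC 8216 section 5.2 specifies) can be checked offline, without Chrome or CDP. A minimal stdlib-only sketch, assuming one AES-128 `#EXT-X-KEY` per media playlist (the same simplification the script makes); `parse_media_playlist` and `segment_iv` are hypothetical helpers, not part of the commit.

```python
import re
from urllib.parse import urljoin

# KEY=VALUE pairs of an HLS attribute list; quoted values may contain commas
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_media_playlist(text, base_url):
    """Return (key_url, explicit_iv_or_None, media_sequence, segment_urls).

    Only the first AES-128 #EXT-X-KEY is honoured (hypothetical helper).
    """
    # the media sequence defaults to 0 when #EXT-X-MEDIA-SEQUENCE is absent
    key_url, iv, seq, segs = None, None, 0, []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            seq = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-KEY:") and key_url is None:
            attrs = {k: v.strip('"') for k, v in ATTR_RE.findall(line.split(":", 1)[1])}
            if attrs.get("METHOD") == "AES-128":
                key_url = urljoin(base_url, attrs["URI"])
                if "IV" in attrs:
                    iv = bytes.fromhex(attrs["IV"][2:])  # drop the 0x/0X prefix
        elif line and not line.startswith("#"):
            segs.append(urljoin(base_url, line))  # each URI line is one media segment
    return key_url, iv, seq, segs


def segment_iv(explicit_iv, media_sequence, index):
    """IV for the index-th segment: the explicit IV if present, else its media
    sequence number as a 16-byte big-endian integer (left-padded with zeros)."""
    if explicit_iv is not None:
        return explicit_iv
    return (media_sequence + index).to_bytes(16, "big")
```

Decryption would then follow the script, one segment at a time: `AES.new(key, AES.MODE_CBC, segment_iv(iv, seq, i)).decrypt(data)` with pycryptodome.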