feat: gnula_grabber 2-part pipeline (crawl ES + CDP stream download)
@@ -0,0 +1,2 @@
__pycache__/
*.pyc
@@ -0,0 +1,82 @@
# Recipe: automatic Castilian Spanish downloads from gnula → Jellyfin

Full flow validated on 2026-05-30. To be promoted to a registry pipeline
(`download_gnula_movie_py_pipelines` or similar). The only human step is solving the captcha, if one appears.

## 0. Prerequisite infrastructure (one-time setup)

- NordVPN bridge: `start_nordvpn_socks_bridge_bash_infra --port 8889`
  (credentials in the env vars NORDVPN_SOCKS_USER/PASS, taken from `pass datardos-vps/nordvpn`). gost exposes the SOCKS5 proxy as a local HTTP proxy.
- Jellyfin realtime monitor ON for the libraries `media/movies` and `media/tv` (already enabled).
- systemd user timer `popelis-import.timer` (every 3 min) → `popelis_import_media_drop` (moves manual → media and adds metadata).
- Unpacked extensions on Windows: `C:\Users\lucas\hls-dl-ext` (HLS + AES decryption), `C:\Users\lucas\ubol` (uBlock Origin Lite, MV3; kills ads/popunders).

## 1. Launch Chrome (Windows, visible) through NordVPN, with CDP and extensions

```
chrome.exe --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 \
  --remote-allow-origins='*' \
  --user-data-dir='C:\Users\lucas\AppData\Local\fn-chrome-cdp-profile' \
  --proxy-server=http://127.0.0.1:8889 \
  --disable-extensions-except='C:\Users\lucas\hls-dl-ext,C:\Users\lucas\ubol' \
  --load-extension='C:\Users\lucas\hls-dl-ext,C:\Users\lucas\ubol' \
  --no-first-run --no-default-browser-check --new-window "https://www.gnularetro.cc/release/2025/"
```

NOTE (Chrome 148): `--load-extension` on its own does not load the extensions; you must also pass
`--disable-extensions-except` with the same paths. Verify with `Target.getTargets` on the browser websocket
(look for the extensions' `chrome-extension://` service workers) or at chrome://extensions.

gnula is blocked by the ISP (Allot DPI) and sits behind Cloudflare: NordVPN gets past the ISP block, and a real browser passes Cloudflare.

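To script that check instead of opening chrome://extensions, a minimal sketch could read the DevTools HTTP endpoint. It assumes port 9222 from the command above and that each loaded MV3 extension appears as a `service_worker` target with a `chrome-extension://<id>/...` URL; `extension_workers` and `fetch_targets` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import urlparse

CDP_HTTP = "http://127.0.0.1:9222"  # assumption: same debugging port as the launch command


def extension_workers(targets):
    """Sorted IDs of extensions that have a live service worker target.

    `targets` is a list of dicts with at least "type" and "url", like the
    entries returned by the DevTools /json/list endpoint.
    """
    ids = set()
    for t in targets:
        url = t.get("url", "")
        if t.get("type") == "service_worker" and url.startswith("chrome-extension://"):
            ids.add(urlparse(url).netloc)  # the host part of the URL is the extension ID
    return sorted(ids)


def fetch_targets(base=CDP_HTTP):
    """Read the current target list from the DevTools HTTP endpoint."""
    with urllib.request.urlopen(f"{base}/json/list", timeout=8) as resp:
        return json.load(resp)
```

With both extensions loaded, `extension_workers(fetch_targets())` should return two IDs; an empty list points to the Chrome 148 loading problem.
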
## 2. Listing page: detect movies with the Spain flag + dedup

CDP eval on the release page: for each `a[href*="/ver-pelicula/"]`, inspect the `/flags/` images.

- `es.png` = **Spain (Español/Castilian)**, the one we want. (In the LISTING the alt text is
  swapped: es.png has alt="Latino". IGNORE the alt and use the image file name es.png. Confirmed on the
  movie page: the es.png group is literally labeled "Español".)
- `mx.png` = Mexico/Latin American Spanish. `jp.png` = Subtitled/VOSE.
- Dedup: match the title against Radarr's `GET /api/v3/movie` (key 63fb51c8...) and skip the movies already present.

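The rule above (trust the flag file name, never the alt text) plus the Radarr dedup can be sketched in a few lines. This is an illustration rather than crawl.py's code: it assumes each card's HTML is already isolated, and it dedups by exact match on a normalized title (lowercase, accents and punctuation removed), which is stricter than the substring match crawl.py uses. The function names are hypothetical.

```python
import re
import unicodedata

SPAIN_FLAG = "es.png"  # Castilian; the listing's alt text is swapped, so match the file name


def norm_title(title):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    t = unicodedata.normalize("NFKD", title)
    t = "".join(ch for ch in t if not unicodedata.combining(ch))
    t = re.sub(r"[^\w\s]", " ", t.lower())
    return " ".join(t.split())


def flags_in(html_block):
    """Flag image file names (e.g. 'es.png') found in one card's HTML."""
    return set(re.findall(r"/flags/(\w+\.png)", html_block))


def pending_spanish(cards, library_titles):
    """Titles of cards with the Spain flag whose normalized title is not in the library.

    `cards` is a list of {"title": str, "html": str}; `library_titles` would be the
    "title" fields returned by Radarr's GET /api/v3/movie.
    """
    have = {norm_title(t) for t in library_titles}
    return [c["title"] for c in cards
            if SPAIN_FLAG in flags_in(c["html"]) and norm_title(c["title"]) not in have]
```
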
## 3. Movie page: choose the Español server

CDP: navigate to the href. The player options are `li[data-nume]` elements (DooPlay theme):

- "Español" group (es.png) = low dnume values; "Subtitulado" group (jp.png) = high dnume values.
- Servers: streamplay, **luluvid (=luluvdo)**, bigwarp, filemoon, dsvplay, powvideo, streamtape, vidmoly, voe.
- PREFER luluvid/luluvdo (we know how to decrypt its AES-128 HLS). Click: `document.querySelector('li[data-nume="2"]').click()` (luluvid, Español).
- This loads the iframe `lulustream.com/e/<id>`, which resolves to `luluvdo.com/e/<id>`.

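The server choice (luluvid/luluvdo inside the es.png group, else the first es.png option, else anything) is the same fallback chain that download.py runs as in-page JavaScript. A Python rendering, with a hypothetical `PlayerOption` record standing in for the scraped `li[data-nume]` data:

```python
from dataclasses import dataclass


@dataclass
class PlayerOption:
    nume: int    # value of the li's data-nume attribute
    flag: str    # flag image file name inside the li, e.g. "es.png"
    label: str   # visible server name, e.g. "luluvid"


PREFERRED = ("luluv",)  # matches both "luluvid" and "luluvdo"


def pick_server(options):
    """Prefer luluvid/luluvdo within the Español (es.png) group, then any es.png
    option, then the first option at all. Returns None if there are no options."""
    spanish = [o for o in options if o.flag == "es.png"]
    for o in spanish:
        if any(p in o.label.lower() for p in PREFERRED):
            return o
    if spanish:
        return spanish[0]
    return options[0] if options else None
```
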
## 4. Play (CDP): the method that WORKS (without relying on uBlock)

luluvdo puts an **ad click-catcher overlay** (a `div` with `z-index>=1000`) on top of the
player: a JS `.click()` does not count as a user gesture, and a normal click opens a popunder. Validated
solution (run in the luluvdo iframe target):

1. **Remove the overlays**: every element of `document.querySelectorAll('div,iframe')` with `zIndex>=1000` → `.remove()` (removes ~4).
2. **Trusted click** on `.jw-icon-display`/the video via `Input.dispatchMouseEvent` (coordinates taken from its bounding rect).
3. `jwplayer().play(true)`.

→ `video.paused` becomes false and master.m3u8 gets requested. A real captcha (if one appears) is left to the human.

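Step 2 boils down to two `Input.dispatchMouseEvent` commands at the center of the element's bounding rect. A sketch of those messages, assuming the rect has already been read in-page with `getBoundingClientRect()`; the helper names are hypothetical. Each command gets its own `id` so its reply can be matched on the websocket.

```python
import itertools
import json

_ids = itertools.count(1)  # CDP message ids must be unique per connection


def rect_center(rect):
    """Center of a DOMRect-like dict (left/top/width/height), rounded to ints."""
    return round(rect["left"] + rect["width"] / 2), round(rect["top"] + rect["height"] / 2)


def trusted_click_messages(rect):
    """JSON strings for a left press + release at the rect's center.

    Sent over the target's websocket, Input.dispatchMouseEvent produces a
    browser-level click, unlike element.click() evaluated inside the page.
    """
    x, y = rect_center(rect)
    base = {"x": x, "y": y, "button": "left", "clickCount": 1}
    return [json.dumps({"id": next(_ids), "method": "Input.dispatchMouseEvent",
                        "params": {"type": kind, **base}})
            for kind in ("mousePressed", "mouseReleased")]
```
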
- Closing popunders: `GET /json/list` → close every target that is not gnula/luluvdo (`/json/close/<id>`).
- uBlock: `--load-extension` alone failed on Chrome 148 (see Gotchas); uBlock is NOT needed for playback
  (the overlay is removed in-page). It only helps reduce popunders in general.

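The popunder cleanup can be written as a filter over `/json/list`. A sketch, assuming the hosts in `KEEP_HOSTS` (hypothetical; adjust to the mirrors in use) and closing only top-level `page` targets, so the player iframe is left alone; `popunder_ids` and `close_targets` are hypothetical names.

```python
import urllib.request
from urllib.parse import urlparse

KEEP_HOSTS = ("gnularetro.cc", "luluvdo.com", "lulustream.com")  # assumption: hosts to keep


def popunder_ids(targets, keep=KEEP_HOSTS):
    """IDs of page targets whose host is neither a kept host nor one of its subdomains."""
    out = []
    for t in targets:
        if t.get("type") != "page":
            continue  # leave iframes, workers and extension targets alone
        host = urlparse(t.get("url", "")).hostname or ""
        if not any(host == k or host.endswith("." + k) for k in keep):
            out.append(t["id"])
    return out


def close_targets(ids, base="http://127.0.0.1:9222"):
    """Close each target through the DevTools HTTP endpoint."""
    for tid in ids:
        urllib.request.urlopen(f"{base}/json/close/{tid}", timeout=8).close()
```

This would also close pages such as chrome://extensions, since their host is not in the keep list.
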
## 5. Streaming download + decryption (NO blob, so Chrome does not crash)

`apps/gnula_grabber/grab_stream.py <basename>`:

- Connects over CDP to the player iframe.
- Reads master.m3u8 (from the page's performance entries) → variant → media playlist.
- AES-128: `#EXT-X-KEY` (key URL + IV; without an explicit IV, the IV is the segment's media sequence number as a 16-byte big-endian integer).
- Per segment: in-page `fetch(seg) → arrayBuffer → base64` (the browser session gets past Cloudflare's 522) → CDP →
  LOCAL: base64-decode + `AES.new(key, AES.MODE_CBC, iv).decrypt` + PKCS7 unpad + append to `/tmp/grab/<basename>.ts`.
  (Only one segment is held in RAM at a time, so large movies work.)
- Remux with `ffmpeg -c copy` .ts → .mkv straight into `F:\POPELIS\manual\movies\<basename>.mkv`, then delete the .ts.

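The playlist side of this step can be sketched with the standard library alone. This is not grab_stream.py's code (which parses the playlist in page JavaScript); it assumes a single `#EXT-X-KEY` with no key rotation and, following RFC 8216, treats a missing `#EXT-X-MEDIA-SEQUENCE` as 0. The helper names are hypothetical.

```python
import re
from urllib.parse import urljoin


def parse_media_playlist(text, base_url):
    """Return (key_url, explicit_iv_or_None, first_seq, segment_urls) for a media playlist."""
    key_url = iv = None
    seq = 0  # RFC 8216: no EXT-X-MEDIA-SEQUENCE tag means the first segment is number 0
    segs = []
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("#EXT-X-KEY:") and "METHOD=AES-128" in line:
            uri = re.search(r'URI="([^"]+)"', line)
            hex_iv = re.search(r"IV=0[xX]([0-9A-Fa-f]+)", line)
            key_url = urljoin(base_url, uri.group(1)) if uri else None
            iv = int(hex_iv.group(1), 16).to_bytes(16, "big") if hex_iv else None
        elif line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            seq = int(line.split(":", 1)[1])
        elif line and not line.startswith("#"):
            segs.append(urljoin(base_url, line))  # segment URIs are relative to the playlist
    return key_url, iv, seq, segs


def segment_iv(explicit_iv, first_seq, index):
    """IV for segment `index`: the explicit IV if present, else its media sequence number."""
    if explicit_iv is not None:
        return explicit_iv
    return (first_seq + index).to_bytes(16, "big")
```

Each segment's ciphertext would then be decrypted with AES-128-CBC using the downloaded 16-byte key and `segment_iv(...)`.
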
## 6. Import + Jellyfin (automatic)

- The systemd timer (every 3 min) runs `popelis_import_media_drop` → identifies the movie (Radarr/TMDb) and moves it from
  manual/movies to media/movies with its metadata → Jellyfin's realtime monitor picks it up → it shows up in the library.
- (Or run the import on demand: `RADARR_KEY=.. SONARR_KEY=.. python popelis_import_media_drop.py --apply`)

## Gotchas

- The in-memory blob approach crashed Chrome on a 3.4 GB movie → use grab_stream.py (streaming).
- The master.m3u8 token expires after ~8 h and is tied to the egress IP → download through the SAME proxy/session.
- Cloudflare's 522 only affects external clients (curl/ffmpeg/yt-dlp); the live browser session gets a 200.
- Multi-audio urlset: the chosen variant carries a single audio track (the Español group's track = Castilian).
- chrome.downloads / a.click() on a blob lands in Downloads and does NOT honor setDownloadBehavior → that is why
  grab_stream.py writes straight to disk via CDP + local code, bypassing the browser's download manager.
- **`--load-extension` fails on Chrome 148**: the unpacked extensions (hls-dl-ext, ubol) did NOT
  load (Target.getTargets did not list their service workers, and the ad overlay was still there). Fix to try:
  add `--disable-extensions-except='<same paths>'` alongside `--load-extension`, as the step 1 launch command does. It is NOT
  needed for the flow, though: the overlay is neutralized in-page (step 4). uBOL Lite (MV3, DNR rules)
  also does NOT reliably kill the click-catcher injected by JS (it is limited compared with full uBO, which is MV2 and
  no longer supported by Chrome 148). Conclusion: rely on the in-page step 4, not on uBlock.

@@ -0,0 +1,62 @@
---
name: gnula_grabber
lang: py
domain: infra
version: 0.1.0
description: "2-part pipeline: detects Castilian Spanish movies on gnula (crawler → SQLite catalog) and downloads them via Chrome+CDP+NordVPN (HLS streaming + AES decryption) → manual/movies → Radarr/Jellyfin. Captcha = human."
tags: [gnula, hls, scraping, cdp, nordvpn, media, popelis, navegator]
uses_functions:
- extract_hls_from_cdp_tab_py_pipelines
- start_nordvpn_socks_bridge_bash_infra
- popelis_import_media_drop_py_infra
uses_types: []
framework: ""
entry_point: "crawl.py"
dir_path: "apps/gnula_grabber"
repo_url: ""
---

## Purpose

A 2-part system that fills the Jellyfin library with movies in **Castilian Spanish** from gnula
(gnularetro.cc), getting past the ISP block (Allot DPI, via NordVPN) and Cloudflare (real browser).

## Part 1: Detector (`crawl.py`)

Crawler built on **FlareSolverr** (headless Chrome over NordVPN, which handles Cloudflare). It walks listing/category pages,
parses each `<article class="item movies">`, keeps the cards with the **es.png** flag (= "Español"
on the movie page; the listing's alt text is swapped, so use the image file name),
dedups against the catalog + Radarr, and stores the result in the SQLite database `~/.config/popelis/gnula_catalog.db`
(table `movies`: href, title, year, flags, lang_es, status, in_library, detected_at, downloaded_at).

```bash
python3 apps/gnula_grabber/crawl.py "peliculas/estrenos" 2
python3 apps/gnula_grabber/crawl.py "peliculas/accion" 2   # pagination is broken -> crawl categories instead
```

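A sketch of the catalog write with the same `movies` schema: re-crawling an `href` refreshes its flags and library status without resetting a `downloaded` or `failed` status, and a still-`pending` movie that has since reached the library flips to `have` so Part 2 skips it. The `record` helper is hypothetical, and `ON CONFLICT ... DO UPDATE` needs SQLite 3.24 or newer.

```python
import sqlite3
import time

SCHEMA = """CREATE TABLE IF NOT EXISTS movies(
    href TEXT PRIMARY KEY, title TEXT, year INTEGER, flags TEXT,
    lang_es INTEGER, status TEXT DEFAULT 'pending', in_library INTEGER DEFAULT 0,
    detected_at TEXT, downloaded_at TEXT)"""

# Upsert keyed on href: refresh flags/in_library, and only move 'pending' -> 'have'.
UPSERT = """INSERT INTO movies(href, title, year, flags, lang_es, status, in_library, detected_at)
VALUES (?, ?, ?, ?, 1, ?, ?, ?)
ON CONFLICT(href) DO UPDATE SET
    flags = excluded.flags,
    in_library = excluded.in_library,
    status = CASE WHEN excluded.in_library = 1 AND movies.status = 'pending'
                  THEN 'have' ELSE movies.status END"""


def record(conn, href, title, year, flags, in_library):
    """Insert or refresh one Spanish-flagged card in the catalog."""
    now = time.strftime("%Y-%m-%dT%H:%M:%S")
    conn.execute(UPSERT, (href, title, year, ",".join(sorted(flags)),
                          "have" if in_library else "pending", int(in_library), now))
    conn.commit()
```
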
## Part 2: Downloader (`download.py`)

Consumes the catalog (movies with `status=pending`). For each one it navigates to the movie page in the CDP
Chrome, clicks the **Español** server (luluvid/luluvdo, the `li[data-nume]` in the es.png group), removes the
ad overlay (`div` with `z-index>=1000`), fires a trusted click plus `jwplayer().play()`, waits for `master.m3u8`
(captcha = human), and hands off to `grab_stream.py`. On success it marks the movie `downloaded` (otherwise `failed`).

```bash
python3 apps/gnula_grabber/download.py 1   # download 1 pending movie
```

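Waiting for `master.m3u8` while the human solves the captcha is a poll with a deadline. A generic sketch with an injectable clock (all names hypothetical); in download.py the check would be the CDP eval of `performance.getEntriesByType('resource')`, and `on_tick` the overlay removal, which is repeated because the ads can re-inject it.

```python
import time


def wait_for(check, timeout=120.0, interval=3.0, on_tick=None,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `check()` until it returns truthy or `timeout` seconds pass.

    Returns True if the condition was met. `on_tick` runs before every check.
    """
    deadline = clock() + timeout
    while True:
        if on_tick:
            on_tick()
        if check():
            return True
        if clock() + interval > deadline:  # the next poll would land past the deadline
            return False
        sleep(interval)
```
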
## Core: `grab_stream.py`

Downloads HLS by **streaming to disk** (no blob, so Chrome does not crash): in-page code fetches each raw segment
(the browser session gets past Cloudflare's 522) and sends it over CDP; LOCAL code decrypts it with AES-128-CBC
(key + IV from `#EXT-X-KEY`; without an explicit IV, the IV is the segment's media sequence number), appends it to a `.ts`, and remuxes with `ffmpeg -c copy` → `.mkv`
straight into `F:\POPELIS\manual\movies`. The systemd timer `popelis-import.timer` then imports it
(Radarr metadata) → `media` → Jellyfin (realtime monitor).

```bash
python3 apps/gnula_grabber/grab_stream.py "Titulo (2025)"   # with the luluvdo player already playing
```

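The bounded-memory idea (fetch, decrypt and append one segment at a time) fits in a small loop. In this sketch `fetch_b64` and `decrypt` are hypothetical callables standing in for the in-page CDP fetch and the pycryptodome AES step, and `pkcs7_strip` is a stdlib stand-in for `Crypto.Util.Padding.unpad` meant to be called inside `decrypt`; like grab_stream.py, it keeps segments whose padding is not valid unchanged.

```python
import base64


def pkcs7_strip(data, block=16):
    """Remove PKCS#7 padding; return data unchanged if the padding is not valid."""
    if not data or len(data) % block:
        return data
    n = data[-1]
    if 1 <= n <= block and data[-n:] == bytes([n]) * n:
        return data[:-n]
    return data


def stream_segments(segment_urls, fetch_b64, decrypt, out):
    """Append segments to the binary file-like `out`, holding one segment in memory at a time.

    fetch_b64(url) -> base64 str; decrypt(index, raw_bytes) -> plaintext bytes.
    Returns the number of bytes written.
    """
    written = 0
    for i, url in enumerate(segment_urls):
        raw = base64.b64decode(fetch_b64(url))
        chunk = decrypt(i, raw)
        out.write(chunk)
        written += len(chunk)
    return written
```
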
## Prerequisites

- NordVPN bridge (`start_nordvpn_socks_bridge_bash_infra --port 8889`).
- gluetun + FlareSolverr running (for the crawler).
- Chrome launched with CDP + NordVPN (see README.md for the canonical launch command).
- A venv with `pycryptodome` + `websocket-client` (via `uv run --with`).

## Gotchas

- See `README.md` for the full flow and the gotchas (ad overlay, `--load-extension` on Chrome 148,
  short-lived token, 522 only for external clients, broken gnula pagination → crawl categories).
@@ -0,0 +1,100 @@
#!/usr/bin/env python3
"""PART 1: gnula crawler. Detects movies in SPANISH (es.png flag) and catalogs them.

Uses FlareSolverr (headless Chrome over NordVPN, solves Cloudflare) to fetch the listing
pages (gnula is ISP-blocked and behind Cloudflare). Parses the <article class="item movies">
cards, keeps those with the es.png flag (Español/Castilian), dedups against the catalog + Radarr,
and stores them in SQLite. It does NOT download anything (that is Part 2: download.py).

Usage: python crawl.py [base_path] [max_pages]
  base_path: e.g. 'peliculas/estrenos' (default), 'release/2025', 'peliculas/accion'...
  max_pages: default 8
"""
import json, os, re, sqlite3, sys, time, urllib.request
from html import unescape

FLARE = "http://localhost:8191/v1"
SITE = "https://www.gnularetro.cc"
DB = "/home/lucas/.config/popelis/gnula_catalog.db"
RADARR = ("http://localhost:7878", "63fb51c8c95746e2a327740baac02f5e")


def flare_get(url, timeout=150, retries=2):
    """_flare_get with retries (5 s pause between attempts); re-raises the last error."""
    for attempt in range(retries + 1):
        try:
            return _flare_get(url, timeout)
        except Exception:
            if attempt == retries:
                raise
            time.sleep(5)


def _flare_get(url, timeout=150):
    """Fetch `url` through FlareSolverr; returns (http_status, html)."""
    body = json.dumps({"cmd": "request.get", "url": url, "maxTimeout": 60000}).encode()
    req = urllib.request.Request(FLARE, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as r:
        resp = json.load(r)
    sol = resp.get("solution", {})
    return sol.get("status"), sol.get("response", "")


def parse_cards(html):
    """Extract the movie cards (href, title, year, flags, lang_es) from a listing page."""
    out = []
    for blk in re.findall(r'<article[^>]*class="[^"]*item movies[^"]*".*?</article>', html, re.S):
        href = re.search(r'href="([^"]*ver-pelicula[^"]*)"', blk)
        title = re.search(r'<h3[^>]*>.*?<a[^>]*>([^<]+)', blk, re.S) or re.search(r'title="([^"]+)"', blk)
        flags = re.findall(r'/flags/(\w+\.png)', blk)
        yr = re.search(r'<span>(\d{4})</span>', blk) or re.search(r'\b(20\d{2})\b', blk)
        if href and title:
            out.append({
                "href": href.group(1),
                "title": unescape(title.group(1)).strip(),  # decode entities such as &#8217;
                "year": int(yr.group(1)) if yr else None,
                "flags": sorted(set(flags)),
                # Use the image file name: the listing's alt text is swapped (es.png says "Latino").
                "lang_es": 1 if "es.png" in flags else 0,
            })
    return out


def db_init():
    """Open the catalog DB, creating its directory and the movies table if needed."""
    os.makedirs(os.path.dirname(DB), exist_ok=True)
    c = sqlite3.connect(DB)
    c.execute("""CREATE TABLE IF NOT EXISTS movies(
        href TEXT PRIMARY KEY, title TEXT, year INTEGER, flags TEXT,
        lang_es INTEGER, status TEXT DEFAULT 'pending', in_library INTEGER DEFAULT 0,
        detected_at TEXT, downloaded_at TEXT)""")
    c.commit()
    return c


def radarr_titles():
    """Lower-cased titles already in Radarr (empty set if Radarr is unreachable)."""
    try:
        req = urllib.request.Request(f"{RADARR[0]}/api/v3/movie", headers={"X-Api-Key": RADARR[1]})
        with urllib.request.urlopen(req, timeout=20) as r:
            return {m["title"].lower() for m in json.load(r)}
    except Exception:
        return set()


def main():
    base = sys.argv[1] if len(sys.argv) > 1 else "peliculas/estrenos"
    max_pages = int(sys.argv[2]) if len(sys.argv) > 2 else 8
    c = db_init()
    have = radarr_titles()
    now = time.strftime("%Y-%m-%dT%H:%M:%S")
    total_seen = total_es = 0
    for p in range(1, max_pages + 1):
        url = f"{SITE}/{base}/" if p == 1 else f"{SITE}/{base}/page/{p}/"
        st, html = flare_get(url)
        if st != 200:
            print(f"[crawl] page {p}: HTTP {st} -> stop")
            break
        cards = parse_cards(html)
        if not cards:
            print(f"[crawl] page {p}: 0 cards -> stop")
            break
        es = [x for x in cards if x["lang_es"]]
        for x in es:
            t = x["title"].lower()
            # Loose substring match against Radarr titles (very short titles can over-match).
            inlib = 1 if t and any(r in t or t in r for r in have) else 0
            # On re-crawl: refresh flags/in_library; a still-pending movie that is now
            # in the library becomes 'have' so download.py skips it.
            c.execute("""INSERT INTO movies(href,title,year,flags,lang_es,status,in_library,detected_at)
                VALUES(?,?,?,?,?,?,?,?)
                ON CONFLICT(href) DO UPDATE SET flags=excluded.flags, in_library=excluded.in_library,
                    status=CASE WHEN excluded.in_library=1 AND movies.status='pending'
                                THEN 'have' ELSE movies.status END""",
                (x["href"], x["title"], x["year"], ",".join(x["flags"]), 1,
                 "have" if inlib else "pending", inlib, now))
        c.commit()
        total_es += len(es)
        total_seen += len(cards)
        print(f"[crawl] page {p}: {len(cards)} movies, {len(es)} in Spanish")
        time.sleep(1)
    # summary
    cur = c.execute("SELECT COUNT(*), SUM(in_library), SUM(status='pending') FROM movies WHERE lang_es=1")
    tot, inlib, pend = cur.fetchone()
    print(json.dumps({"crawled_pages_upto": max_pages, "cards_seen": total_seen, "es_seen": total_es,
                      "catalog_total_es": tot, "in_library": inlib or 0,
                      "pending_download": pend or 0, "db": DB}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,119 @@
#!/usr/bin/env python3
"""PART 2: Downloader. Consumes the catalog (built by crawl.py) and downloads the pending
Spanish movies through the Chrome + CDP + NordVPN session.

For each pending movie: navigates to its page, clicks the Español server (luluvid/luluvdo),
removes the ad overlay + starts playback (jwplayer), waits for master.m3u8, and hands off to
grab_stream.py (streaming + local AES decryption) → manual/movies. Marks it downloaded in the catalog.
The captcha (if any) is solved by the human: the script just waits for master to appear.

Usage: python download.py [limit]   (default 1 movie; pass N for several)
Requires: Chrome running with --remote-debugging-port=9222 (behind the NordVPN proxy).
"""
import itertools, json, os, re, sqlite3, subprocess, sys, time, urllib.request

import websocket  # uv --with websocket-client

PORT = 9222
DB = "/home/lucas/.config/popelis/gnula_catalog.db"
HERE = os.path.dirname(os.path.abspath(__file__))  # apps/gnula_grabber
GRAB = os.path.join(HERE, "grab_stream.py")
# In-page JS that removes the ad click-catcher overlays (any div/iframe with z-index >= 1000)
RM_OVERLAYS = r'''(()=>{let n=0;document.querySelectorAll('div,iframe').forEach(e=>{if(parseInt(getComputedStyle(e).zIndex||0)>=1000){e.remove();n++;}});return n;})()'''
_ids = itertools.count(1)  # unique CDP ids: a late reply to an older call is never taken for a newer one


def tabs():
    return json.load(urllib.request.urlopen(f"http://127.0.0.1:{PORT}/json/list", timeout=8))


def conn(t):
    w = websocket.create_connection(t["webSocketDebuggerUrl"], timeout=20, header=["Origin: http://localhost"])
    w.settimeout(3)
    return w


def _call(w, method, params, to):
    """Send one CDP command; return its "result" dict, or None after `to` seconds."""
    mid = next(_ids)
    w.send(json.dumps({"id": mid, "method": method, "params": params}))
    s = time.time()
    while time.time() - s < to:
        try:
            m = json.loads(w.recv())
        except Exception:
            continue  # 3 s recv timeout: keep waiting until `to`
        if m.get("id") == mid:
            return m.get("result", {})
    return None


def ev(w, expr, to=15):
    """Runtime.evaluate; returns the JS value (None on timeout)."""
    r = _call(w, "Runtime.evaluate", {"expression": expr, "returnByValue": True}, to) or {}
    return r.get("result", {}).get("value")


def cmd(w, method, params, to=8):
    return _call(w, method, params, to) or {}


def page_target(sub):
    return next((t for t in tabs() if sub in t.get("url", "") and t.get("type") == "page"), None)


def player_target():
    return next((t for t in tabs() if any(h in t.get("url", "") for h in ["luluvdo", "lulustream"])), None)


def safe_name(title, year):
    """'Title | extra' + 2025 -> 'Title (2025)', keeping only filename-safe characters."""
    base = re.split(r"\s*\|", title)[0].strip()
    base = re.sub(r"\(20\d{2}\)", "", base).strip()
    # keep the parentheses so the year survives as "(2025)" for the importer
    return re.sub(r"[^\w .()-]", "", f"{base} ({year})" if year else base)[:90]


def download_one(href, title, year):
    name = safe_name(title, year)
    print(f"[dl] {name} -> {href}", flush=True)
    # 1. navigate to the movie page (prefer the gnula tab over a popunder)
    g = page_target("gnularetro") or next((t for t in tabs() if t.get("type") == "page"), None)
    if not g:
        print("[dl] no page target")
        return False
    w = conn(g); ev(w, f"location.href={json.dumps(href)}"); w.close()
    time.sleep(7)
    # 2. click the Español server: prefer luluvid, else the first li of the es.png group
    pg = page_target("/ver-pelicula/")
    if not pg:
        print("[dl] no movie page")
        return False
    w = conn(pg)
    clicked = ev(w, r'''(()=>{
      const lis=[...document.querySelectorAll('li[data-nume]')];
      // Español group: li whose flag is es.png; prefer luluvid
      const esLis=lis.filter(l=>[...l.querySelectorAll('img')].some(i=>/es\.png/.test(i.src||'')));
      const pick=esLis.find(l=>/luluv/i.test(l.textContent||''))||esLis[0]||lis[0];
      if(!pick)return"no-options"; pick.click(); return"clicked:"+(pick.textContent||'').trim().slice(0,15);
    })()''')
    w.close()
    print("[dl] server:", clicked)
    time.sleep(6)
    # 3. play: remove overlays + trusted click + jwplayer().play
    pl = player_target()
    for _ in range(4):
        if pl:
            break
        time.sleep(3)
        pl = player_target()
    if not pl:
        print("[dl] no player iframe (captcha?)")
        return False
    w = conn(pl)
    ev(w, RM_OVERLAYS)
    rect = ev(w, r'''(()=>{const b=document.querySelector('.jw-icon-display,.jw-display-icon-container,video');if(!b)return null;const r=b.getBoundingClientRect();return Math.round(r.left+r.width/2)+","+Math.round(r.top+r.height/2);})()''')
    if rect and "," in rect:
        cx, cy = [int(x) for x in rect.split(",")]
        for _ in range(2):
            for kind in ("mousePressed", "mouseReleased"):
                cmd(w, "Input.dispatchMouseEvent", {"type": kind, "x": cx, "y": cy, "button": "left", "clickCount": 1})
            time.sleep(2)
    ev(w, r'''(()=>{try{jwplayer().play(true);}catch(e){}})()''')
    # 4. wait up to ~120 s for master.m3u8 (the human solves the captcha if needed)
    got = False
    for _ in range(40):
        ev(w, RM_OVERLAYS)  # the ads can re-inject the overlay
        if ev(w, r'''performance.getEntriesByType('resource').some(e=>/master\.m3u8/.test(e.name))'''):
            got = True
            break
        time.sleep(3)
    w.close()
    if not got:
        print("[dl] master never appeared (captcha/no playback). Skipping.")
        return False
    # 5. streaming grab (a TimeoutExpired after 1 h propagates to main and counts as 'failed')
    r = subprocess.run(["uv", "run", "--with", "websocket-client", "--with", "pycryptodome",
                        "python", GRAB, name], cwd=HERE,
                       capture_output=True, text=True, timeout=3600)
    print(r.stdout[-400:])
    ok = '"status": "ok"' in r.stdout
    if not ok and r.stderr:
        print(r.stderr[-400:])
    return ok


def main():
    limit = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    c = sqlite3.connect(DB)
    rows = c.execute("SELECT href,title,year FROM movies WHERE lang_es=1 AND status='pending' "
                     "ORDER BY detected_at LIMIT ?", (limit,)).fetchall()
    print(f"[dl] {len(rows)} pending movies to download (limit {limit})")
    done = 0
    for href, title, year in rows:
        try:
            ok = download_one(href, title, year)
        except Exception as e:
            print("[dl] error:", e)
            ok = False
        if ok:
            c.execute("UPDATE movies SET status='downloaded', downloaded_at=? WHERE href=?",
                      (time.strftime("%Y-%m-%dT%H:%M:%S"), href))
            done += 1
        else:
            c.execute("UPDATE movies SET status='failed' WHERE href=?", (href,))
        c.commit()
    print(json.dumps({"attempted": len(rows), "downloaded": done}))


if __name__ == "__main__":
    main()
@@ -0,0 +1,132 @@
#!/usr/bin/env python3
"""Streams an HLS download to disk (AES-128 decrypted LOCALLY) via CDP.

Does not build a giant Blob in the browser (that crashed Chrome). Instead:
- in-page code fetches each RAW segment (browser session = gets past Cloudflare's 522) and
  returns it as base64 over CDP.
- LOCAL: base64-decode + AES-128-CBC decrypt (key + IV) + append to the .ts on disk.
So only one segment lives in memory at a time; works for movies of any size.

Usage: python grab_stream.py <out_basename>
"""
import base64, json, os, subprocess, sys, time, urllib.request

import websocket  # uv --with websocket-client
from Crypto.Cipher import AES  # uv --with pycryptodome
from Crypto.Util.Padding import unpad

PORT = 9222
DEST = "/mnt/f/POPELIS/manual/movies"  # F:\POPELIS\manual\movies as seen from WSL
PLAYER_HOSTS = ["luluvdo", "luluvid", "filemoon", "streamwish", "dood", "vidhide",
                "bigwarp", "voe", "streamtape", "vidmoly", "cdn-tnmr", "/e/", "/embed"]


def targets():
    return json.load(urllib.request.urlopen(f"http://127.0.0.1:{PORT}/json/list", timeout=8))


class CDP:
    """Minimal CDP client over a single target's websocket."""

    def __init__(self, ws_url):
        self.ws = websocket.create_connection(ws_url, timeout=30, header=["Origin: http://localhost"])
        self.ws.settimeout(60)
        self.id = 0

    def ev(self, expr, await_p=False, to=120):
        """Runtime.evaluate; raises on a JS exception or after `to` seconds."""
        self.id += 1
        mid = self.id
        self.ws.send(json.dumps({"id": mid, "method": "Runtime.evaluate",
                                 "params": {"expression": expr, "returnByValue": True, "awaitPromise": await_p}}))
        s = time.time()
        while time.time() - s < to:
            try:
                m = json.loads(self.ws.recv())
            except Exception:
                continue
            if m.get("id") == mid:
                r = m.get("result", {})
                if "exceptionDetails" in r:
                    raise RuntimeError(json.dumps(r["exceptionDetails"])[:300])
                return r.get("result", {}).get("value")
        raise TimeoutError("CDP eval timeout")

    def close(self):
        try:
            self.ws.close()
        except Exception:
            pass


def find_player():
    """First page/iframe target whose URL matches a known player host."""
    for t in targets():
        if t.get("type") in ("page", "iframe") and t.get("webSocketDebuggerUrl") \
                and any(h in t.get("url", "") for h in PLAYER_HOSTS):
            return t
    return None


# JS: returns the playlist metadata (key url, segments, media-sequence, iv).
# Takes the FIRST variant of the master playlist and a single #EXT-X-KEY (no key rotation).
# Referer is a forbidden header name for fetch(), so the browser ignores it and sends its own
# referrer; the request works because it runs inside the player's session.
JS_META = r'''(async()=>{
  const abs=(b,u)=>new URL(u,b).href;
  const gt=async u=>{const r=await fetch(u,{headers:{Referer:location.origin+"/"}});if(!r.ok)throw new Error("HTTP "+r.status);return r.text();};
  const masters=performance.getEntriesByType('resource').map(e=>e.name).filter(n=>/master\.m3u8/.test(n));
  if(!masters.length)return JSON.stringify({err:"no master (did you press play?)"});
  let m=masters[0],t=await gt(m),pl=m;
  if(/#EXT-X-STREAM-INF/.test(t)){const L=t.split("\n");for(let i=0;i<L.length;i++)if(L[i].startsWith("#EXT-X-STREAM-INF")){pl=abs(m,L[i+1].trim());break;}}
  const media=await gt(pl);
  const km=media.match(/#EXT-X-KEY:METHOD=AES-128,URI="([^"]+)"(?:,IV=0x([0-9A-Fa-f]+))?/);
  // RFC 8216: without #EXT-X-MEDIA-SEQUENCE the first segment's number is 0
  let seq=0;const ms=media.match(/#EXT-X-MEDIA-SEQUENCE:(\d+)/);if(ms)seq=parseInt(ms[1]);
  const segs=media.split("\n").map(s=>s.trim()).filter(s=>s&&!s.startsWith("#")).map(s=>abs(pl,s));
  return JSON.stringify({keyUrl:km?abs(pl,km[1]):null, ivHex:km&&km[2]?km[2]:null, seq, segs, master:m});
})()'''


# JS: fetch a resource -> base64 (converted in chunks so large binaries don't overflow apply())
JS_FETCH_B64 = '''(async()=>{
  const r=await fetch(%s,{headers:{Referer:location.origin+"/"}});
  if(!r.ok)return "ERR"+r.status;
  const b=new Uint8Array(await r.arrayBuffer());let s="";const C=0x8000;
  for(let i=0;i<b.length;i+=C)s+=String.fromCharCode.apply(null,b.subarray(i,i+C));
  return btoa(s);
})()'''


def main():
    name = sys.argv[1] if len(sys.argv) > 1 else "video"
    pl = find_player()
    if not pl:
        print(json.dumps({"status": "error", "error": "no player iframe (did you press play?)"}))
        return
    print(f"[grab] player: {pl['url'][:70]}", flush=True)
    cdp = CDP(pl["webSocketDebuggerUrl"])
    try:
        meta = json.loads(cdp.ev(JS_META, await_p=True, to=60))
        if meta.get("err"):
            print(json.dumps({"status": "error", "error": meta["err"]}))
            return
        segs = meta["segs"]
        seq = meta["seq"]
        print(f"[grab] {len(segs)} segments, key={'yes' if meta['keyUrl'] else 'no'}", flush=True)
        key = None
        if meta["keyUrl"]:
            kb = cdp.ev(JS_FETCH_B64 % json.dumps(meta["keyUrl"]), await_p=True, to=30)
            if not isinstance(kb, str) or kb.startswith("ERR"):
                print(json.dumps({"status": "error", "error": f"key: {kb}"}))
                return
            key = base64.b64decode(kb)
        # An explicit IV (zero-padded to 16 bytes) applies to every segment; otherwise it is derived per segment.
        fixed_iv = int(meta["ivHex"], 16).to_bytes(16, "big") if meta["ivHex"] else None
        os.makedirs("/tmp/grab", exist_ok=True)
        ts_path = f"/tmp/grab/{name}.ts"
        t0 = time.time()
        with open(ts_path, "wb") as f:
            for i, su in enumerate(segs):
                b64 = cdp.ev(JS_FETCH_B64 % json.dumps(su), await_p=True, to=90)
                if isinstance(b64, str) and b64.startswith("ERR"):
                    raise RuntimeError(f"seg {i}: {b64}")
                data = base64.b64decode(b64)
                if key:
                    iv = fixed_iv or (seq + i).to_bytes(16, "big")
                    dec = AES.new(key, AES.MODE_CBC, iv).decrypt(data)
                    try:
                        dec = unpad(dec, AES.block_size)
                    except ValueError:
                        pass  # segment without exact PKCS7 padding: keep it as is
                    data = dec
                f.write(data)
                if i % 50 == 0:
                    print(f"[grab] {i}/{len(segs)} ({time.time() - t0:.0f}s)", flush=True)
    finally:
        cdp.close()
    sz = os.path.getsize(ts_path)
    print(f"[grab] .ts ready: {sz/1e9:.2f} GB in {time.time()-t0:.0f}s", flush=True)
    # remux -> mkv straight into manual/movies (first video + first audio stream, no re-encode)
    os.makedirs(DEST, exist_ok=True)
    mkv = os.path.join(DEST, f"{name}.mkv")
    r = subprocess.run(["ffmpeg", "-y", "-fflags", "+genpts", "-i", ts_path,
                        "-map", "0:v:0", "-map", "0:a:0", "-c", "copy", mkv],
                       capture_output=True, text=True)
    if r.returncode != 0:
        print(json.dumps({"status": "error", "error": "ffmpeg: " + r.stderr[-300:]}))
        return
    os.remove(ts_path)
    print(json.dumps({"status": "ok", "mkv": mkv, "size_gb": round(os.path.getsize(mkv)/1e9, 2),
                      "segments": len(segs)}))


if __name__ == "__main__":
    main()