feat(recon): CDP mode in fingerprint_web_stack to detect SPAs

Adds fetch_http_fingerprint_cdp_py_browser (domain browser): it collects the HTML rendered after JavaScript runs, using a remote Chrome via CDP and composing cdp_open_url_and_wait + cdp_eval. It returns the same structure as the static fetch so that detect_web_tech consumes it unchanged.

Integrates use_cdp into the fingerprint_web_stack pipeline (v1.1.0): combines the real headers from the static fetch with the post-JS HTML from CDP. Detects SPA frameworks (React/Vue/Angular/Next) that the static fetch misses because they mount the DOM at runtime. If no Chrome is listening on cdp_port, it degrades to the static fetch with a warning (it does not fail). cdp_port=9333 (isolated Chrome) is recommended for third-party targets, 9222 for the daily browser.

Verified live (Chrome on 9333): on an SPA whose framework marker only appears after JS runs, the static fetch detects only nginx; with use_cdp=True it also detects Next.js, React and Node.js.

Tests: 48 passing (error path without Chrome + mocked happy path + degradation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:31:28 +02:00
parent 935008ec3f
commit 1430039688
7 changed files with 649 additions and 40 deletions
@@ -0,0 +1,84 @@
+---
+name: fetch_http_fingerprint_cdp
+kind: function
+lang: py
+domain: browser
+version: "1.0.0"
+purity: impure
+signature: "def fetch_http_fingerprint_cdp(url: str, *, port: int = 9222, wait_render_s: float = 2.0, timeout_s: float = 30.0, close_tab: bool = True) -> dict"
+description: "Web fingerprint from the HTML RENDERED after JavaScript runs, via the Chrome DevTools Protocol (CDP). Navigates with a remote Chrome, waits for the SPA to mount the DOM, and collects the post-JS HTML, title, final URL and cookie names. Detects frameworks the static fetch does NOT see: React, Vue, Angular, Next, Svelte mounted at runtime. Dynamic Wappalyzer: returns the SAME structure as fetch_http_fingerprint so that detect_web_tech consumes it unchanged. Web recon for SPAs / single-page applications with an empty initial HTML."
+tags: [recon, web-recon, browser, cdp, fingerprint, spa, wappalyzer, javascript, react, vue, angular]
+uses_functions: ["cdp_open_url_and_wait_py_pipelines", "cdp_eval_py_browser"]
+uses_types: []
+returns: []
+returns_optional: false
+error_type: "error_py_core"
+imports: []
+params:
+  - name: url
+    desc: "Target URL to fingerprint (the site to inspect)."
+  - name: port
+    desc: "Remote debugging port of the Chrome to use. Default 9222 (daily browser, enabled globally). To isolate third-party recon, point to 9333 (the isolated Chrome of browser_mcp)."
+  - name: wait_render_s
+    desc: "Extra seconds to wait after the load event so the SPA's JS can paint the DOM (load does NOT guarantee a complete render). Default 2.0."
+  - name: timeout_s
+    desc: "Navigation timeout in seconds. Default 30.0."
+  - name: close_tab
+    desc: "If True, closes the tab when done (best-effort via window.close()) so no tabs are left open. Default True."
+output: "Always a dict (never raises). On success: {status:'ok', url, final_url, title, status_code:None, headers:{}, cookies:[names only, non-httponly], html:<RENDERED post-JS>, html_len, rendered:True, raw}. On error: {status:'error', error:<clear message>, url}. status_code/headers stay empty because CDP does not expose the network layer without the Network domain; this function contributes the rendered HTML, which is what detect_web_tech needs for an SPA."
+tested: true
+tests: ["test_sin_chrome_devuelve_error_sin_lanzar", "test_url_vacia_devuelve_error", "test_happy_path_monkeypatch", "test_happy_path_eval_falla_devuelve_error"]
+test_file_path: "python/functions/browser/fetch_http_fingerprint_cdp_test.py"
+file_path: "python/functions/browser/fetch_http_fingerprint_cdp.py"
+---
+
+## Example
+
+```python
+import sys, os, json
+sys.path.insert(0, os.path.join("python", "functions"))
+from browser.fetch_http_fingerprint_cdp import fetch_http_fingerprint_cdp
+from cybersecurity.detect_web_tech import detect_web_tech
+
+# Collect the RENDERED (post-JS) HTML of an SPA via the daily Chrome (9222).
+res = fetch_http_fingerprint_cdp("https://react.dev/", port=9222)
+if res["status"] == "ok":
+    # detect_web_tech (PURE) consumes the same signals as fetch_http_fingerprint.
+    tech = detect_web_tech(
+        res["headers"],            # {} with CDP; use the static fetch for headers
+        html=res["html"],          # the RENDERED post-JS HTML: this is the key input
+        cookies=res["cookies"],    # names only
+        final_url=res["final_url"],
+    )
+    print(json.dumps(tech, ensure_ascii=False, indent=2))
+else:
+    print("error:", res["error"])
+```
+
+## When to use it
+
+When the static fetch (`fetch_http_fingerprint`) does NOT detect the framework because
+the site is an SPA that mounts the DOM with JavaScript (nearly empty initial HTML:
+`<div id="root">` or `<div id="__next">` with no content). This function collects the HTML
+AFTER the JS has painted, so `detect_web_tech` sees React / Vue / Angular /
+Next just like a dynamic Wappalyzer. Requires a Chrome with remote debugging.
+Combine both layers for a complete fingerprint: static for headers + status +
+httponly cookies; CDP for the rendered HTML.
+
+## Gotchas
+
+- **Requires a Chrome with remote debugging** listening on `port`: 9222 (daily
+  browser, already enabled globally) or 9333 (the isolated Chrome of browser_mcp). Without
+  a live Chrome it returns `{status:"error", error:"no hay Chrome en el puerto N (¿remote debugging activo?)"}`; it does not raise.
+- **Opens a tab in THAT browser.** With `port=9222` it shares your PERSONAL browser
+  session (your session cookies, history). For THIRD-PARTY recon prefer
+  `port=9333` (isolated) so you neither contaminate nor leak your session.
+- **`document.cookie` does NOT see httponly cookies** (session cookies almost always are):
+  those, and the response headers, are better obtained from the static `fetch_http_fingerprint`.
+- **`headers` and `status_code` stay empty/None**: CDP does not expose the network layer without
+  the Network domain. This function contributes the rendered HTML, not network data. If you need
+  the real status or headers, run the static fetch alongside it.
+- **`wait_render_s` may be too short** for slow SPAs (heavy data fetching after
+  load). If `html` comes back incomplete, raise `wait_render_s` (e.g. 4.0-6.0).
+- **Respect scope and legal authorization**: only inspect sites you have permission
+  to analyze.
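
The advice above (static fetch for headers, status and httponly cookies; CDP for the rendered HTML) can be sketched as a small merge step. This is a minimal illustration, not the pipeline's actual code: `merge_fingerprints` and the sample dicts are hypothetical, and only keys documented in the output schema above are assumed.

```python
def merge_fingerprints(static: dict, cdp: dict) -> dict:
    """Hypothetical helper: keep network data from the static fetch and
    take the rendered HTML from the CDP result when it is usable."""
    merged = dict(static)  # headers / status_code / cookies come from the static layer
    if cdp.get("status") == "ok" and cdp.get("html"):
        merged["html"] = cdp["html"]  # post-JS HTML replaces the initial HTML
        merged["rendered"] = True
    else:
        merged["rendered"] = False
    # Union of cookie names: static first (may include httponly), then CDP-only names.
    names = []
    for name in list(static.get("cookies") or []) + list(cdp.get("cookies") or []):
        if name and name not in names:
            names.append(name)
    merged["cookies"] = names
    return merged


# Sample data (invented for the example).
static = {
    "status": "ok",
    "status_code": 200,
    "headers": {"Server": "nginx"},
    "cookies": ["sid"],
    "html": '<div id="root"></div>',
}
cdp = {
    "status": "ok",
    "status_code": None,
    "headers": {},
    "cookies": ["theme", "sid"],
    "html": '<div id="root"><main>app</main></div>',
}
merged = merge_fingerprints(static, cdp)
print(merged["status_code"], merged["cookies"], merged["rendered"])
```

The merged dict keeps the real `headers` and `status_code` while `html` holds the rendered markup, which is the combination the signature matcher benefits from.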
@@ -0,0 +1,206 @@
+"""Web fingerprint from RENDERED (post-JS) HTML via the Chrome DevTools Protocol.
+
+IMPURE function: uses a Chrome with remote debugging to navigate to a URL,
+wait for the page's JavaScript to mount the DOM, and collect the rendered
+HTML (plus title, final URL and cookie names). Returns the SAME
+structure as `fetch_http_fingerprint_py_cybersecurity` so that the signature
+matcher `detect_web_tech_py_cybersecurity` consumes it WITHOUT changes.
+
+Why it exists: the static fetch (`fetch_http_fingerprint`) does a GET with
+urllib and does NOT execute JavaScript. An SPA (React/Vue/Angular/Next with a nearly
+empty initial HTML) mounts its framework at runtime, so the static fetch misses the stack.
+This function collects the HTML AFTER the JS has painted, so the matcher
+detects the framework just like a dynamic Wappalyzer.
+
+Composes TWO registry functions (it does not reimplement the CDP transport):
+  1. `cdp_open_url_and_wait` (pipeline): creates a new tab in the remote Chrome, navigates
+     and waits for `Page.loadEventFired`. Returns the tab_id.
+  2. `cdp_eval` (browser): evaluates JS in the tab whose URL contains a substring.
+
+SECURITY: `cookies` stores only the NAMES, never the values (they are
+sensitive session tokens). `document.cookie` also does NOT see httponly cookies:
+those (and the response headers) are better obtained from the static fetch.
+
+ALWAYS returns a dict (recon group style): never raises exceptions.
+"""
+
+import json
+import os
+import sys
+import time
+import urllib.parse
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+from browser.cdp_eval import cdp_eval
+from pipelines.cdp_open_url_and_wait import cdp_open_url_and_wait
+
+
+def _cookie_names(cookie_str: str) -> list[str]:
+    """Extract ONLY the cookie names from a `document.cookie` string (never values).
+
+    `document.cookie` looks like ``"a=1; b=2; c=3"``. Split on ';' and keep
+    the part before the first '=' of each pair. Deduplicates, preserving order.
+    """
+    out: list[str] = []
+    seen: set[str] = set()
+    for pair in (cookie_str or "").split(";"):
+        pair = pair.strip()
+        if not pair:
+            continue
+        name = pair.split("=", 1)[0].strip()
+        if name and name not in seen:
+            seen.add(name)
+            out.append(name)
+    return out
+
+
+def fetch_http_fingerprint_cdp(
+    url: str,
+    *,
+    port: int = 9222,
+    wait_render_s: float = 2.0,
+    timeout_s: float = 30.0,
+    close_tab: bool = True,
+) -> dict:
+    """Collect the rendered (post-JS) HTML of a URL via CDP for fingerprinting.
+
+    IMPURE function: needs a Chrome with remote debugging listening on `port`.
+    Navigates with `cdp_open_url_and_wait`, waits `wait_render_s` for the SPA
+    to paint the DOM, and collects signals with `cdp_eval`. Never raises: any failure
+    (Chrome not responding, tab not opening, eval error) returns
+    ``{"status": "error", ...}``.
+
+    The output structure is COMPATIBLE with `fetch_http_fingerprint` and
+    `detect_web_tech`: `status_code` and `headers` stay None/empty (CDP does not
+    expose the network layer without the Network domain); this function contributes the
+    RENDERED `html`, which is exactly what the signature matcher needs for an SPA.
+
+    Args:
+        url: Target URL to fingerprint.
+        port: Remote debugging port of the Chrome to use. Default 9222
+            (daily browser, already enabled globally). For ISOLATION (third-party
+            recon without mixing in your personal session) point to 9333 (the
+            isolated Chrome of browser_mcp).
+        wait_render_s: Extra seconds to wait after load so the SPA's JS can
+            paint the DOM (the load event does NOT guarantee a complete render). Default 2.0.
+        timeout_s: Navigation timeout in seconds. Default 30.0.
+        close_tab: If True, closes the tab when done (best-effort via
+            `window.close()`) so no tabs are left open in the browser.
+            Default True.
+
+    Returns:
+        dict. On success::
+
+            {
+                "status": "ok",
+                "url": <requested url>,
+                "final_url": <location.href after client-side redirects>,
+                "title": <document.title or None>,
+                "status_code": None,   # main document status is not collected (needs the Network domain)
+                "headers": {},         # CDP does not expose response headers without the Network domain
+                "cookies": [<non-httponly cookie names>],
+                "html": <rendered (post-JS) HTML>,
+                "html_len": <len of html>,
+                "rendered": True,      # marks the html as post-JS
+                "raw": <readable evidence block>,
+            }
+
+        On error::
+
+            {"status": "error", "error": <clear message>, "url": <url>}
+    """
+    if not url or not url.strip():
+        return {"status": "error", "error": "fetch_http_fingerprint_cdp: url vacia", "url": url}
+
+    # Substring used to pick the right target in cdp_eval. The hostname is the
+    # most stable fragment of the URL (it survives query strings and fragments).
+    try:
+        substr = urllib.parse.urlparse(url).hostname or url
+    except Exception:  # noqa: BLE001  # malformed URL, fall back to the full url
+        substr = url
+
+    # 1. Navigate: create a new tab in the remote Chrome and wait for the load event.
+    try:
+        cdp_open_url_and_wait(port, url, int(timeout_s))
+    except Exception as e:  # noqa: BLE001  # RuntimeError from cdp_open_url_and_wait
+        msg = str(e)
+        # Clear message for the most common case: no Chrome is listening.
+        if "no se pudo crear tab" in msg or "URLError" in msg or "Connection refused" in msg:
+            msg = f"no hay Chrome en el puerto {port} (¿remote debugging activo?): {e}"
+        return {"status": "error", "error": f"fetch_http_fingerprint_cdp: {msg}", "url": url}
+
+    # 2. Wait for the JS render (the load event does not guarantee a painted DOM in SPAs).
+    if wait_render_s > 0:
+        time.sleep(wait_render_s)
+
+    # 3. Collect signals with a single eval (one JSON object with everything).
+    expr = (
+        "JSON.stringify({"
+        "html: document.documentElement.outerHTML,"
+        "title: document.title,"
+        "href: location.href,"
+        "cookie: document.cookie"
+        "})"
+    )
+    r = cdp_eval(expr, port=port, target_url_substr=substr, timeout_s=max(10.0, timeout_s))
+
+    # 4. (best-effort) close the tab so no clutter is left in the browser.
+    if close_tab:
+        try:
+            cdp_eval("window.close()", port=port, target_url_substr=substr, timeout_s=5.0)
+        except Exception:  # noqa: BLE001  # best-effort close, does not affect the result
+            pass
+
+    if not r.get("ok"):
+        err = r.get("error") or "eval CDP fallo sin mensaje"
+        return {
+            "status": "error",
+            "error": f"fetch_http_fingerprint_cdp: no se pudo evaluar JS ({err})",
+            "url": url,
+        }
+
+    raw_value = r.get("value")
+    try:
+        data = json.loads(raw_value) if isinstance(raw_value, str) else (raw_value or {})
+    except Exception:  # noqa: BLE001  # malformed JSON from the eval
+        data = None
+    # Also reject valid JSON that is not an object: data.get() below would raise.
+    if not isinstance(data, dict):
+        return {
+            "status": "error",
+            "error": "fetch_http_fingerprint_cdp: respuesta del eval no es JSON valido",
+            "url": url,
+        }
+
+    html = data.get("html") or ""
+    title = data.get("title") or None
+    final_url = data.get("href") or r.get("target_url") or url
+    cookies = _cookie_names(data.get("cookie") or "")
+
+    raw = (
+        f"CDP fingerprint {url}\n"
+        f"final_url: {final_url}\n"
+        f"title: {title}\n"
+        f"html_len: {len(html)}"
+    )
+
+    return {
+        "status": "ok",
+        "url": url,
+        "final_url": final_url,
+        "title": title,
+        "status_code": None,
+        "headers": {},
+        "cookies": cookies,
+        "html": html,
+        "html_len": len(html),
+        "rendered": True,
+        "raw": raw,
+    }
+
+
+if __name__ == "__main__":
+    target = sys.argv[1] if len(sys.argv) > 1 else "https://react.dev/"
+    dbg_port = int(sys.argv[2]) if len(sys.argv) > 2 else 9222
+    out = fetch_http_fingerprint_cdp(target, port=dbg_port)
+    # Do not dump the whole html to stdout: only the summary.
+    summary = {k: v for k, v in out.items() if k != "html"}
+    print(json.dumps(summary, ensure_ascii=False, indent=2))
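
Step 2 of the function above uses a fixed `time.sleep(wait_render_s)`, which the docs note can be too short for slow SPAs. One possible alternative, sketched here under assumptions, is to poll until the rendered HTML length stops changing. `wait_for_stable_html` and its parameters are hypothetical and not part of this commit; in real use `get_len` would presumably wrap a `cdp_eval` call that returns `document.documentElement.outerHTML.length`.

```python
import time
from typing import Callable


def wait_for_stable_html(
    get_len: Callable[[], int],
    *,
    max_wait_s: float = 6.0,
    interval_s: float = 0.25,
    stable_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: poll `get_len` until it reports the same value
    for `stable_polls` consecutive polls, or until `max_wait_s` elapses.
    Returns the last observed length."""
    last = get_len()
    unchanged = 0
    waited = 0.0
    while waited < max_wait_s and unchanged < stable_polls:
        sleep(interval_s)
        waited += interval_s
        current = get_len()
        if current == last:
            unchanged += 1
        else:
            unchanged = 0  # the DOM is still growing, restart the stability count
            last = current
    return last


# Simulated page (invented data): the DOM grows twice, then settles at 900 chars.
sizes = [100, 500, 900]
calls = {"n": 0}


def fake_len() -> int:
    i = min(calls["n"], len(sizes) - 1)
    calls["n"] += 1
    return sizes[i]


# sleep is stubbed out so the example runs instantly.
stable_len = wait_for_stable_html(fake_len, sleep=lambda s: None)
print(stable_len)
```

A length check is only a heuristic: a page that keeps animating or streaming content never stabilizes, which is why the sketch still enforces `max_wait_s`.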
@@ -0,0 +1,88 @@
+"""Tests for fetch_http_fingerprint_cdp.
+
+There is no Chrome in the test/CI environment. Covered:
+  - The REAL error path: no Chrome listening -> {status:"error"} without raising.
+  - The happy path by composition: monkeypatch cdp_open_url_and_wait + cdp_eval
+    to validate the orchestration (structure, rendered html, cookie names only)
+    without a real Chrome.
+"""
+
+import os
+import sys
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
+
+import browser.fetch_http_fingerprint_cdp as mod
+from browser.fetch_http_fingerprint_cdp import fetch_http_fingerprint_cdp
+
+
+def test_sin_chrome_devuelve_error_sin_lanzar():
+    # Port with no Chrome -> clean degradation, never an exception.
+    res = fetch_http_fingerprint_cdp("http://127.0.0.1:1/", port=1, timeout_s=2)
+    assert res["status"] == "error"
+    assert "error" in res and res["error"]
+    assert res["url"] == "http://127.0.0.1:1/"
+
+
+def test_url_vacia_devuelve_error():
+    res = fetch_http_fingerprint_cdp("   ")
+    assert res["status"] == "error"
+    assert "url vacia" in res["error"]
+
+
+def test_happy_path_monkeypatch(monkeypatch):
+    # Fake pipeline: returns a tab_id without touching the network.
+    def fake_open(debug_port, url, timeout_s=30):
+        assert url == "https://some-spa.com/"
+        return "TAB123"
+
+    # Fake eval: the signal-collecting call (its expression contains outerHTML)
+    # returns the SPA's JSON; any other call (window.close) returns an empty ok.
+    calls = {"n": 0}
+    spa_html = '<html><body><div id="__next">hi</div></body></html>'
+
+    def fake_eval(expression, *, port=9222, target_url_substr="", await_promise=False, timeout_s=10.0):
+        calls["n"] += 1
+        if "outerHTML" in expression:
+            import json as _json
+            payload = _json.dumps({
+                "html": spa_html,
+                "title": "Some SPA",
+                "href": "https://some-spa.com/home",
+                "cookie": "session=SECRETVALUE; theme=dark",
+            })
+            return {"ok": True, "value": payload, "error": "", "target_url": "https://some-spa.com/"}
+        return {"ok": True, "value": None, "error": "", "target_url": "https://some-spa.com/"}
+
+    monkeypatch.setattr(mod, "cdp_open_url_and_wait", fake_open)
+    monkeypatch.setattr(mod, "cdp_eval", fake_eval)
+
+    res = fetch_http_fingerprint_cdp("https://some-spa.com/", port=9222, wait_render_s=0)
+
+    assert res["status"] == "ok"
+    assert res["rendered"] is True
+    assert '<div id="__next">' in res["html"]
+    assert res["html_len"] == len(spa_html)
+    assert res["title"] == "Some SPA"
+    assert res["final_url"] == "https://some-spa.com/home"
+    # Cookies: names ONLY, never values.
+    assert res["cookies"] == ["session", "theme"]
+    assert "SECRETVALUE" not in str(res["cookies"])
+    # Compatibility with detect_web_tech: status_code None, headers empty.
+    assert res["status_code"] is None
+    assert res["headers"] == {}
+
+
+def test_happy_path_eval_falla_devuelve_error(monkeypatch):
+    def fake_open(debug_port, url, timeout_s=30):
+        return "TAB123"
+
+    def fake_eval(expression, *, port=9222, target_url_substr="", await_promise=False, timeout_s=10.0):
+        return {"ok": False, "value": None, "error": "boom", "target_url": ""}
+
+    monkeypatch.setattr(mod, "cdp_open_url_and_wait", fake_open)
+    monkeypatch.setattr(mod, "cdp_eval", fake_eval)
+
+    res = fetch_http_fingerprint_cdp("https://x.com/", wait_render_s=0)
+    assert res["status"] == "error"
+    assert "boom" in res["error"]
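
The tests above exercise cookie-name extraction only through the happy path. A few extra edge cases could be checked directly, as in this sketch. It assumes the same logic as `_cookie_names` in the diff; `cookie_names` below is a standalone copy written for the example so that the block runs without the repository.

```python
def cookie_names(cookie_str):
    """Standalone copy (for illustration) of the name-only extraction:
    split on ';', keep the text before the first '=', dedupe in order."""
    out, seen = [], set()
    for pair in (cookie_str or "").split(";"):
        name = pair.strip().split("=", 1)[0].strip()
        if name and name not in seen:
            seen.add(name)
            out.append(name)
    return out


# Values that themselves contain '=' must not leak into the names.
with_equals = cookie_names("token=a=b==; theme=dark")
# Repeated names are reported once, in first-seen order.
deduped = cookie_names("a=1; a=2; b=3")
# Empty, None and separator-only inputs yield an empty list.
empties = [cookie_names(""), cookie_names(None), cookie_names(" ; ;")]
print(with_equals, deduped, empties)
```

These cases matter because the function's security promise is that values never reach the output, including values with embedded `=` characters.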
@@ -3,15 +3,16 @@ name: fingerprint_web_stack
 kind: pipeline
 lang: py
 domain: pipelines
-version: "1.0.0"
+version: "1.1.0"
 purity: impure
-signature: "def fingerprint_web_stack(url: str, timeout_s: float = 15.0, verify_tls: bool = True, max_html_bytes: int = 500_000, save: bool = True) -> dict"
-description: "One-shot that detects the web technology (Wappalyzer-style tech stack) of a URL: performs the HTTP fetch of the signals (fetch_http_fingerprint) and matches the signatures (detect_web_tech), returning the detected technologies (server, language, CMS, web framework, JS frameworks, libraries, analytics, CDN, e-commerce, WAF) with category, version and confidence. Replaces the fetch_http_fingerprint -> detect_web_tech pattern with a single call. The registry equivalent of Wappalyzer / whatweb / a stack fingerprint of a url. Optionally archives the evidence (TECHNOLOGY/CATEGORY/VERSION/CONFIDENCE table) in OSINT. Useful for web reconnaissance, attack-surface auditing and finding out which CMS, framework or server a site uses."
-tags: [recon, web-recon, pipelines, cybersecurity, fingerprint, wappalyzer, web-tech, sink]
+signature: "def fingerprint_web_stack(url: str, timeout_s: float = 15.0, verify_tls: bool = True, max_html_bytes: int = 500_000, save: bool = True, use_cdp: bool = False, cdp_port: int = 9222, wait_render_s: float = 2.0) -> dict"
+description: "One-shot that detects the web technology (Wappalyzer-style tech stack) of a URL: performs the HTTP fetch of the signals (fetch_http_fingerprint) and matches the signatures (detect_web_tech), returning the detected technologies (server, language, CMS, web framework, JS frameworks, libraries, analytics, CDN, e-commerce, WAF) with category, version and confidence. Replaces the fetch_http_fingerprint -> detect_web_tech pattern with a single call. The registry equivalent of Wappalyzer / whatweb / a stack fingerprint of a url. With use_cdp=True it additionally analyzes the HTML RENDERED after executing JavaScript (fetch_http_fingerprint_cdp via a remote Chrome) to detect SPAs (React/Vue/Angular/Next) that the static fetch misses; if no Chrome is available it degrades to static with a warning. Optionally archives the evidence (TECHNOLOGY/CATEGORY/VERSION/CONFIDENCE table) in OSINT. Useful for web reconnaissance, attack-surface auditing and finding out which CMS, framework or server a site uses, including single-page applications."
+tags: [recon, web-recon, pipelines, cybersecurity, fingerprint, wappalyzer, web-tech, sink, cdp, spa, render]
 uses_functions:
  - fetch_http_fingerprint_py_cybersecurity
  - detect_web_tech_py_cybersecurity
  - save_scan_to_osint_py_cybersecurity
+  - fetch_http_fingerprint_cdp_py_browser
 uses_types: []
 returns: []
 returns_optional: false
@@ -28,9 +29,15 @@ params:
    desc: "Truncates the HTML read to this size to avoid downloading megabytes. Default 500_000 (500 KB). Passed to fetch_http_fingerprint."
  - name: save
    desc: "If True (default), archives the evidence in OSINT via save_scan_to_osint with scan_type='web_tech' (target = host of the URL); if False, only runs the fetch + matching and touches neither the vault nor the osint_db service. Recon policy: every scan is archived. If the sink fails, the result degrades without breaking (saved.status='error')."
-output: "dict with status ('ok'|'error'), url, final_url (after redirects), status_code (int), server (Server header or ''), title (page title or ''), technologies (list of dicts with name, category, version, confidence, evidence, exactly as returned by detect_web_tech), by_category (dict category -> list of names), count (int), saved (dict from save_scan_to_osint with note_path/registered/scan_id, or None if save=False) and raw (readable TECHNOLOGY/CATEGORY/VERSION/CONFIDENCE table with a url/status/server/title header). If the HTTP fetch fails (host does not resolve, connection refused, timeout) -> {status:error, stage:fetch, url:..., fetch:<dict>}. Never raises."
+  - name: use_cdp
+    desc: "If True, in addition to the static fetch it performs a fetch via the Chrome DevTools Protocol (fetch_http_fingerprint_cdp) to analyze the HTML RENDERED after executing JavaScript and detect SPAs (React/Vue/Angular/Next) that the empty initial HTML does not reveal. Requires a Chrome with remote debugging on cdp_port. If CDP is not available, it DEGRADES to the static HTML with a warning (it does not fail). Default False (classic static behavior, no regression)."
+  - name: cdp_port
+    desc: "Remote debugging port of the Chrome to use when use_cdp=True. Default 9222 (daily browser, enabled globally; it shares your personal session). For third-party recon without contaminating your session, use 9333 (the isolated Chrome of browser_mcp)."
+  - name: wait_render_s
+    desc: "Seconds to wait after the load event so the SPA can paint the DOM (only applies with use_cdp=True). Default 2.0. Raise it (4.0-6.0) for slow SPAs with heavy data fetching; a short value can leave the HTML incomplete."
+output: "dict with status ('ok'|'error'), url, final_url (after redirects), status_code (int), server (Server header or ''), title (page title or ''), technologies (list of dicts with name, category, version, confidence, evidence, exactly as returned by detect_web_tech), by_category (dict category -> list of names), count (int), html_source ('static'|'cdp', the source of the analyzed HTML), rendered (bool, True if html_source=='cdp'), warnings (list of notices, e.g. CDP->static degradation; empty if there were none), saved (dict from save_scan_to_osint with note_path/registered/scan_id, or None if save=False) and raw (readable TECHNOLOGY/CATEGORY/VERSION/CONFIDENCE table with a url/status/server/title/html_source header). If the static HTTP fetch fails and use_cdp=False (or both fail) -> {status:error, stage:fetch, url:..., fetch:<dict>}. Never raises."
 tested: true
-tests: ["test_golden_fingerprint_servidor_local_wordpress_nginx", "test_save_false_no_archiva_osint", "test_fetch_fallido_propaga_error_sin_red"]
+tests: ["test_golden_fingerprint_servidor_local_wordpress_nginx", "test_save_false_no_archiva_osint", "test_fetch_fallido_propaga_error_sin_red", "test_use_cdp_sin_chrome_degrada_a_estatico"]
 test_file_path: "python/functions/pipelines/fingerprint_web_stack_test.py"
 file_path: "python/functions/pipelines/fingerprint_web_stack.py"
 ---
@@ -56,6 +63,20 @@ print(r["by_category"])  # {"cms": ["WordPress"], "web-server": ["nginx"], ...}
 ```python
 from pipelines.fingerprint_web_stack import fingerprint_web_stack

+# CDP mode: analyzes the HTML RENDERED after JS runs (detects React/Vue/Angular SPAs).
+# Requires Chrome with remote debugging on cdp_port (9222 daily / 9333 isolated).
+r = fingerprint_web_stack("https://react.dev/", use_cdp=True, cdp_port=9222, save=False)
+print(r["status"])        # "ok"
+print(r["html_source"])   # "cdp" if Chrome was available; "static" (with a warning) if not
+print(r["rendered"])      # True if the rendered HTML was used
+print(r["warnings"])      # [] if CDP worked; ["cdp no disponible: ...; usando fetch estatico"] if it degraded
+for t in r["technologies"]:
+    print(t["name"], t["category"])  # React javascript-framework, etc.
+```
+
+```python
+from pipelines.fingerprint_web_stack import fingerprint_web_stack
+
 # With OSINT archiving (default): leaves a note in the vault + POSTs to osint_db.
 r = fingerprint_web_stack("https://midominio.example")
 print(r["saved"]["note_path"])  # dominios/midominio.example/recon/web_tech-....md
@@ -66,6 +87,8 @@ print(r["saved"]["note_path"])  # dominios/midominio.example/recon/web_tech-....
 ./fn run fingerprint_web_stack https://example.com
 # Flags: --no-save (do not archive to OSINT), --no-verify-tls (self-signed cert, insecure).
 ./fn run fingerprint_web_stack https://example.com --no-save
+# CDP mode (HTML rendered after JS): --cdp [--cdp-port 9333].
+./fn run fingerprint_web_stack https://react.dev/ --cdp --no-save
 ```

 ## Cuando usarla
@@ -80,11 +103,26 @@ enriquecer una investigacion OSINT con el stack de un host.

 ## Gotchas

-- **Static fetch: does NOT execute JavaScript.** It only sees the initial HTML returned
-  by the server. SPAs that mount the framework (React/Vue/Angular/Svelte) at
-  runtime usually serve a nearly empty HTML, so those frameworks may NOT be
-  detected. For JS-heavy sites, a fingerprint with a real browser (CDP)
-  would see more; this pipeline is the browserless version.
+- **Static fetch (use_cdp=False): does NOT execute JavaScript.** It only sees the
+  initial HTML returned by the server. SPAs that mount the framework
+  (React/Vue/Angular/Svelte) at runtime usually serve a nearly empty HTML, so
+  those frameworks may NOT be detected. For JS-heavy sites use `use_cdp=True`
+  (it analyzes the HTML rendered after JS via a remote Chrome).
+- **`use_cdp=True` requires Chrome with remote debugging** listening on `cdp_port`:
+  9222 (daily browser, enabled globally) or 9333 (the isolated Chrome of browser_mcp).
+  If no Chrome is available, the pipeline does NOT fail: it DEGRADES to the static HTML, sets
+  `html_source="static"` and fills `warnings` with `"cdp no disponible: ...; usando
+  fetch estatico"`. Always check `result["warnings"]` to know whether CDP was
+  applied or you fell back to static.
+- **With `cdp_port=9222` a tab opens in your PERSONAL browser** (it shares the cookies and
+  history of your daily session). To fingerprint THIRD PARTIES without contaminating or
+  leaking your session, use `cdp_port=9333` (the isolated Chrome of browser_mcp).
+- **`wait_render_s` may be too short for slow SPAs**: the load event does NOT guarantee
+  a painted DOM. If the rendered `html` comes back incomplete (frameworks that
+  should appear are missing), raise `wait_render_s` to 4.0-6.0.
+- **CDP does not expose headers or status_code**: with `use_cdp=True`, `server`,
+  `status_code` and `headers` still come from the static fetch (which always
+  runs); CDP only contributes the rendered `html` and the non-httponly cookie names.
 - **The signature table is a subset of Wappalyzer**, not exhaustive. A
  technology not listed in `detect_web_tech` will not appear even if it is present.
  To broaden coverage, add entries to `SIGNATURES` in `detect_web_tech`.
@@ -119,3 +157,7 @@ in full nor cookie values (the cookies from `fetch_http_fingerprint` are already only
 names). The `target` for OSINT archiving is derived from the URL's host
 (`urllib.parse.urlparse(...).hostname`). Never raises exceptions: every failure is
 reflected in the `status` key of the returned dict.
+
+## Capability growth log
+
+- v1.1.0 (2026-06-14): adds the use_cdp mode, which uses fetch_http_fingerprint_cdp to analyze the HTML rendered after JS and detect SPAs (React/Vue/Angular) that the static fetch misses; degrades to static if no Chrome is available.
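
The degradation behavior described in the gotchas (use the rendered HTML when CDP works, otherwise fall back to static and record a warning) can be sketched as follows. This is an illustration under assumptions, not the pipeline's actual implementation: `select_html` is a hypothetical helper, and the warning text only follows the format quoted in the docs above.

```python
from typing import Optional


def select_html(static: dict, cdp: Optional[dict]) -> dict:
    """Hypothetical helper: choose which HTML to analyze and explain why.
    `cdp` is None when use_cdp=False (CDP was never attempted)."""
    warnings: list = []
    if cdp is not None and cdp.get("status") == "ok" and cdp.get("html"):
        return {"html": cdp["html"], "html_source": "cdp",
                "rendered": True, "warnings": warnings}
    if cdp is not None:
        # CDP was requested but failed or returned nothing: degrade, do not fail.
        reason = cdp.get("error") or "html renderizado vacio"
        warnings.append(f"cdp no disponible: {reason}; usando fetch estatico")
    return {"html": static.get("html") or "", "html_source": "static",
            "rendered": False, "warnings": warnings}


# Sample inputs (invented for the example).
static = {"status": "ok", "html": '<div id="root"></div>'}
ok = select_html(static, {"status": "ok", "html": '<div id="root"><h1>app</h1></div>'})
degraded = select_html(static, {"status": "error", "error": "no hay Chrome en el puerto 9333"})
plain = select_html(static, None)
print(ok["html_source"], degraded["html_source"], degraded["warnings"], plain["warnings"])
```

Keeping the reason inside `warnings` rather than raising lets callers distinguish "CDP applied" from "fell back to static" without changing the success path.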
@@ -4,15 +4,18 @@ One-shot that implements the flow "find out the web technology (stack) of a
 URL" Wappalyzer-style: performs the HTTP fetch of the signals (headers, HTML,
 cookies, title, server) and matches the signatures to return the detected
 technologies (server, language, CMS, JS frameworks, libraries, analytics, CDN,
-e-commerce, WAF). Optionally archives the evidence in OSINT.
+e-commerce, WAF). With use_cdp=True, it also analyzes the HTML RENDERED after
+executing JavaScript (via a remote Chrome) to detect SPAs (React/Vue/Angular)
+that the static fetch misses. Optionally archives the evidence in OSINT.

 Turns the 2-call pattern (fetch_http_fingerprint -> detect_web_tech)
 into a single invocation. Composes registry functions from the
-cybersecurity domain; it does not reimplement any fetch, signature matching or
-persistence logic.
+cybersecurity domain (and browser for CDP mode); it does not reimplement any
+fetch, render, signature matching or persistence logic.

 Registry functions composed (imported, not reimplemented):
-    fetch_http_fingerprint, detect_web_tech, save_scan_to_osint
+    fetch_http_fingerprint, detect_web_tech, save_scan_to_osint,
+    fetch_http_fingerprint_cdp
 """

 from urllib.parse import urlparse
@@ -22,6 +25,7 @@ from cybersecurity import (
    detect_web_tech,
    save_scan_to_osint,
 )
+from browser.fetch_http_fingerprint_cdp import fetch_http_fingerprint_cdp


 def _build_raw(
@@ -31,6 +35,7 @@ def _build_raw(
    server: str,
    title: str,
    technologies: list[dict],
+    html_source: str = "static",
 ) -> str:
    """Builds a readable TECHNOLOGY/CATEGORY/VERSION/CONFIDENCE table as evidence.

@@ -44,10 +49,13 @@ def _build_raw(
        server: server string (Server header), may be "".
        title: page title, may be "".
        technologies: list of technology dicts (see fingerprint_web_stack).
+        html_source: source of the analyzed HTML ("static" = static fetch,
+            "cdp" = rendered post-JS HTML via Chrome). Default "static".

    Returns:
        Multi-line text block with a header and one row per technology.
    """
+    html_label = "cdp-rendered (post-JS)" if html_source == "cdp" else "static (sin JS)"
    header_lines = [
        f"# fingerprint_web_stack {url}",
        "",
@@ -56,6 +64,7 @@ def _build_raw(
        f"status_code: {status_code}",
        f"server:      {server or '-'}",
        f"title:       {title or '-'}",
+        f"html_source: {html_label}",
        "",
    ]
    cols = f"{'TECHNOLOGY':<24}{'CATEGORY':<22}{'VERSION':<14}CONFIDENCE"
@@ -96,22 +105,53 @@ def _target_from_url(url: str, final_url: str) -> str:
    return (url or final_url or "unknown").strip()


+def _union_cookie_names(static_cookies, cdp_cookies) -> list[str]:
+    """Merges cookie names from both sources (static + CDP), deduplicating.
+
+    Preserves order: static-fetch names first (including HttpOnly cookies, which
+    CDP cannot see), then CDP-only names. Names only, never values.
+
+    Args:
+        static_cookies: list of cookie names from the static fetch.
+        cdp_cookies: list of cookie names from the CDP fetch (document.cookie).
+
+    Returns:
+        List of unique names in stable order.
+    """
+    out: list[str] = []
+    seen: set[str] = set()
+    for name in list(static_cookies or []) + list(cdp_cookies or []):
+        if name and name not in seen:
+            seen.add(name)
+            out.append(name)
+    return out
+
+
 def fingerprint_web_stack(
    url: str,
    timeout_s: float = 15.0,
    verify_tls: bool = True,
    max_html_bytes: int = 500_000,
    save: bool = True,
+    use_cdp: bool = False,
+    cdp_port: int = 9222,
+    wait_render_s: float = 2.0,
 ) -> dict:
    """Detects the web technology (stack) of a URL in a single step (Wappalyzer style).

    Composes, in a single invocation:
      1. ``fetch_http_fingerprint(url, ...)`` to collect the raw signals from
-         the response (headers, HTML, cookies, title, server).
-      2. ``detect_web_tech(headers, html, cookies, final_url)`` (PURE) to
-         match those signals against the signature table and obtain the
-         detected technologies.
-      3. If ``save`` is True, archives an evidence table in OSINT via
+         the response (headers, initial pre-JS HTML, cookies, title, server).
+         It supplies the real headers/server/status_code, which CDP does not expose.
+      2. If ``use_cdp`` is True, ``fetch_http_fingerprint_cdp(url, ...)`` to
+         obtain the HTML RENDERED after JavaScript runs (via a remote Chrome),
+         which reveals SPAs (React/Vue/Angular/Next) whose initial HTML is empty
+         and which the static fetch misses. If CDP fails (no Chrome, etc.) it
+         DEGRADES to the static HTML without breaking and records a warning.
+      3. ``detect_web_tech(headers, html, cookies, final_url)`` (PURE) to
+         match those signals against the signature table. The analysed HTML is
+         the CDP one when available, otherwise the static one.
+      4. If ``save`` is True, archives an evidence table in OSINT via
         ``save_scan_to_osint`` with ``scan_type="web_tech"`` (target = host of the
         URL).
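
The selection and degradation rule described in steps 1 to 3 can be sketched in isolation. This is a minimal illustration, not the module's code: `choose_html` is a hypothetical helper, and the only assumption about its inputs is that both fetch results are dicts with a `status` key, an `html` key and, on failure, an `error` key.

```python
from typing import Any, Optional


def choose_html(static: dict, cdp: Optional[dict]) -> dict:
    """Pick the HTML to analyse from a static result and an optional CDP result.

    Hypothetical helper: `cdp` is None when CDP mode was not requested.
    """
    warnings: list[str] = []
    static_ok = static.get("status") == "ok"
    html: Any = (static.get("html") or "") if static_ok else ""
    source = "static"

    if cdp is not None:
        if cdp.get("status") == "ok":
            # Rendered post-JS HTML wins whenever the CDP fetch succeeded.
            html = cdp.get("html") or ""
            source = "cdp"
        else:
            # Degrade: keep the static HTML and record why CDP was skipped.
            warnings.append(f"cdp unavailable: {cdp.get('error') or 'unknown'}")

    if source == "static" and not static_ok:
        # Neither source produced HTML, so there is nothing to match against.
        return {"status": "error", "stage": "fetch", "warnings": warnings}

    return {"status": "ok", "html": html, "html_source": source, "warnings": warnings}
```

Keeping the rule in one small pure function makes the three outcomes (CDP wins, degrade to static, hard error) easy to test without a browser.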

@@ -122,7 +162,7 @@ def fingerprint_web_stack(
        url: target URL. Without a scheme, https:// is assumed (falling back to http://),
            just as fetch_http_fingerprint does.
        timeout_s: HTTP request timeout in seconds. Default 15.0. Passed
-            as-is to fetch_http_fingerprint.
+            as-is to fetch_http_fingerprint (and to the CDP fetch when use_cdp is set).
        verify_tls: if False, the TLS certificate is not verified (insecure, only for
            your own hosts with a self-signed cert). Default True. Passed to
            fetch_http_fingerprint.
@@ -133,6 +173,17 @@ def fingerprint_web_stack(
            fetch + matching and touches neither the vault nor the osint_db service. Recon
            policy: every scan is archived. If the sink fails, the result degrades
            without breaking (saved.status="error").
+        use_cdp: if True, in addition to the static fetch, fetches via the Chrome
+            DevTools Protocol to analyse the HTML RENDERED after JS runs and
+            detect SPAs. Requires a Chrome with remote debugging on cdp_port.
+            If CDP is unavailable, DEGRADES to the static HTML with a warning
+            (does not fail). Default False (classic static behaviour, unchanged).
+        cdp_port: remote-debugging port of the Chrome to use when use_cdp is set.
+            Default 9222 (everyday browser, global). For recon of third parties without
+            mixing in your personal session, use 9333 (isolated Chrome from browser_mcp).
+        wait_render_s: seconds to wait after the load event for the SPA to
+            paint the DOM (only applies with use_cdp). Default 2.0. Raise it (4.0-6.0)
+            for slow SPAs with heavy data fetching.

    Returns:
        Status dict. Never raises.
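
Since `use_cdp` needs a Chrome with remote debugging listening on `cdp_port`, a caller may want to probe the port before asking for CDP mode. The sketch below is an assumption-laden illustration, not part of this commit: `cdp_available` is a hypothetical name, and it relies only on the DevTools HTTP endpoint `/json/version`, which Chrome serves on its remote-debugging port.

```python
import http.client
import json
import urllib.request


def cdp_available(port: int = 9222, host: str = "127.0.0.1", timeout_s: float = 1.0) -> bool:
    """Return True if a DevTools HTTP endpoint answers on host:port (hypothetical helper)."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            info = json.loads(resp.read().decode("utf-8", errors="replace"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, malformed HTTP or a non-JSON reply:
        # all of them are treated as "no Chrome here".
        return False
    # Chrome's /json/version reply is a JSON object that includes a "Browser" field.
    return isinstance(info, dict) and "Browser" in info
```

The pipeline already degrades on its own when CDP is missing, so a probe like this is only useful for choosing between ports (for example 9333 versus 9222) up front.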
@@ -151,23 +202,32 @@ def fingerprint_web_stack(
                ],
                "by_category": {<category>: [<name>, ...], ...},
                "count": int,
+                "html_source": "static" | "cdp",  # source of the analysed HTML
+                "rendered": bool,             # True if html_source == "cdp"
+                "warnings": [<str>, ...],     # empty if there was no degradation
                "saved": <dict from save_scan_to_osint> | None,
                "raw": "# fingerprint_web_stack ...\nTECHNOLOGY ...",
            }

-        error (the HTTP fetch failed: host does not resolve, connection refused,
-        timeout)::
+        error (static fetch failed with use_cdp False, or both failed and "cdp" and
+        "warnings" are added: host does not resolve, connection refused, timeout)::

            {"status": "error", "stage": "fetch", "url": <url>, "fetch": <dict>}
    """
-    # 1. Fetch the signals. If the fetch fails outright, propagate without continuing.
+    warnings: list[str] = []
+
+    # 1. ALWAYS run the static fetch: it supplies the real headers/server/status_code
+    #    (CDP does not). Keep the result even if it fails: with use_cdp we can go on.
    fp = fetch_http_fingerprint(
        url,
        timeout_s=timeout_s,
        verify_tls=verify_tls,
        max_html_bytes=max_html_bytes,
    )
-    if fp.get("status") != "ok":
+    static_ok = fp.get("status") == "ok"
+
+    # If the static fetch fails outright and we are NOT going to try CDP, propagate the error.
+    if not static_ok and not use_cdp:
        return {
            "status": "error",
            "stage": "fetch",
@@ -175,25 +235,68 @@ def fingerprint_web_stack(
            "fetch": fp,
        }

-    final_url = fp.get("final_url", "") or ""
-    status_code = fp.get("status_code", 0)
-    server = fp.get("server") or ""
-    title = fp.get("title") or ""
+    # Response signals: taken from the static fetch when available (CDP does not expose them).
+    headers = (fp.get("headers") or {}) if static_ok else {}
+    static_cookies = (fp.get("cookies") or []) if static_ok else []
+    static_html = (fp.get("html") or "") if static_ok else ""
+    final_url = (fp.get("final_url") or "") if static_ok else ""
+    status_code = fp.get("status_code", 0) if static_ok else 0
+    server = (fp.get("server") or "") if static_ok else ""
+    title = (fp.get("title") or "") if static_ok else ""

-    # 2. Signature matching (pure): no network access, only deterministic regexes.
+    # 2. Choose the HTML to analyse and its source.
+    html_to_analyze = static_html
+    html_source = "static"
+    cookies = list(static_cookies)
+
+    if use_cdp:
+        cdp = fetch_http_fingerprint_cdp(
+            url,
+            port=cdp_port,
+            wait_render_s=wait_render_s,
+            timeout_s=timeout_s,
+        )
+        if cdp.get("status") == "ok":
+            # Rendered post-JS HTML: the key to detecting SPAs.
+            html_to_analyze = cdp.get("html") or ""
+            html_source = "cdp"
+            cookies = _union_cookie_names(static_cookies, cdp.get("cookies") or [])
+            # CDP sees the final URL after client-side redirects and the rendered
+            # title; use its values only when the static fetch provides none.
+            final_url = final_url or (cdp.get("final_url") or "")
+            if not title:
+                title = cdp.get("title") or ""
+        else:
+            # DEGRADE: without Chrome (or on a CDP failure) carry on with the static HTML.
+            cdp_err = cdp.get("error") or "desconocido"
+            warnings.append(f"cdp no disponible: {cdp_err}; usando fetch estatico")
+            if not static_ok:
+                # Neither static nor CDP: now it really is an error (no HTML to analyse).
+                return {
+                    "status": "error",
+                    "stage": "fetch",
+                    "url": url,
+                    "fetch": fp,
+                    "cdp": cdp,
+                    "warnings": warnings,
+                }
+
+    # 3. Signature matching (pure): no network access, only deterministic regexes.
    detection = detect_web_tech(
-        fp.get("headers") or {},
-        html=fp.get("html") or "",
-        cookies=fp.get("cookies") or [],
+        headers,
+        html=html_to_analyze,
+        cookies=cookies,
        final_url=final_url,
    )
    technologies = detection.get("technologies", [])
    by_category = detection.get("by_category", {})
    count = detection.get("count", len(technologies))

-    raw = _build_raw(url, final_url, status_code, server, title, technologies)
+    raw = _build_raw(
+        url, final_url, status_code, server, title, technologies, html_source
+    )

-    # 3. Archive the evidence in OSINT when requested (degrades without breaking).
+    # 4. Archive the evidence in OSINT when requested (degrades without breaking).
    saved = None
    if save:
        target = _target_from_url(url, final_url)
@@ -202,6 +305,7 @@ def fingerprint_web_stack(
            "by_category": by_category,
            "server": server,
            "status_code": status_code,
+            "html_source": html_source,
        }
        saved = save_scan_to_osint(
            target,
@@ -221,29 +325,52 @@ def fingerprint_web_stack(
        "technologies": technologies,
        "by_category": by_category,
        "count": count,
+        "html_source": html_source,
+        "rendered": html_source == "cdp",
+        "warnings": warnings,
        "saved": saved,
        "raw": raw,
    }


 def _parse_cli(argv: list[str]) -> dict:
-    """Parses the CLI args: <url> [--no-save] [--no-verify-tls].
+    """Parses the CLI args: <url> [--no-save] [--no-verify-tls] [--cdp] [--cdp-port N].

    Returns a dict of kwargs for fingerprint_web_stack.
    """
    positional: list[str] = []
    save = True
    verify_tls = True
+    use_cdp = False
+    cdp_port = 9222

-    for arg in argv:
+    i = 0
+    while i < len(argv):
+        arg = argv[i]
        if arg == "--no-save":
            save = False
        elif arg == "--no-verify-tls":
            verify_tls = False
+        elif arg == "--cdp":
+            use_cdp = True
+        elif arg == "--cdp-port":
+            i += 1
+            if i < len(argv):
+                try:
+                    cdp_port = int(argv[i])
+                except ValueError:
+                    pass  # non-integer value: keep the default cdp_port
        else:
            positional.append(arg)
+        i += 1

-    return {"positional": positional, "save": save, "verify_tls": verify_tls}
+    return {
+        "positional": positional,
+        "save": save,
+        "verify_tls": verify_tls,
+        "use_cdp": use_cdp,
+        "cdp_port": cdp_port,
+    }


 if __name__ == "__main__":
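
For comparison, the same flags can be declared with the standard-library `argparse` instead of the hand-rolled loop in `_parse_cli`. This is only a sketch of an alternative, not what the commit does: `parse_cli` is a hypothetical name, it takes a single `url` positional rather than a `positional` list, and unlike the loop above it exits with a usage error when `--cdp-port` is not an integer instead of silently keeping the default.

```python
import argparse


def parse_cli(argv: list[str]) -> dict:
    """Parse <url> [--no-save] [--no-verify-tls] [--cdp] [--cdp-port N] (hypothetical)."""
    parser = argparse.ArgumentParser(prog="fingerprint_web_stack")
    parser.add_argument("url")
    # store_false flags default to True and flip to False when present.
    parser.add_argument("--no-save", dest="save", action="store_false")
    parser.add_argument("--no-verify-tls", dest="verify_tls", action="store_false")
    parser.add_argument("--cdp", dest="use_cdp", action="store_true")
    # argparse derives the attribute name cdp_port from the option string.
    parser.add_argument("--cdp-port", type=int, default=9222)
    return vars(parser.parse_args(argv))
```

Whether strict failure on a bad port is preferable to the silent fallback is a design choice; the strict form surfaces typos such as `--cdp-port 933e` immediately.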
@@ -258,11 +385,16 @@ if __name__ == "__main__":
            target_url,
            verify_tls=parsed["verify_tls"],
            save=parsed["save"],
+            use_cdp=parsed["use_cdp"],
+            cdp_port=parsed["cdp_port"],
        )
        print("status:", result.get("status"))
        if result.get("status") == "ok":
            print(f"url: {result['url']} -> {result['final_url']} ({result['status_code']})")
            print("server:", result["server"] or "-")
+            print("html_source:", result.get("html_source"))
+            for w in result.get("warnings", []):
+                print("warning:", w)
            print("--- technologies ---")
            print(result["raw"])
            saved = result.get("saved") or {}
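
The order-preserving union performed by `_union_cookie_names` in this file can also be expressed with `dict.fromkeys`, which relies on dicts keeping insertion order. The sketch below is an equivalent formulation under a hypothetical name (`union_names`), shown only to make the dedup rule explicit.

```python
from collections.abc import Iterable
from typing import Optional


def union_names(first: Optional[Iterable[str]], second: Optional[Iterable[str]]) -> list[str]:
    """Merge two name lists, keeping first-seen order and dropping empty names."""
    merged = [*(first or []), *(second or [])]
    # dict preserves insertion order, so fromkeys dedups while keeping the
    # position of each name's first occurrence.
    return [name for name in dict.fromkeys(merged) if name]
```

Names from the first argument keep priority, which matches the intent of listing static-fetch cookies before CDP-only ones.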
@@ -178,3 +178,44 @@ def test_fetch_fallido_propaga_error_sin_red():
    assert result["fetch"]["status"] == "error", result
    # No archiving was attempted.
    assert save_called["n"] == 0, save_called
+
+
+# --- 4. use_cdp without Chrome: DEGRADES to static with a warning (no failure) ---
+
+def test_use_cdp_sin_chrome_degrada_a_estatico():
+    """use_cdp=True without Chrome (cdp_port=1) degrades to the static fetch with a warning.
+
+    Starts the same local WordPress/nginx/PHP HTTPServer as the golden test and requests
+    use_cdp with cdp_port=1 (where no Chrome is listening). The CDP fetch
+    fails and the pipeline does NOT break: it uses the static HTML, sets html_source=static,
+    fills warnings and still detects WordPress/nginx from the HTML/headers.
+    """
+    httpd, port, thread = _start_wp_server()
+    try:
+        result = fingerprint_web_stack(
+            f"http://127.0.0.1:{port}/",
+            timeout_s=5.0,
+            save=False,
+            use_cdp=True,
+            cdp_port=1,  # port with no Chrome: the CDP fetch fails -> degrades
+            wait_render_s=0.0,
+        )
+    finally:
+        httpd.shutdown()
+        httpd.server_close()
+        thread.join(timeout=2.0)
+
+    # Degraded, did not fail.
+    assert result["status"] == "ok", result
+    # Fell back to the static HTML (CDP unavailable).
+    assert result["html_source"] == "static", result
+    assert result["rendered"] is False, result
+    # A degradation warning was recorded.
+    assert result["warnings"], result
+    assert any("cdp no disponible" in w for w in result["warnings"]), result["warnings"]
+    # Static detection still works.
+    names = {t["name"] for t in result["technologies"]}
+    assert "WordPress" in names, names
+    assert "nginx" in names, names
+    # Nothing was archived (save=False).
+    assert result["saved"] is None, result
+    assert result["saved"] is None, result
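
The test above uses `cdp_port=1` as a port where no Chrome listens. An alternative some test suites use is to ask the OS for a currently unused port. The helper below is a sketch under a hypothetical name (`unused_tcp_port`), not part of this commit; the returned port is only free at the moment of the call, so a small race with other processes remains.

```python
import socket


def unused_tcp_port(host: str = "127.0.0.1") -> int:
    """Return a TCP port that was free when asked (hypothetical test helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS pick any free ephemeral port.
        sock.bind((host, 0))
        return sock.getsockname()[1]
```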