fn_registry/dev/flows/0001-hn-top-stories.md
egutierrez 212875ed0d chore: auto-commit (286 files)
2026-05-16 16:33:22 +02:00


name: hn-top-stories
id: 0001
status: pending
created: 2026-05-16
updated: 2026-05-16
priority: high
risk: low
related_issues: 0097, 0098
apps: navegator_dashboard, dag_engine, data_factory, agents_and_robots
trigger: cron
schedule: */30 * * * *
expected_runtime_s: 30
tags: scraping, news, smoke-test, multi-app

Goal

Test the stack end to end: navegator AutoExtract -> recipe -> dag_engine schedule -> data_factory.runs -> Matrix bot. The target page needs no authentication and costs nothing to scrape. If this flow works, the whole plumbing is sound.

Prerequisites

  • Chrome launched with --remote-debugging-port=9222 (via navegator_dashboard "Open visible browser").
  • claude CLI on PATH (auto-extract requires an LLM).
  • sqlite_api running on :8484.
  • dag_engine running on :8090.
  • (optional) Matrix bot in room #fn-registry-news for the final sink.
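
The prerequisites above can be checked before starting the flow. The following is a minimal sketch, assuming all services listen on localhost; the helper names (`port_open`, `cli_on_path`, `check_prerequisites`) are hypothetical and not part of fn_registry. It only confirms that each port accepts TCP connections and that the `claude` executable is discoverable, not that the services are healthy.

```python
import shutil
import socket

# Ports come from the prerequisites list; the keys are labels only.
REQUIRED_PORTS = {
    "chrome-remote-debugging": 9222,
    "sqlite_api": 8484,
    "dag_engine": 8090,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def cli_on_path(name: str) -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


def check_prerequisites(host: str = "127.0.0.1", ports=None) -> dict:
    """Map each service label to whether its port accepts connections."""
    ports = REQUIRED_PORTS if ports is None else ports
    return {label: port_open(host, port) for label, port in ports.items()}
```

Calling `check_prerequisites()` returns a dict such as `{"sqlite_api": True, ...}`, and `cli_on_path("claude")` covers the CLI requirement.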

Flow

  1. Launch Chrome via navegator (port 9222).
  2. AutoExtract panel: enter URL https://news.ycombinator.com and click "Open & Analyze".
  3. Wait ~10-20s. Verify the proposed schema: rank, title, url, points, comments, age.
  4. Refine the selectors if the ones the AI proposes are broken. Test extraction -> preview rows >= 20.
  5. Save as recipe hn_top.yaml (in projects/navegator/profiles/default/recipes/).
  6. Create DAG ~/.dagu/dags/hn-top.yaml (manually, or by copying from apps/dag_engine/dags_migrated/):
    name: hn-top-stories
    description: Scrape HN top stories every 30 min
    schedule: "*/30 * * * *"
    steps:
      - name: extract
        function: cdp_extract_recipe_py_pipelines
        args: ["projects/navegator/profiles/default/recipes/hn_top.yaml"]
    
  7. Reload dag_engine and enable the scheduler. Trigger Run Now once to test.
  8. dag_engine_ui: verify the run has status=success and the correct function_id on the step.
  9. data_factory: the Extractors tab shows node hn_top_stories (created when the recipe is saved). The "All Runs" tab shows the new runs.
  10. (optional) Add a transformer step that filters points > 100 -> sink Matrix bot.
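
The optional transformer in step 10 could look roughly like the sketch below. It assumes each extracted row is a dict keyed by the schema fields, and that `points` may arrive either as an integer or as text such as "153 points". Both are assumptions about the recipe output, and `parse_points` and `filter_top_stories` are hypothetical names, not registry functions.

```python
def parse_points(value) -> int:
    """Coerce a points value (e.g. 153 or "153 points") to int; 0 if unparseable."""
    if isinstance(value, int):
        return value
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    return int(digits) if digits else 0


def filter_top_stories(rows, min_points: int = 100):
    """Keep only rows whose points are strictly greater than min_points."""
    return [row for row in rows if parse_points(row.get("points")) > min_points]
```

The comparison is strict, matching "points > 100": a story with exactly 100 points is dropped.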

Acceptance

  • Recipe created and validated (validate_recipe_yaml_py_core OK).
  • DAG runs OK twice in a row via the scheduler.
  • data_factory.runs has >=2 entries with node_id='hn_top_stories'.
  • cdp_extract_recipe_py_pipelines appears in call_monitor.calls.
  • Extracted schema covers 6/6 fields (rank, title, url, points, comments, age).
  • (optional) Matrix bot receives >=1 message with a filtered top story.
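
The 6/6 schema criterion can be checked mechanically against a batch of extracted rows. This is a sketch under the assumption that rows are dicts keyed by field name. A field counts as covered if at least one row has a non-empty value for it, which is one possible reading of "covers". `missing_fields` is a hypothetical helper.

```python
EXPECTED_FIELDS = ("rank", "title", "url", "points", "comments", "age")


def missing_fields(rows, expected=EXPECTED_FIELDS):
    """Return the expected fields that have no non-empty value in any row."""
    covered = set()
    for row in rows:
        for field in expected:
            if row.get(field) not in (None, ""):
                covered.add(field)
    return [field for field in expected if field not in covered]
```

An empty return value means all six fields are covered. The preview threshold from the flow (rows >= 20) would be a separate `len(rows) >= 20` check.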

Expected telemetry

  • function_stats.cdp_extract_recipe_py_pipelines: calls_24h += 2.
  • data_factory.runs: 2 new rows with trigger='cron'.
  • dag_engine.dag_step_results: step extract with function_id='cdp_extract_recipe_py_pipelines'.
  • call_monitor.calls: chain function call.
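
The data_factory.runs expectation can be verified with a count query taken before and after the two scheduled runs. The sketch below assumes a SQLite table named `runs` with `node_id` and `trigger` columns. Only those two column names appear in this document, so the rest of the schema and the database location are unknown here. `count_cron_runs` is a hypothetical helper.

```python
import sqlite3


def count_cron_runs(conn: sqlite3.Connection, node_id: str = "hn_top_stories") -> int:
    """Count rows in `runs` for node_id that were triggered by cron."""
    # "trigger" is quoted because TRIGGER is also an SQL keyword.
    cur = conn.execute(
        'SELECT COUNT(*) FROM runs WHERE node_id = ? AND "trigger" = ?',
        (node_id, "cron"),
    )
    return cur.fetchone()[0]
```

Recording the count before enabling the scheduler and again after two ticks should show a difference of 2 if the telemetry matches the expectation.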

Notes

(to be filled in after running)