fn_registry/dev/flows/0001-hn-top-stories.md at 212875ed0d734a413c266b88c95109cb333c68b9

egutierrez 212875ed0d chore: auto-commit (286 archivos)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-16 16:33:22 +02:00


name, id, status, created, updated, priority, risk, related_issues, apps, trigger, schedule, expected_runtime_s, tags

Goal

Test the stack end to end: navegator AutoExtract -> recipe -> dag_engine schedule -> data_factory.runs -> Matrix bot. The target page needs no authentication and costs nothing to scrape. If this flow works, the whole plumbing is solid.

Prerequisites

Chrome launched with --remote-debugging-port=9222 (via navegator_dashboard "Open visible browser").
claude CLI on PATH (AutoExtract requires an LLM).
sqlite_api running on :8484.
dag_engine running on :8090.
(optional) Matrix bot in the #fn-registry-news room for the final sink.
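
The prerequisites above can be checked before starting the flow. The following is a minimal sketch, not part of the registry: it assumes all services run on the local machine (127.0.0.1), and the names `REQUIRED_PORTS`, `port_open`, and `check_prerequisites` are hypothetical. It only verifies that something accepts TCP connections on each port and that a `claude` executable is on PATH; it does not confirm that the right service is answering.

```python
import shutil
import socket

# Ports come from the prerequisites list; the host is an assumption (local dev machine).
REQUIRED_PORTS = {
    "chrome_cdp": 9222,
    "sqlite_api": 8484,
    "dag_engine": 8090,
}


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def check_prerequisites() -> dict[str, bool]:
    """Map each prerequisite name to whether it looks satisfied."""
    status = {name: port_open(port) for name, port in REQUIRED_PORTS.items()}
    status["claude_cli"] = shutil.which("claude") is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Running the script prints one line per prerequisite, so a missing service is visible before AutoExtract is opened.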

Flow

Launch Chrome via navegator (port 9222).
AutoExtract panel: enter the URL https://news.ycombinator.com and click "Open & Analyze".
Wait ~10-20s. Verify the proposed schema: rank, title, url, points, comments, age.
Refine the selectors if the AI proposes broken ones. Run Test extraction -> the preview should show >= 20 rows.
Save as recipe hn_top.yaml (in projects/navegator/profiles/default/recipes/).

Create the DAG ~/.dagu/dags/hn-top.yaml (by hand or by copying from apps/dag_engine/dags_migrated/):

name: hn-top-stories
description: Scrape HN top stories every 30 min
schedule: "*/30 * * * *"
steps:
  - name: extract
    function: cdp_extract_recipe_py_pipelines
    args: ["projects/navegator/profiles/default/recipes/hn_top.yaml"]

Reload dag_engine and enable the scheduler. Trigger Run Now once to test.
dag_engine_ui: verify the run has status=success and the correct function_id on the step.
data_factory: the Extractors tab shows the hn_top_stories node (created when the recipe was saved). The "All Runs" tab shows the new runs.
(optional) Add a transformer step that filters points > 100 -> Matrix bot sink.
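
The optional transformer step could look roughly like the sketch below. It is an illustration only: the row layout (a list of dicts keyed by the schema fields) is an assumption about what the recipe emits, and `filter_top_stories` and `format_message` are hypothetical names, not registry functions. Sending the message to Matrix is left out.

```python
from typing import Iterable


def filter_top_stories(rows: Iterable[dict], min_points: int = 100) -> list[dict]:
    """Keep rows with strictly more than min_points points, ordered by rank."""
    kept = []
    for row in rows:
        try:
            points = int(row.get("points") or 0)
        except (TypeError, ValueError):
            continue  # skip rows whose points value is not numeric
        if points > min_points:
            kept.append(row)
    return sorted(kept, key=lambda r: int(r["rank"]))


def format_message(row: dict) -> str:
    """Render one story as a single-line message for the sink."""
    return f"#{row['rank']} {row['title']} ({row['points']} points) {row['url']}"


if __name__ == "__main__":
    # Hypothetical sample rows; real rows come from the hn_top recipe.
    sample = [
        {"rank": 2, "title": "B", "url": "https://example.com/b", "points": 100},
        {"rank": 1, "title": "A", "url": "https://example.com/a", "points": 250},
    ]
    for story in filter_top_stories(sample):
        print(format_message(story))
```

The comparison is strict (`>`), matching "points > 100", so a story with exactly 100 points is dropped.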

Acceptance

Recipe created and validated (validate_recipe_yaml_py_core OK).
DAG runs OK 2 consecutive times via the scheduler.
data_factory.runs has >=2 entries with node_id='hn_top_stories'.
cdp_extract_recipe_py_pipelines appears in call_monitor.calls.
Extracted schema covers 6/6 fields (rank, title, url, points, comments, age).
(optional) Matrix bot receives >=1 message with a filtered top story.
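
Two of the criteria above (run count and field coverage) lend themselves to a scripted check. The sketch below is hedged: the table name `runs` and the columns `node_id` and `trigger` come from this document, but the rest of the schema and the in-memory database used here are hypothetical stand-ins. The real data_factory storage is reached through sqlite_api and may be laid out differently.

```python
import sqlite3

# The six fields listed in the acceptance criteria.
EXPECTED_FIELDS = {"rank", "title", "url", "points", "comments", "age"}


def missing_fields(row: dict) -> set[str]:
    """Return the expected fields that are absent from an extracted row."""
    return EXPECTED_FIELDS - set(row)


def count_runs(conn: sqlite3.Connection, node_id: str = "hn_top_stories") -> int:
    """Count rows in the runs table for the given node."""
    cur = conn.execute("SELECT COUNT(*) FROM runs WHERE node_id = ?", (node_id,))
    return cur.fetchone()[0]


def demo_connection() -> sqlite3.Connection:
    """In-memory stand-in for data_factory; this schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE runs (id INTEGER PRIMARY KEY, node_id TEXT, "trigger" TEXT)'
    )
    conn.executemany(
        'INSERT INTO runs (node_id, "trigger") VALUES (?, ?)',
        [("hn_top_stories", "cron"), ("hn_top_stories", "cron"), ("other_node", "manual")],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print("runs >= 2:", count_runs(conn) >= 2)
    print("missing fields:", missing_fields({"rank": 1, "title": "x"}))
```

`trigger` is quoted in the SQL because it is an SQLite keyword.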

Expected telemetry

function_stats.cdp_extract_recipe_py_pipelines: calls_24h += 2.
data_factory.runs: 2 new rows with trigger='cron'.
dag_engine.dag_step_results: extract step with function_id='cdp_extract_recipe_py_pipelines'.
call_monitor.calls: chain function call.
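
The counters above only move after the scheduler has fired twice, so it helps to know when that will be. The sketch below estimates the next fire times of the `*/30 * * * *` schedule. It is a simplification, not dag_engine's cron parser: `next_runs` is a hypothetical helper, it only handles `*/N` minute steps where N divides 60, and it assumes the scheduler uses the same clock and timezone as the `now` value passed in.

```python
from datetime import datetime, timedelta


def next_runs(now: datetime, count: int = 2, step_minutes: int = 30) -> list[datetime]:
    """Next `count` fire times of a '*/N * * * *' schedule strictly after `now`."""
    if step_minutes <= 0 or 60 % step_minutes != 0:
        raise ValueError("step_minutes must be a positive divisor of 60")
    # Cron fires on whole minutes, so drop seconds before aligning.
    base = now.replace(second=0, microsecond=0)
    minutes_to_next = step_minutes - (base.minute % step_minutes)
    first = base + timedelta(minutes=minutes_to_next)
    return [first + timedelta(minutes=step_minutes * i) for i in range(count)]


if __name__ == "__main__":
    for fire_time in next_runs(datetime.now()):
        print(fire_time.isoformat(timespec="minutes"))
```

Telemetry checked after the second printed time, plus the run's duration, should show both cron-triggered runs.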

Notes

(to be filled in after running)
