fn_registry/dev/flows/0001-hn-top-stories.md
egutierrez 212875ed0d chore: auto-commit (286 archivos)
2026-05-16 16:33:22 +02:00

---
name: hn-top-stories
id: 0001
status: pending
created: 2026-05-16
updated: 2026-05-16
priority: high
risk: low
related_issues: [0097, 0098]
apps:
- navegator_dashboard
- dag_engine
- data_factory
- agents_and_robots
trigger: cron
schedule: "*/30 * * * *"
expected_runtime_s: 30
tags: [scraping, news, smoke-test, multi-app]
---
## Goal
Test the stack end to end: navegator AutoExtract -> recipe -> dag_engine schedule -> data_factory.runs -> Matrix bot. The target page needs no authentication and costs nothing to scrape. If this flow works, all the plumbing is solid.
## Prerequisites
- Chrome launched with `--remote-debugging-port=9222` (via navegator_dashboard "Open visible browser").
- `claude` CLI in PATH (AutoExtract requires an LLM).
- sqlite_api running on `:8484`.
- dag_engine running on `:8090`.
- (optional) Matrix bot in room `#fn-registry-news` for the final sink.
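
The prerequisites above can be checked before starting the flow. The following is a minimal preflight sketch, not part of the registry: it assumes all three services listen on `127.0.0.1`, and the names `check_prereqs`, `port_open`, and `REQUIRED_PORTS` are hypothetical.

```python
import shutil
import socket

# Ports come from the prerequisites list; the labels are descriptive only.
REQUIRED_PORTS = {
    "chrome_remote_debugging": 9222,
    "sqlite_api": 8484,
    "dag_engine": 8090,
}


def port_open(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds.

    Assumption: the services are reachable on localhost.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_prereqs(is_port_open=port_open, find_binary=shutil.which):
    """Return a list of problems; an empty list means the flow can start."""
    problems = []
    if find_binary("claude") is None:
        problems.append("claude CLI not found in PATH")
    for label, port in REQUIRED_PORTS.items():
        if not is_port_open(port):
            problems.append(f"{label} not reachable on port {port}")
    return problems
```

Calling `check_prereqs()` with no arguments probes the real ports. The two parameters exist only so the check can be exercised without live services. The optional Matrix bot is deliberately not checked.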
## Flow
1. Launch Chrome via navegator (port 9222).
2. AutoExtract panel: URL `https://news.ycombinator.com`. Click "Open & Analyze".
3. Wait ~10-20 s. Verify the proposed schema: `rank`, `title`, `url`, `points`, `comments`, `age`.
4. Refine the selectors if the AI proposes broken ones. Test extraction -> preview rows >= 20.
5. Save as recipe `hn_top.yaml` (in `projects/navegator/profiles/default/recipes/`).
6. Create DAG `~/.dagu/dags/hn-top.yaml` (manually or by copying from `apps/dag_engine/dags_migrated/`):
```yaml
name: hn-top-stories
description: Scrape HN top stories every 30 min
schedule: "*/30 * * * *"
steps:
  - name: extract
    function: cdp_extract_recipe_py_pipelines
    args: ["projects/navegator/profiles/default/recipes/hn_top.yaml"]
```
7. Reload dag_engine and enable the scheduler. Trigger Run Now once to test.
8. dag_engine_ui: verify a run with status=success and the correct function_id on the step.
9. data_factory: the Extractors tab shows node `hn_top_stories` (created when the recipe is saved). The "All Runs" tab shows the new runs.
10. (optional) Add a transformer step that filters `points > 100` -> sink Matrix bot.
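
The optional transformer in step 10 could look roughly like the sketch below. It assumes each extracted row is a dict using the schema field names and that `points` has already been parsed to an integer. `filter_top_stories` and `format_message` are hypothetical names, not registry functions.

```python
def filter_top_stories(rows, min_points=100):
    """Keep rows whose points are strictly greater than min_points."""
    kept = []
    for row in rows:
        points = row.get("points")
        # Rows where no integer score was extracted are skipped, not failed.
        if isinstance(points, int) and points > min_points:
            kept.append(row)
    return kept


def format_message(row):
    """Render one story as a single-line message for the sink."""
    return f"{row['title']} ({row['points']} points): {row['url']}"
```

The comparison is strict to match `points > 100`, so a story with exactly 100 points is dropped. If the recipe yields `points` as text such as `"123 points"`, a parsing step would be needed first.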
## Acceptance
- [ ] Recipe created and validated (`validate_recipe_yaml_py_core` OK).
- [ ] DAG runs OK 2 consecutive times via the scheduler.
- [ ] `data_factory.runs` has >=2 entries with `node_id='hn_top_stories'`.
- [ ] `cdp_extract_recipe_py_pipelines` appears in `call_monitor.calls`.
- [ ] Extracted schema covers 6/6 fields (rank, title, url, points, comments, age).
- [ ] (optional) Matrix bot receives >=1 message with a filtered top story.
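
The 6/6 schema criterion can be checked mechanically against a preview of extracted rows. This is an illustrative sketch only: it assumes rows are dicts keyed by field name, and it counts a field as covered when at least one row has a non-empty value for it. `schema_coverage` is a hypothetical helper.

```python
EXPECTED_FIELDS = ("rank", "title", "url", "points", "comments", "age")


def schema_coverage(rows, expected=EXPECTED_FIELDS):
    """Return (covered, missing) lists of field names.

    A field is covered if any row has a value other than None or "".
    """
    covered = [
        field
        for field in expected
        if any(row.get(field) not in (None, "") for row in rows)
    ]
    missing = [field for field in expected if field not in covered]
    return covered, missing
```

The criterion passes when `missing` is empty. A value of `0` (for example zero comments) still counts as covered.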
## Expected telemetry
- `function_stats.cdp_extract_recipe_py_pipelines`: calls_24h += 2.
- `data_factory.runs`: 2 new rows with `trigger='cron'`.
- `dag_engine.dag_step_results`: step `extract` with `function_id='cdp_extract_recipe_py_pipelines'`.
- `call_monitor.calls`: chained function call.
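
The `data_factory.runs` expectation can be verified with a simple count. The sketch below uses an in-memory SQLite table as a stand-in. The real schema is not shown in this document, so the table layout is an assumption beyond the `node_id` and `trigger` columns named above, and the `'manual'` trigger value is hypothetical.

```python
import sqlite3


def count_cron_runs(conn, node_id="hn_top_stories"):
    """Count rows in runs for node_id that were started by the cron trigger."""
    row = conn.execute(
        # "trigger" is quoted because TRIGGER is an SQL keyword.
        'SELECT COUNT(*) FROM runs WHERE node_id = ? AND "trigger" = ?',
        (node_id, "cron"),
    ).fetchone()
    return row[0]


def demo_connection():
    """Build an in-memory stand-in for data_factory.runs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE runs (node_id TEXT, "trigger" TEXT)')
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?)",
        [
            ("hn_top_stories", "cron"),
            ("hn_top_stories", "cron"),
            ("hn_top_stories", "manual"),  # e.g. a Run Now test, assumed label
        ],
    )
    return conn
```

With the demo data, `count_cron_runs(demo_connection())` returns 2. Against the real database, taking the count before and after the test window would show whether two cron-triggered rows were added.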
## Notes
(to be filled in after the run)