chore: auto-commit (286 files)

- .claude/agents/fn-orquestador/SKILL.md
- .claude/commands/fn_claude.md
- .claude/rules/INDEX.md
- .claude/rules/cpp_apps.md
- .claude/rules/ids_naming.md
- CHANGELOG.md
- apps/dag_engine/README.md
- apps/dag_engine/api.go
- apps/dag_engine/dags_migrated/example.yaml
- apps/dag_engine/dags_migrated/example_lineage_tracking.yaml
- ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 16:33:22 +02:00
parent d6175964e4
commit 212875ed0d
290 changed files with 12703 additions and 19778 deletions
@@ -0,0 +1,61 @@
---
name: llm_propose_scraping_schema
kind: function
lang: py
domain: infra
version: "1.0.0"
purity: impure
signature: "def llm_propose_scraping_schema(url: str, ax_tree: list, max_chunks: int = 5, max_chars_per_chunk: int = 25000) -> dict"
description: "Orchestrates trim_ax_tree -> chunk_ax_tree -> N calls to the Claude CLI -> merge. Proposes a scraping schema (fields, selectors, types) from a page's AX tree."
tags: [navegator, ai, llm, scraping, schema]
uses_functions: [trim_ax_tree_py_core, chunk_ax_tree_py_core, claude_cli_prompt_py_infra]
uses_types: []
returns: []
returns_optional: false
error_type: "error_go_core"
imports: [json, re, sys, os]
params:
- name: url
  desc: "URL of the page (included in the prompt sent to Claude for context)."
- name: ax_tree
  desc: "AX tree as a list of dicts obtained via CDP (cdp_get_ax_tree)."
- name: max_chunks
  desc: "Maximum number of chunks to process. Default 5. If the tree yields more, the result has truncated=True."
- name: max_chars_per_chunk
  desc: "Maximum characters per AX tree chunk sent to Claude. Default 25000."
output: "dict {schema: [{field, selector, sample_value, type, source_role}], notes: str, chunks_processed: int, truncated: bool}"
tested: false
tests: []
test_file_path: ""
file_path: "python/functions/infra/llm_propose_scraping_schema.py"
---
## Example
```python
import sys
sys.path.insert(0, "python/functions")
from infra.llm_propose_scraping_schema import llm_propose_scraping_schema
# ax_tree previously obtained with cdp_get_ax_tree
result = llm_propose_scraping_schema(
    url="https://shop.example.com/products",
    ax_tree=ax_tree,
    max_chunks=3,
)
# {"schema": [{"field": "price", "selector": ".product-price", ...}], "notes": "...", ...}
for field in result["schema"]:
    print(field["field"], "->", field["selector"])
```
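The returned dict also carries `chunks_processed` and `truncated`, which a caller may want to check before trusting the proposal. A minimal sketch, assuming only the output shape documented in the front matter; `summarize_proposal` is a hypothetical helper and the sample values (including `type` and `source_role`) are hand-written, not real Claude output.

```python
def summarize_proposal(result: dict) -> str:
    """Build a one-line summary from the documented output dict."""
    fields = [entry["field"] for entry in result.get("schema", [])]
    summary = f"{len(fields)} field(s) from {result.get('chunks_processed', 0)} chunk(s)"
    if result.get("truncated"):
        # truncated=True means the page produced more chunks than max_chunks allowed.
        summary += " (truncated: consider raising max_chunks)"
    return summary


# Hand-written sample in the documented shape (illustrative values only).
sample = {
    "schema": [
        {
            "field": "price",
            "selector": ".product-price",
            "sample_value": "19.99",
            "type": "number",
            "source_role": "StaticText",
        },
    ],
    "notes": "",
    "chunks_processed": 3,
    "truncated": True,
}

print(summarize_proposal(sample))
```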
## When to use it
Use it when you have a page's AX tree and want Claude to automatically propose which fields to extract and with which CSS selectors. It is a discovery step before writing the recipe YAML by hand or with assistance.
## Gotchas
- Requires the `claude` CLI installed and available on PATH (validated by `claude_cli_prompt`).
- Each chunk triggers one call to Claude (token cost). Use a conservative `max_chunks` on very large pages.
- Claude's response is parsed with tolerance for fenced code blocks (```json ... ```). If Claude returns prose without JSON, the chunk is skipped and an error note is recorded.
- Dedup by `field`: the first occurrence wins if the same field appears in several chunks.
- Does not access the network directly; it delegates to `claude_cli_prompt`.
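
The tolerant parsing and first-occurrence dedup described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `extract_json` and `merge_chunks` are hypothetical names, the note wording is invented, and the real function may handle more cases (for example JSON embedded in prose).

```python
import json
import re

# Build the fence marker programmatically so this snippet nests cleanly in Markdown.
FENCE = "`" * 3
# Matches a fenced block, optionally tagged "json", capturing its body.
_FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(response: str):
    """Return the parsed JSON payload, or None if nothing parses."""
    match = _FENCE_RE.search(response)
    candidate = match.group(1) if match else response
    try:
        return json.loads(candidate.strip())
    except json.JSONDecodeError:
        return None


def merge_chunks(responses: list) -> dict:
    """Merge per-chunk responses; the first occurrence of each field wins."""
    schema, notes, seen = [], [], set()
    for index, response in enumerate(responses):
        payload = extract_json(response)
        if not isinstance(payload, dict):
            # Prose without JSON: skip the chunk and record a note.
            notes.append(f"chunk {index}: no JSON found, skipped")
            continue
        for entry in payload.get("schema", []):
            name = entry.get("field") if isinstance(entry, dict) else None
            if name is None or name in seen:
                continue  # malformed entry or duplicate field
            seen.add(name)
            schema.append(entry)
    return {"schema": schema, "notes": "; ".join(notes)}


# Hand-written responses standing in for Claude output.
responses = [
    FENCE + 'json\n{"schema": [{"field": "price", "selector": ".product-price"}]}\n' + FENCE,
    "Sorry, I could not find any fields.",
    '{"schema": [{"field": "price", "selector": ".alt-price"}, {"field": "title", "selector": "h1"}]}',
]
merged = merge_chunks(responses)
for entry in merged["schema"]:
    print(entry["field"], "->", entry["selector"])
print(merged["notes"])
```

Here the second response contains no JSON, so it is skipped with a note, and the `price` selector from the first chunk is kept over the one in the third.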