6ad82167bb
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
95 lines
4.2 KiB
Markdown
---
name: osint-person-lookup
id: 0005
status: pending
created: 2026-05-16
updated: 2026-05-16
priority: medium
risk: medium
related_issues: [0098]
apps:
  - navegator_dashboard
  - odr_console
  - graph_explorer
  - agents_and_robots
trigger: manual
schedule: ""
expected_runtime_s: 300
tags: [osint, multi-tab, parallel, graph]
---

## Goal

Exercise parallelism (multiple concurrent scraping jobs) plus aggregation into a graph. Demonstrates that graph_explorer closes the loop by visualizing the extracted data.

## Prerequisites

- odr_console running (jobs queue).
- Chrome with active sessions on LinkedIn / Twitter / GitHub (do not automate login).
- `projects/osint_graph/operations.db` accessible (graph_explorer reads it).
- claude CLI for the final summary.

## Flow

1. Input: first name + last name (`Juan Perez`).
2. odr_console creates 3 concurrent jobs:
   - Job A: navegator recipe `osint_linkedin_search.yaml` with query `{name}`.
   - Job B: navegator recipe `osint_twitter_search.yaml` with query `{name}`.
   - Job C: navegator recipe `osint_github_search.yaml` with query `{name}`.
3. Each job extracts snippets: `{source, url, title, snippet, timestamp}`.
4. Each snippet -> inserted into `projects/osint_graph/operations.db` as a `Snippet` entity + `mentions(Person, Snippet)` relations.
5. graph_explorer opens the operations.db -> renders the mention network.
6. `claude -p` summarizes the findings in Markdown: given the list of snippets, it returns `{summary, confidence, suggested_next_steps}`.
7. Sink: `.md` report at `projects/osint_graph/reports/<person>-<date>.md`.
8. Matrix bot sends a link to the report.
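Steps 2-4 can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the project's implementation: `run_recipe` is a hypothetical stand-in for a navegator job (it returns placeholder data instead of scraping), and the `entities` / `relations` tables are an assumed schema, since the real layout of `operations.db` is not described here. An in-memory SQLite database is used so the example does not touch any real file.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone

# Recipe file names come from the flow; the source keys are illustrative.
RECIPES = {
    "linkedin": "osint_linkedin_search.yaml",
    "twitter": "osint_twitter_search.yaml",
    "github": "osint_github_search.yaml",
}


def run_recipe(source: str, recipe: str, name: str) -> list[dict]:
    """Hypothetical stand-in for one navegator job; returns placeholder snippets."""
    return [{
        "source": source,
        "url": f"https://example.invalid/{source}",
        "title": f"{name} on {source}",
        "snippet": f"placeholder result from {recipe}",
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }]


def run_jobs(name: str) -> list[dict]:
    """Run the three recipes concurrently and flatten their snippets."""
    with ThreadPoolExecutor(max_workers=len(RECIPES)) as pool:
        futures = [pool.submit(run_recipe, src, rec, name)
                   for src, rec in RECIPES.items()]
        return [snip for fut in futures for snip in fut.result()]


def store(conn: sqlite3.Connection, person: str, snippets: list[dict]) -> int:
    """Insert one Person, one Snippet entity per snippet, and a mentions relation each."""
    # Assumed schema, for illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS entities "
                 "(id INTEGER PRIMARY KEY, kind TEXT, label TEXT, url TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS relations "
                 "(kind TEXT, src INTEGER, dst INTEGER)")
    person_id = conn.execute(
        "INSERT INTO entities (kind, label) VALUES ('Person', ?)", (person,)
    ).lastrowid
    for s in snippets:
        snippet_id = conn.execute(
            "INSERT INTO entities (kind, label, url) VALUES ('Snippet', ?, ?)",
            (s["title"], s["url"]),
        ).lastrowid
        conn.execute(
            "INSERT INTO relations (kind, src, dst) VALUES ('mentions', ?, ?)",
            (person_id, snippet_id),
        )
    conn.commit()
    return person_id


conn = sqlite3.connect(":memory:")
snippets = run_jobs("Juan Perez")
store(conn, "Juan Perez", snippets)
```

All database writes happen in one thread after the jobs finish, which avoids concurrent writers in this sketch; whether odr_console works that way is not stated in this flow.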

## Acceptance

- [ ] 3 osint recipes created (LinkedIn, Twitter, GitHub).
- [ ] odr_console launches 3 parallel jobs without race conditions.
- [ ] >= 5 snippets total in operations.db.
- [ ] graph_explorer renders a graph with >=1 Person + N Snippets.
- [ ] Claude summary generated and valid (no error).
- [ ] `.md` report committed to the osint_graph repo.

## Expected telemetry

- 3 runs in `data_factory.runs` (one per source).
- osint_graph `operations.db`: entities += N, relations += N.
- `function_stats.claude_cli_prompt_py_infra`: calls += 1.

## Definition of Done

See the DoD + user-facing section of `README.md`. **Risk=medium** -> pay attention to personal data.

### Generic

- [ ] **Repeatability**: 3 different lookups (3 test persons) produce complete reports without re-configuration.
- [ ] **Observability**: 3 jobs visible in `odr_console.operations.db` + a `call_monitor.calls` chain per job.
- [ ] **Error-path**: if LinkedIn returns 429 → job marked failed, the other 2 continue (the whole flow is not aborted).
- [ ] **Idempotency**: re-lookup of the same person → upsert by `snippet_hash`, Person nodes are not duplicated.
- [ ] **Secrets**: Twitter/GitHub creds in `pass`; LinkedIn uses the browser session (cookie via navegator).
- [ ] **Docs**: `## Notas` with onboarding + legal check.
- [ ] **Registry-first**: recipes + aggregation + render reuse registry functions.
- [ ] **INDEX + status**: `status: done` + INDEX + moved.
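The idempotency item could look roughly like this. It is a sketch under assumptions: the flow only names `snippet_hash`, so the hash inputs (source, URL, snippet text), the `persons` / `snippets` tables, and the `upsert` helper are all hypothetical. It uses SQLite's `ON CONFLICT ... DO UPDATE` clause, which requires SQLite 3.24 or newer.

```python
import hashlib
import sqlite3


def snippet_hash(s: dict) -> str:
    """Stable key for a snippet; the choice of fields is an assumption."""
    raw = "\x1f".join((s["source"], s["url"], s["snippet"]))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def upsert(conn: sqlite3.Connection, person: str, snippets: list[dict]) -> None:
    """Insert the person and snippets; repeated calls do not create duplicates."""
    # Hypothetical tables, for illustration only.
    conn.execute("CREATE TABLE IF NOT EXISTS persons (name TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snippets ("
        "snippet_hash TEXT PRIMARY KEY, person TEXT, source TEXT, "
        "url TEXT, title TEXT, snippet TEXT, timestamp TEXT)"
    )
    # A repeated person name is ignored instead of creating a second node.
    conn.execute("INSERT OR IGNORE INTO persons (name) VALUES (?)", (person,))
    for s in snippets:
        # A repeated snippet only refreshes its timestamp.
        conn.execute(
            "INSERT INTO snippets VALUES (?, ?, ?, ?, ?, ?, ?) "
            "ON CONFLICT(snippet_hash) DO UPDATE SET timestamp = excluded.timestamp",
            (snippet_hash(s), person, s["source"], s["url"],
             s["title"], s["snippet"], s["timestamp"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
sample = {
    "source": "github",
    "url": "https://example.invalid/github",
    "title": "Juan Perez on github",
    "snippet": "placeholder text",
    "timestamp": "2026-05-16T00:00:00+00:00",
}
upsert(conn, "Juan Perez", [sample])
# Second lookup of the same person with a newer timestamp.
upsert(conn, "Juan Perez", [dict(sample, timestamp="2026-05-17T00:00:00+00:00")])
person_count = conn.execute("SELECT COUNT(*) FROM persons").fetchone()[0]
snippet_count = conn.execute("SELECT COUNT(*) FROM snippets").fetchone()[0]
```

Keying the hash on content rather than on the timestamp is what lets a re-lookup land on the existing row.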

### User-facing

- [ ] **User-facing**: user opens `graph_explorer.exe` → File → Load dataset `osint/<persona>` → sees a navigable Person + N Snippets graph (zoom, click → snippet content).
- [ ] **User-facing repeat**: new person → a command launches the job, and the dataset appears in graph_explorer's list in <5min.
- [ ] **User-facing onboarding**: paragraph in `## Notas`: "To investigate a person: `/flow run 0005 --target '<nombre>'` (or `odr_console.exe` → New Job → 3 recipes). Wait ~5min. Open `graph_explorer.exe` → Load `osint/<nombre>`. LLM summary in the repo's `report.md`."
- [ ] **User-facing latency**: job launched → graph ready in <5min (3 parallel jobs).

### Custom

- [ ] Parallelism measured: 3 concurrent jobs <60s wall-clock vs ~180s in series.
- [ ] Race-condition test: 2 simultaneous runs of the flow do not corrupt operations.db.
- [ ] Red-team: nothing about minors or non-public info in captured snippets.
- [ ] `.md` report signed via commit in the `osint_graph` repo.
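One way to take the parallelism measurement is to time the same three jobs serially and concurrently and compare wall-clock durations. The sketch below assumes the jobs are I/O-bound and replaces them with a hypothetical `fake_job` that only sleeps; the 0.2 s duration is arbitrary and says nothing about real recipe runtimes.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SOURCES = ("linkedin", "twitter", "github")


def fake_job(source: str, duration_s: float = 0.2) -> str:
    """Hypothetical job: sleeps to imitate network-bound scraping."""
    time.sleep(duration_s)
    return source


def timed(fn):
    """Return (result, elapsed wall-clock seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def run_serial() -> list[str]:
    return [fake_job(s) for s in SOURCES]


def run_parallel() -> list[str]:
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        # map() preserves input order, so results are comparable to the serial run.
        return list(pool.map(fake_job, SOURCES))


serial_result, serial_s = timed(run_serial)
parallel_result, parallel_s = timed(run_parallel)
print(f"serial={serial_s:.2f}s parallel={parallel_s:.2f}s")
```

For the real check, `fake_job` would be swapped for whatever launches a recipe through odr_console, and the two durations compared against the <60s and ~180s targets.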

## Notas

- Legal consideration: public extractions only (open profiles). Do NOT bypass paywalls/captchas.
- LinkedIn detects aggressive scraping -> use a per-job rate limit in navegator.
- Ambitious case (5 apps + parallelism + LLM). Reserve as a milestone.
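A per-job rate limit can be as simple as enforcing a minimum interval between requests. The class below is an illustrative sketch, not navegator's actual mechanism (which is not described here); `MinIntervalLimiter` and its parameters are hypothetical names. The clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class MinIntervalLimiter:
    """Allow at most one action every `min_interval_s` seconds."""

    def __init__(self, min_interval_s: float,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next action may start

    def wait(self) -> float:
        """Block until the next action is allowed; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot relative to when this action starts.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval_s
        return delay


# One limiter per job, so each source is throttled independently.
limiter = MinIntervalLimiter(0.05)
delays = [limiter.wait() for _ in range(3)]
```

In a real job, `wait()` would be called before each page request; the interval itself would need tuning per source.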