Added a Go CLI binary to manage datasets, jobs, and runs in Marquez.

Features:

- Send OpenLineage events (START, RUNNING, COMPLETE, FAIL)
- Register and query datasets
- Register and query jobs and runs
- Query dataset lineage in text or JSON format
- List resources (namespaces, jobs, datasets)
- No external dependencies (Go stdlib only)
- Statically compiled binary of ~5MB

Files:

- tools/marquez-cli/main.go: main CLI with the commands
- tools/marquez-cli/openlineage.go: HTTP client and OpenLineage structures
- tools/marquez-cli/go.mod: Go module
- tools/marquez-cli/Makefile: build automation
- tools/marquez-cli/README.md: full documentation
- tools/marquez-cli/QUICKSTART.md: quick usage guide

Installation: `make install` installs to ~/.local/bin/marquez-cli

# marquez-cli - Quick Start Guide

A quick guide to getting started with `marquez-cli` in your pipelines.

---

## ⚡ Quick Installation

```bash
# Build and install
cd ~/AutomaticProyects/automatic_process/tools/marquez-cli
make install

# Verify
marquez-cli version
```

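If `marquez-cli version` reports "command not found", the install directory may not be on your `PATH`. The sketch below assumes the binary was installed to `~/.local/bin`; `path_contains` is a hypothetical helper name, not part of the tool.

```bash
# Hypothetical helper: succeed if the given directory is an entry of PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: `make install` placed the binary in ~/.local/bin.
if ! path_contains "$HOME/.local/bin"; then
  echo "Add to your shell profile: export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
```
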
---

## 🎯 Basic Usage

### 1. Complete Flow in a Script

```bash
#!/bin/bash
JOB_NAME="my_pipeline"
RUN_ID=$(uuidgen)

# START
marquez-cli run start -job "$JOB_NAME" -run-id "$RUN_ID"

# Do the work...
curl https://api.example.com/data > /tmp/data.json

# COMPLETE
marquez-cli run complete \
  -job "$JOB_NAME" \
  -run-id "$RUN_ID" \
  -inputs "api://example.com/data" \
  -outputs "file:///tmp/data.json"
```

### 2. With Error Handling

```bash
#!/bin/bash
set -euo pipefail

JOB_NAME="my_pipeline"
RUN_ID=$(uuidgen)

# Report the run as failed if any command exits with an error
cleanup() {
  marquez-cli run fail -job "$JOB_NAME" -run-id "$RUN_ID"
}
trap cleanup ERR

marquez-cli run start -job "$JOB_NAME" -run-id "$RUN_ID"

# Your work here...

marquez-cli run complete -job "$JOB_NAME" -run-id "$RUN_ID"
```

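The same start/complete/fail pattern could also be packaged as a reusable wrapper, so each pipeline only names its job and its command. This is a minimal sketch, not part of `marquez-cli`: `with_lineage` and `new_run_id` are hypothetical helper names, and the `/proc` fallback assumes a Linux host without `uuidgen`. A call might look like `with_lineage my_pipeline ./extract.sh`, where `./extract.sh` stands in for your own command.

```bash
#!/bin/bash
# Hypothetical helper: generate a run ID, preferring uuidgen and
# falling back to the Linux kernel's UUID source.
new_run_id() {
  uuidgen 2>/dev/null || cat /proc/sys/kernel/random/uuid
}

# Hypothetical helper: wrap a command with START and COMPLETE/FAIL events.
# Usage: with_lineage <job-name> <command> [args...]
with_lineage() {
  local job="$1"; shift
  local run_id status=0
  run_id=$(new_run_id)

  marquez-cli run start -job "$job" -run-id "$run_id"
  "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    marquez-cli run complete -job "$job" -run-id "$run_id"
  else
    marquez-cli run fail -job "$job" -run-id "$run_id"
  fi
  # Propagate the wrapped command's exit status to the caller.
  return "$status"
}
```
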
### 3. Multi-Step Pipeline

```bash
JOB_NAME="etl_pipeline"
RUN_ID=$(uuidgen)

# START
marquez-cli run start -job "$JOB_NAME" -run-id "$RUN_ID"

# EXTRACT
curl https://api.example.com/data > /tmp/raw.json
marquez-cli run running \
  -job "$JOB_NAME" \
  -run-id "$RUN_ID" \
  -inputs "api://example.com/data" \
  -outputs "file:///tmp/raw.json"

# TRANSFORM
jq '.data' /tmp/raw.json > /tmp/clean.json
marquez-cli run running \
  -job "$JOB_NAME" \
  -run-id "$RUN_ID" \
  -inputs "file:///tmp/raw.json" \
  -outputs "file:///tmp/clean.json"

# LOAD
psql ... -c "COPY table FROM '/tmp/clean.json'"
marquez-cli run complete \
  -job "$JOB_NAME" \
  -run-id "$RUN_ID" \
  -inputs "file:///tmp/clean.json" \
  -outputs "postgres://localhost:5434/postgres/public/table"
```

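The repeated `run running` calls above could be folded into a small helper that runs one stage and reports it only if the stage succeeded. This is a sketch under assumptions: `stage` is a hypothetical function name, and it expects `JOB_NAME` and `RUN_ID` to be set as in the script above. Because the command is passed as arguments, use an output option instead of shell redirection, for example `stage "api://example.com/data" "file:///tmp/raw.json" curl -fsS -o /tmp/raw.json https://api.example.com/data`.

```bash
# Hypothetical helper: run one stage, then report it as a RUNNING event.
# Usage: stage <input-uri> <output-uri> <command> [args...]
# Assumes JOB_NAME and RUN_ID are already set by the calling script.
stage() {
  local input="$1" output="$2"; shift 2

  # Run the stage; on failure, skip the event and return its exit status.
  "$@" || return $?

  marquez-cli run running \
    -job "$JOB_NAME" \
    -run-id "$RUN_ID" \
    -inputs "$input" \
    -outputs "$output"
}
```
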
---

## 📊 Querying Lineage

```bash
# List datasets
marquez-cli list datasets

# List jobs
marquez-cli list jobs

# Show the lineage of a dataset
marquez-cli lineage -name "postgres://localhost:5434/postgres/public/events"

# Show the runs of a job
marquez-cli job runs -name my_pipeline
```

---

## 🔧 Dagu Integration

See the example DAG: `~/dagu/dags/example_lineage_tracking.yaml`

Basic pattern:

```yaml
env:
  - JOB_NAME: my_pipeline
  - RUN_ID: ""

steps:
  - name: init
    command: echo "RUN_ID=$(uuidgen)" >> $DAGU_ENV

  - name: start
    command: marquez-cli run start -job $JOB_NAME -run-id $RUN_ID
    depends: [init]

  - name: work
    command: # your work here
    depends: [start]

  - name: complete
    command: marquez-cli run complete -job $JOB_NAME -run-id $RUN_ID
    depends: [work]

handlers:
  failure:
    - command: marquez-cli run fail -job $JOB_NAME -run-id $RUN_ID
```

---

## 🧪 Try It with an Example

```bash
# Run the example script
~/dagu/scripts/examples/simple_pipeline_with_lineage.sh

# Show the generated lineage
marquez-cli lineage -name "file:///tmp/users_clean.json"

# Or open it in the browser
xdg-open http://localhost:3001
```

---

## 📋 Most Used Commands

| Command | Description |
|---------|-------------|
| `run start` | Start a run |
| `run complete` | Mark a run as successfully completed |
| `run fail` | Mark a run as failed |
| `run running` | Report progress (intermediate) |
| `lineage` | Show the lineage of a dataset |
| `list jobs` | List all jobs |
| `job runs` | Show the runs of a job |

---

## 🔍 Dataset URIs

| Type | Format |
|------|--------|
| PostgreSQL | `postgres://host:port/db/schema/table` |
| ClickHouse | `clickhouse://host:port/database/table` |
| NATS | `nats://host:port/subject` |
| File | `file:///absolute/path` |
| API | `api://domain/endpoint` |

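To keep URIs consistent across scripts, they could be assembled by small helpers instead of being typed by hand. The functions below are a hypothetical sketch, not part of `marquez-cli`; they only format strings according to the table above.

```bash
# Hypothetical helper: build a PostgreSQL dataset URI.
# Usage: postgres_uri <host> <port> <db> <schema> <table>
postgres_uri() {
  printf 'postgres://%s:%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: build a file dataset URI from an absolute path.
# Usage: file_uri <absolute-path>
file_uri() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;
    *)  echo "file_uri: path must be absolute: $1" >&2; return 1 ;;
  esac
}

# Example: reproduces two URIs used earlier in this guide.
postgres_uri localhost 5434 postgres public events
file_uri /tmp/data.json
```
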
---

## ✅ Checklist

Every pipeline must:

- [ ] Send a START event at the beginning
- [ ] Send RUNNING events for intermediate transformations
- [ ] Send a COMPLETE event on successful completion
- [ ] Send a FAIL event on errors (handler)
- [ ] Use the same run-id for all events
- [ ] Declare all inputs/outputs

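Part of this checklist could be checked mechanically. The sketch below is a rough, text-based heuristic, assuming the pipeline is a shell script that invokes `marquez-cli` literally; `check_pipeline` is a hypothetical name, and the check cannot verify run-id reuse or the declared inputs/outputs.

```bash
# Hypothetical lint: verify that a script mentions the START, COMPLETE,
# and FAIL events. Returns 0 if all are present, 1 otherwise.
# Usage: check_pipeline <script-path>
check_pipeline() {
  local script="$1" missing=0 event
  for event in start complete fail; do
    if ! grep -q "marquez-cli run $event" "$script"; then
      echo "missing: run $event" >&2
      missing=1
    fi
  done
  return "$missing"
}
```
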
---

## 📚 More Information

- [Full README](./README.md)
- [OpenLineage documentation](https://openlineage.io/)
- [Marquez Web UI](http://localhost:3001)

---

**Tip**: Use `marquez-cli help` to see all available commands.