# marquez-cli - Quick Start Guide

A quick guide to start using `marquez-cli` in your pipelines.

---

## ⚡ Quick Installation

```bash
# Build and install
cd ~/AutomaticProyects/automatic_process/tools/marquez-cli
make install

# Verify
marquez-cli version
```

---

## 🎯 Basic Usage

### 1. Full Flow in One Script

```bash
#!/bin/bash

JOB_NAME="my_pipeline"
RUN_ID="$(uuidgen)"

# START
marquez-cli run start -job "$JOB_NAME" -run-id "$RUN_ID"

# Do the work...
curl https://api.example.com/data > /tmp/data.json

# COMPLETE
marquez-cli run complete \
  -job "$JOB_NAME" \
  -run-id "$RUN_ID" \
  -inputs "api://example.com/data" \
  -outputs "file:///tmp/data.json"
```

### 2. With Error Handling

```bash
#!/bin/bash
set -euo pipefail

JOB_NAME="my_pipeline"
RUN_ID="$(uuidgen)"

# Report FAIL if any command exits with a non-zero status
cleanup() {
  marquez-cli run fail -job "$JOB_NAME" -run-id "$RUN_ID"
}
trap cleanup ERR

marquez-cli run start -job "$JOB_NAME" -run-id "$RUN_ID"

# Your work here...

marquez-cli run complete -job "$JOB_NAME" -run-id "$RUN_ID"
```

### 3. Multi-Step Pipeline

```bash
JOB_NAME="etl_pipeline"
RUN_ID="$(uuidgen)"

# START
marquez-cli run start -job "$JOB_NAME" -run-id "$RUN_ID"

# EXTRACT
curl https://api.example.com/data > /tmp/raw.json
marquez-cli run running \
  -job "$JOB_NAME" \
  -run-id "$RUN_ID" \
  -inputs "api://example.com/data" \
  -outputs "file:///tmp/raw.json"

# TRANSFORM
jq '.data' /tmp/raw.json > /tmp/clean.json
marquez-cli run running \
  -job "$JOB_NAME" \
  -run-id "$RUN_ID" \
  -inputs "file:///tmp/raw.json" \
  -outputs "file:///tmp/clean.json"

# LOAD
psql ... -c "COPY table FROM '/tmp/clean.json'"
marquez-cli run complete \
  -job "$JOB_NAME" \
  -run-id "$RUN_ID" \
  -inputs "file:///tmp/clean.json" \
  -outputs "postgres://localhost:5434/postgres/public/table"
```

---

## 📊 Querying Lineage

```bash
# List datasets
marquez-cli list datasets

# List jobs
marquez-cli list jobs

# Show the lineage of a dataset
marquez-cli lineage -name "postgres://localhost:5434/postgres/public/events"

# Show the runs of a job
marquez-cli job runs -name my_pipeline
```

---

## 🔧 Dagu Integration

See the example DAG: `~/dagu/dags/example_lineage_tracking.yaml`

Basic pattern:

```yaml
env:
  - RUN_ID: ""

steps:
  - name: init
    command: echo "RUN_ID=$(uuidgen)" >> $DAGU_ENV

  - name: start
    command: marquez-cli run start -job $JOB_NAME -run-id $RUN_ID
    depends: [init]

  - name: work
    command: # your work here
    depends: [start]

  - name: complete
    command: marquez-cli run complete -job $JOB_NAME -run-id $RUN_ID
    depends: [work]

handlers:
  failure:
    - command: marquez-cli run fail -job $JOB_NAME -run-id $RUN_ID
```

---

## 🧪 Try the Example

```bash
# Run the example script
~/dagu/scripts/examples/simple_pipeline_with_lineage.sh

# Show the generated lineage
marquez-cli lineage -name "file:///tmp/users_clean.json"

# Or open it in the browser
xdg-open http://localhost:3001
```

---

## 📋 Most Used Commands

| Command | Description |
|---------|-------------|
| `run start` | Start a run |
| `run complete` | Mark a run as completed successfully |
| `run fail` | Mark a run as failed |
| `run running` | Report progress (intermediate step) |
| `lineage` | Show the lineage of a dataset |
| `list jobs` | List all jobs |
| `job runs` | Show the runs of a job |

---

## 🔍 Dataset URIs

| Type | Format |
|------|--------|
| PostgreSQL | `postgres://host:port/db/schema/table` |
| ClickHouse | `clickhouse://host:port/database/table` |
| NATS | `nats://host:port/subject` |
| File | `file:///absolute/path` |
| API | `api://domain/endpoint` |

---

## ✅ Checklist

Every pipeline must:

- [ ] Send a START event at the beginning
- [ ] Send RUNNING events for intermediate transformations
- [ ] Send a COMPLETE event when it finishes successfully
- [ ] Send a FAIL event on errors (handler)
- [ ] Use the same run-id in all events
- [ ] Declare all inputs/outputs

---

## 📚 More Information

- [Full README](./README.md)
- [OpenLineage documentation](https://openlineage.io/)
- [Marquez Web UI](http://localhost:3001)

---

**Tip**: Use `marquez-cli help` to see all available commands.
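
The checklist items (START first, COMPLETE or FAIL at the end, one run-id throughout) can be folded into a small reusable wrapper. The following is a minimal sketch, not part of `marquez-cli` itself: the function name `run_with_lineage` and its argument order are hypothetical, and it relies only on the `run start`, `run complete`, and `run fail` flags shown in this guide.

```bash
#!/bin/bash

# Hypothetical helper (not shipped with marquez-cli).
# Usage: run_with_lineage <job> <inputs> <outputs> <command> [args...]
# Wraps a single command in START / COMPLETE / FAIL events,
# reusing one run-id for all of them.
run_with_lineage() {
  local job_name="$1" inputs="$2" outputs="$3"
  shift 3

  local run_id
  run_id="$(uuidgen)"

  # START: abort if the event cannot be sent
  marquez-cli run start -job "$job_name" -run-id "$run_id" || return 1

  if "$@"; then
    # COMPLETE: declare inputs and outputs on success
    marquez-cli run complete \
      -job "$job_name" \
      -run-id "$run_id" \
      -inputs "$inputs" \
      -outputs "$outputs"
  else
    # FAIL: keep the wrapped command's exit status
    local status=$?
    marquez-cli run fail -job "$job_name" -run-id "$run_id"
    return "$status"
  fi
}
```

For example, `run_with_lineage my_pipeline "api://example.com/data" "file:///tmp/data.json" curl -o /tmp/data.json https://api.example.com/data` would report the download as a single run. Because the wrapper takes a command and its arguments, shell redirections such as `> file` need to be expressed through the command's own options (here `curl -o`) or moved into a separate script.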