chore: auto-commit (95 files)

- cmd/fn/doctor.go
- cmd/fn/main.go
- cpp/apps/primitives_gallery/playground/tables/CMakeLists.txt
- cpp/apps/primitives_gallery/playground/tables/data_table.cpp
- cpp/apps/primitives_gallery/playground/tables/data_table_logic.cpp
- cpp/apps/primitives_gallery/playground/tables/data_table_logic.h
- cpp/apps/primitives_gallery/playground/tables/self_test.cpp
- cpp/apps/primitives_gallery/playground/tables/tql.cpp
- cpp/apps/primitives_gallery/playground/tables/viz.cpp
- cpp/apps/primitives_gallery/playground/tables/viz.h
- ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,57 @@
---
name: vault_dedupe_report
kind: function
lang: py
domain: infra
version: "1.0.0"
purity: impure
signature: "def vault_dedupe_report(vault_path: str, min_size: int = 0, db_path: str | None = None) -> dict"
description: "Detects duplicate files in a vault by reading vault_index.db (grouping by sha256) and computes the reclaimable space. Returns groups sorted by reclaimable bytes, descending."
tags: [vault, dedupe, duplicates, disk, sha256, sqlite]
params:
  - name: vault_path
    desc: "Root path of the vault. Used as a key in the result and to locate vault_index.db when db_path is None."
  - name: min_size
    desc: "Minimum size in bytes for a file to be included in the analysis. Default 0 = all files."
  - name: db_path
    desc: "Optional override of the path to vault_index.db. If None, <vault_path>/vault_index.db is used."
output: "dict with vault_path, groups (sha256/size/count/files/reclaimable_bytes), total_groups, total_duplicates, total_reclaimable_bytes, scanned_files. groups are sorted by reclaimable_bytes, descending."
uses_functions: []
uses_types: []
returns: []
returns_optional: false
error_type: "error_py_core"
imports: ["sqlite3", "pathlib"]
tested: true
tests:
  - "test_no_duplicates"
  - "test_basic_duplicates"
  - "test_three_in_group"
  - "test_min_size_filter"
  - "test_multiple_groups_ordered"
test_file_path: "python/functions/infra/tests/test_vault_dedupe_report.py"
file_path: "python/functions/infra/vault_dedupe_report.py"
---

## Example

```python
from infra.vault_dedupe_report import vault_dedupe_report

# Analyze only files of at least 1 KiB.
report = vault_dedupe_report("/data/vaults/my_vault", min_size=1024)
print(f"Duplicate groups: {report['total_groups']}")
print(f"Reclaimable space: {report['total_reclaimable_bytes'] // (1024**2)} MB")

# Groups are already sorted by reclaimable bytes, so these are the top five.
for g in report["groups"][:5]:
    print(f"  sha256={g['sha256'][:12]}... size={g['size']} count={g['count']}")
    for f in g["files"]:
        print(f"    {f}")
```

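To make the `reclaimable_bytes` and `total_reclaimable_bytes` fields concrete, here is a minimal sketch of how they could be derived from each group's `size` and `count`. The formula `size * (count - 1)` (keep one copy, drop the rest) is an assumption and is not confirmed by this file; the helper name `reclaimable_bytes` and the sample groups are hypothetical.

```python
def reclaimable_bytes(size: int, count: int) -> int:
    """Bytes freed if all but one copy in a group were removed (assumed formula)."""
    return size * max(count - 1, 0)


# Hypothetical groups shaped like the entries of report["groups"].
groups = [
    {"sha256": "aa", "size": 100, "count": 3},
    {"sha256": "bb", "size": 1000, "count": 2},
]

for g in groups:
    g["reclaimable_bytes"] = reclaimable_bytes(g["size"], g["count"])

# Same ordering as the documented output: largest savings first.
groups.sort(key=lambda g: g["reclaimable_bytes"], reverse=True)
total_reclaimable = sum(g["reclaimable_bytes"] for g in groups)
```

Under this assumption, a group of three 100-byte copies contributes 200 bytes, so it sorts below a group of two 1000-byte copies.
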
## Notes

- Only considers rows with `sha256 != ''` (files actually hashed by `vault_inventory_scan_go_infra`).
- Opens the database in read-only mode (`?mode=ro`) so it does not interfere with concurrent writes.
- SQLite's `GROUP_CONCAT` does not guarantee ordering, so `files` is re-sorted lexicographically in Python.
- If the database does not exist or lacks the `files` table, raises `RuntimeError` with an explanatory message.
- Prerequisite: `fn vault index <name>` (pipeline `vault_inventory_scan_go_infra` + `vault_index_write_go_infra`) must have been run on the vault.

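The notes above can be sketched roughly as follows. This is not the actual implementation in `vault_dedupe_report.py`. The function name `find_duplicate_groups` is hypothetical, the `path` and `size` column names are assumptions (only the `files` table and the `sha256` column are named in this document), and the `size * (count - 1)` formula is likewise assumed.

```python
import sqlite3
from pathlib import Path


def find_duplicate_groups(db_path: str, min_size: int = 0) -> list[dict]:
    """Return duplicate groups from a vault index, largest savings first."""
    db_file = Path(db_path)
    if not db_file.is_file():
        raise RuntimeError(f"{db_file} not found; run `fn vault index <name>` first")

    # A file: URI with mode=ro opens the database read-only.
    conn = sqlite3.connect(f"{db_file.resolve().as_uri()}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            """
            SELECT sha256, MIN(size), COUNT(*), GROUP_CONCAT(path, char(10))
            FROM files
            WHERE sha256 != '' AND size >= ?
            GROUP BY sha256
            HAVING COUNT(*) > 1
            """,
            (min_size,),
        ).fetchall()
    except sqlite3.OperationalError as exc:
        # Raised, for example, when the `files` table is missing.
        raise RuntimeError(f"cannot read index {db_file}: {exc}") from exc
    finally:
        conn.close()

    groups = [
        {
            "sha256": sha,
            "size": size,
            "count": count,
            # GROUP_CONCAT order is unspecified, so sort in Python.
            # Assumes paths contain no newline characters.
            "files": sorted(paths.split("\n")),
            "reclaimable_bytes": size * (count - 1),  # assumed formula
        }
        for sha, size, count, paths in rows
    ]
    groups.sort(key=lambda g: g["reclaimable_bytes"], reverse=True)
    return groups
```

Checking for the file before connecting keeps the "index not built yet" case distinct from other SQLite errors, which makes it easier to point the user at `fn vault index <name>`.
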