fn_registry/python/functions/infra/vault_dedupe_report.md
egutierrez a802f59f55 chore: auto-commit (95 archivos)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 00:50:34 +02:00


name: vault_dedupe_report
kind: function
lang: py
domain: infra
version: 1.0.0
purity: impure
signature: def vault_dedupe_report(vault_path: str, min_size: int = 0, db_path: str | None = None) -> dict
description: Detects duplicate files in a vault by reading vault_index.db (grouping rows by sha256) and computes the reclaimable disk space. Returns the groups sorted by reclaimable bytes, descending.
tags: vault, dedupe, duplicates, disk, sha256, sqlite
params:
  • vault_path: Root path of the vault. Used as a key in the result and to locate vault_index.db when db_path is None.
  • min_size: Minimum size in bytes for a file to be included in the analysis. Default 0 includes all files.
  • db_path: Optional override for the path to vault_index.db. If None, <vault_path>/vault_index.db is used.
output: dict with vault_path, groups (sha256/size/count/files/reclaimable_bytes), total_groups, total_duplicates, total_reclaimable_bytes and scanned_files; groups are sorted by reclaimable_bytes DESC.
uses_functions: (none)
uses_types: (none)
returns: (none)
returns_optional: false
error_type: error_py_core
imports: sqlite3, pathlib
tested: true
tests:
  • test_no_duplicates
  • test_basic_duplicates
  • test_three_in_group
  • test_min_size_filter
  • test_multiple_groups_ordered
test_file_path: python/functions/infra/tests/test_vault_dedupe_report.py
file_path: python/functions/infra/vault_dedupe_report.py
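To make the output fields concrete, here is a minimal sketch of how a report with that shape could be assembled. It assumes that reclaimable_bytes is size * (count - 1) (one copy kept per group) and that total_duplicates counts only the redundant copies; this page does not confirm either formula. The helpers make_group and summarize are hypothetical and not part of the registry.

```python
from __future__ import annotations

from typing import TypedDict


class DupGroup(TypedDict):
    sha256: str
    size: int
    count: int
    files: list[str]
    reclaimable_bytes: int


def make_group(sha256: str, size: int, files: list[str]) -> DupGroup:
    """Hypothetical helper: build one duplicate group."""
    ordered = sorted(files)  # lexicographic order, as the notes describe
    return {
        "sha256": sha256,
        "size": size,
        "count": len(ordered),
        "files": ordered,
        # Assumption: keeping one copy frees size * (count - 1) bytes.
        "reclaimable_bytes": size * (len(ordered) - 1),
    }


def summarize(vault_path: str, groups: list[DupGroup], scanned_files: int) -> dict:
    """Hypothetical helper: assemble the top-level report dict."""
    groups = sorted(groups, key=lambda g: g["reclaimable_bytes"], reverse=True)
    return {
        "vault_path": vault_path,
        "groups": groups,
        "total_groups": len(groups),
        # Assumption: only the redundant copies count as duplicates.
        "total_duplicates": sum(g["count"] - 1 for g in groups),
        "total_reclaimable_bytes": sum(g["reclaimable_bytes"] for g in groups),
        "scanned_files": scanned_files,
    }
```

Sorting by reclaimable_bytes rather than by count puts a few large copies ahead of many small ones, which matches the "DESC by reclaimable bytes" ordering described above.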

Example

from infra.vault_dedupe_report import vault_dedupe_report

report = vault_dedupe_report("/data/vaults/my_vault", min_size=1024)
print(f"Duplicate groups: {report['total_groups']}")
print(f"Reclaimable space: {report['total_reclaimable_bytes'] / 1024**2:.1f} MB")

for g in report["groups"][:5]:
    print(f"  sha256={g['sha256'][:12]}... size={g['size']} count={g['count']}")
    for f in g["files"]:
        print(f"    {f}")

Notes

  • Only rows with a non-empty sha256 (sha256 != '') are considered, i.e. files actually hashed by vault_inventory_scan_go_infra.
  • Opens the database in read-only mode (?mode=ro) so it does not interfere with concurrent writes.
  • SQLite's GROUP_CONCAT does not guarantee ordering, so the files in each group are re-sorted lexicographically in Python.
  • If the database does not exist or has no files table, raises RuntimeError with a message explaining what to do.
  • Prerequisite: run fn vault index <name> (the vault_inventory_scan_go_infra + vault_index_write_go_infra pipeline) on the vault first.