fn_registry/python/functions/infra/compress_pdf_ghostscript.py
egutierrez faac610745 feat: bulk extraction of footprint_aurgi (41 funcs + 4 types + geo Docker stack)
Extracts functions from the internal footprint_aurgi project into the registry:
- core (6): slugify_ascii, normalize_for_join, cp_provincia_es, infer_provincia_from_cp, safe_read_csv_fallback, csv_to_parquet_duckdb
- pure geo (7): haversine_km, point_in_ring, point_in_polygon, point_in_polygons_bbox, polygon_bbox, extent_with_padding, distance_bucket
- geo I/O (4): load_geojson_polygons, load_boundary_gdf, add_basemap_osm, add_basemap_with_timeout
- valhalla client (4): valhalla_route, valhalla_isochrone, valhalla_isochrones_async, valhalla_matrix_1_to_n
- datascience stats (7): trimmed_mean, geometric_mean, detect_distribution_type, best_central_tendency, summary_stats, kde_density_levels, alpha_shape_concave_hull
- datascience fuzzy (3): fuzzy_merge_adaptive (rapidfuzz), words_to_dataset, remove_words_from_column
- datascience viz (2): plot_kde_2d, plot_heatmap_log
- infra (4): compress_pdf_ghostscript, render_table_page_pdfpages, add_header_logo, osm2pgsql_ingest
- pipelines (4): setup_geo_stack_docker, compute_centers_reachability, generate_isochrones_by_zone, count_points_per_zone
- geo types (4): LonLat, BBox, IsochroneRequest, Centro

Includes:
- apps/footprint_geo_stack/ (PostGIS + Martin + Valhalla via docker-compose)
- 131/132 tests pass (1 expected skip: osm2pgsql on PATH)
- Issue tracker dev/issues/0052-footprint-aurgi-extraction.md
- Uniform attribution: source_repo internal:footprint_aurgi, source_license internal-aurgi
- Built with 9 agents in parallel (8 in wave 1 + 1 in wave 2 for pipelines)

Also commits previously uncommitted work: aggregate_extraction_results, chunk_with_overlap, clean_pdf_text, merge_entity_aliases, extract_graph_gliner2, extract_relations_mrebel, extract_triples_spacy_es, gliner2/mrebel/marianmt/rebel/spacy_es load_model, parse_rebel_output, translate_es_to_en, issue 0050/0051.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 23:35:22 +02:00

64 lines
2.0 KiB
Python

"""Compress a PDF file in-place using Ghostscript."""
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path


def compress_pdf_ghostscript(
    pdf_path: "str | Path",
    quality: str = "screen",
) -> bool:
    """Compress a PDF in-place using Ghostscript.

    Runs gs with downsampling (96 dpi color/gray, 200 dpi mono). Replaces the
    original file only when the compressed output is strictly smaller. Returns
    True if the file was replaced, False if gs is not available, the file does
    not exist, compression failed, or the output was not smaller.

    Args:
        pdf_path: Path to the PDF file to compress (modified in-place on success).
        quality: Ghostscript PDFSETTINGS profile. One of "screen", "ebook",
            "printer", "prepress".

    Returns:
        True if the file was compressed and replaced, False otherwise.
    """
    path = Path(pdf_path)
    gs = shutil.which("gs")
    # Bail out early when Ghostscript is not on PATH or the input is missing.
    if not gs or not path.exists():
        return False

    # Write the output to a temporary directory so the original stays
    # untouched unless compression succeeds and actually saves space.
    with tempfile.TemporaryDirectory() as tmpdir:
        compressed = Path(tmpdir) / "compressed.pdf"
        cmd = [
            gs,
            "-sDEVICE=pdfwrite",
            "-dCompatibilityLevel=1.4",
            f"-dPDFSETTINGS=/{quality}",
            # Explicit downsampling targets, passed after the profile.
            "-dDownsampleColorImages=true",
            "-dDownsampleGrayImages=true",
            "-dDownsampleMonoImages=true",
            "-dColorImageResolution=96",
            "-dGrayImageResolution=96",
            "-dMonoImageResolution=200",
            "-dNOPAUSE",
            "-dQUIET",
            "-dBATCH",
            f"-sOutputFile={compressed}",
            str(path),
        ]
        try:
            subprocess.run(cmd, check=True, capture_output=True)
        except subprocess.CalledProcessError:
            return False

        # Replace the original only when the result is strictly smaller.
        if compressed.exists() and compressed.stat().st_size < path.stat().st_size:
            shutil.copy2(str(compressed), str(path))
            return True
    return False
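
The function above copies the compressed file straight over the original, so an interrupted copy could leave a truncated PDF behind. A minimal sketch of one possible hardening follows, assuming it is acceptable to create a temporary file next to the target. The names `ALLOWED_QUALITIES`, `is_valid_quality` and `replace_if_smaller` are hypothetical and not part of the registry; the profile names are the ones listed in the docstring.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from pathlib import Path

# Profiles named in the docstring of compress_pdf_ghostscript.
# The constant and helper names here are hypothetical.
ALLOWED_QUALITIES = frozenset({"screen", "ebook", "printer", "prepress"})


def is_valid_quality(quality: str) -> bool:
    """Return True when quality is one of the documented profiles."""
    return quality in ALLOWED_QUALITIES


def replace_if_smaller(candidate: Path, target: Path) -> bool:
    """Replace target with candidate only when candidate is strictly smaller.

    The candidate is first copied to a temporary file in the target's
    directory, then moved into place with os.replace, so the target is
    never left half-written.
    """
    if not candidate.exists() or not target.exists():
        return False
    if candidate.stat().st_size >= target.stat().st_size:
        return False

    # Stage next to the target so the final move stays on one filesystem.
    fd, staged = tempfile.mkstemp(dir=str(target.parent), suffix=".tmp")
    os.close(fd)
    try:
        shutil.copy2(str(candidate), staged)
        os.replace(staged, str(target))
    except OSError:
        # Clean up the staged file if the copy or the move failed.
        Path(staged).unlink(missing_ok=True)
        return False
    return True
```

Under these assumptions, `compress_pdf_ghostscript` could return False early when `is_valid_quality(quality)` is false and call `replace_if_smaller(compressed, path)` in place of the size check and `shutil.copy2` call. Whether the extra temporary file is worth it depends on how the registry function is used.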