feat(infra): conexion y consulta directa a SQL Server (Navision) via pymssql

Grupo de capacidad nuevo 'sql-connect' (3 funciones) para conectar a un
Microsoft SQL Server (donde corre Navision) y consultar directamente, en
lugar del ida y vuelta manual de pegar CSVs.

- mssql_connect_py_infra: abre conexion pymssql (login_timeout acotado,
  credenciales por argumento, RuntimeError claro si falla).
- mssql_query_py_infra: SELECT parametrizada con binding seguro (sin
  inyeccion) sobre conexion abierta; devuelve {columns, rows, row_count};
  0 filas -> lista vacia; max_rows con fetchmany; read-only.
- run_mssql_query_py_pipelines: one-shot que compone connect+query y cierra
  siempre; CLI imprime JSON o CSV; contrasena desde env var (pass).

Pagina madre docs/capabilities/sql-connect.md + fila en INDEX.md.
Dependencia pymssql>=2.3.13 anadida a python/pyproject.toml + uv.lock.
Tests mock-based (11) verdes; error path verificado end-to-end contra el
driver real (host inalcanzable -> RuntimeError, acotado por login_timeout).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-06-22 11:29:49 +02:00
parent c1f355ffa5
commit 86d68dc9f0
13 changed files with 930 additions and 0 deletions
+77
View File
@@ -0,0 +1,77 @@
"""Run a parameterized SELECT over an open pymssql (SQL Server / Navision) connection."""
from __future__ import annotations
def mssql_query(conn, sql: str, params=None, max_rows: int | None = None) -> dict:
"""Execute a SELECT on an already-open connection and map rows to dicts.
The connection is supplied by the caller (typically from `mssql_connect`),
so a single connection can be opened once and reused for many queries. This
function never opens or closes the connection — it only borrows it. It is
impure I/O: it touches the database over an existing connection.
Parameter binding is delegated to the driver: `params` is passed straight to
`cursor.execute(sql, params)`. NEVER interpolate values into `sql` with
f-strings or `%` formatting — that opens the door to SQL injection. Use the
pymssql placeholders `%s` (positional) or `%(name)s` (named) in `sql` and
let the driver bind safely. When `params is None`, the SQL is executed with
no bound parameters.
The query runs read-only: no commit is issued. The cursor opened here is
always closed before returning (try/finally), even on error.
Args:
conn: An open connection object (e.g. the one returned by
`mssql_connect`). Used by duck typing via `conn.cursor()`, so the
concrete driver does not matter and the function stays testable.
sql: The SELECT statement, using pymssql placeholders `%s` (positional)
or `%(name)s` (named) for any bound values.
params: A tuple/list for positional placeholders, a dict for named
placeholders, or None for a query with no parameters. Passed to
`cursor.execute(sql, params)` for safe driver-side binding.
max_rows: If a positive int, only the first `max_rows` rows are fetched
(via `cursor.fetchmany(max_rows)`). If None, all rows are fetched
(via `cursor.fetchall()`).
Returns:
A dict with three keys:
- "columns": list of column names in result order (empty list if the
statement produced no result set, i.e. `cursor.description is None`).
- "rows": list of dicts, one per row, mapping each column name to its
value. Empty list when the query returned no rows.
- "row_count": int, equal to `len(rows)`.
Raises:
RuntimeError: If executing or fetching the query fails. The message is
deliberately generic (it does not include the SQL or the params,
which may carry sensitive data) and the original exception is
chained for debugging.
"""
cur = conn.cursor()
try:
try:
if params is None:
cur.execute(sql)
else:
cur.execute(sql, params)
description = cur.description
if description is None:
columns: list = []
raw_rows: list = []
else:
columns = [d[0] for d in description]
if max_rows is not None and max_rows > 0:
raw_rows = cur.fetchmany(max_rows)
else:
raw_rows = cur.fetchall()
except Exception as exc:
raise RuntimeError(
f"mssql_query failed executing query: {exc}"
) from exc
finally:
cur.close()
rows = [dict(zip(columns, row)) for row in raw_rows]
return {"columns": columns, "rows": rows, "row_count": len(rows)}