{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "38a86e03-762e-49e0-8794-407c39e78fc3",
"metadata": {},
"outputs": [],
"source": [
"# (celda inicial generada por jupyter_write create — se ignora)"
]
},
{
"cell_type": "markdown",
"id": "02637343",
"metadata": {},
"source": [
"# Comparación precios de venta NAV vs TPV\n",
"\n",
"**Fuentes:**\n",
"- `autingo-159109.mssql2022_dbo.sales_price` — NAV (Navision)\n",
"- `autingo-159109.psql_dcpublic.sale_prices` — TPV (Datacentric/PostgreSQL)\n",
"\n",
"**Puente:** `autingo-159109.psql_dcpublic.products.nav_id` ↔ `sales_price.item_no_`\n",
"\n",
"**Objetivo:** determinar si las dos tablas representan los mismos precios."
]
},
{
"cell_type": "markdown",
"id": "f586f3bb",
"metadata": {},
"source": [
"## 0 · Setup BigQuery\n",
"\n",
"Con ADC de end-user no tenemos `serviceusage.services.use` en `autingo-159109`. Forzamos `quota_project_id=None` para que BQ cobre al proyecto de la query. Sin esto algunas queries fallan con `403 USER_PROJECT_DENIED`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bbace39d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'autingo-159109'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import google.auth\n",
"from google.cloud import bigquery\n",
"import pandas as pd\n",
"\n",
"PROJECT = \"autingo-159109\"\n",
"NAV = f\"`{PROJECT}.mssql2022_dbo.sales_price`\"\n",
"TPV = f\"`{PROJECT}.psql_dcpublic.sale_prices`\"\n",
"PRODUCTS = f\"`{PROJECT}.psql_dcpublic.products`\"\n",
"CHANNELS = f\"`{PROJECT}.psql_dcpublic.channels`\"\n",
"\n",
"creds, _ = google.auth.default()\n",
"if hasattr(creds, \"with_quota_project\"):\n",
" creds = creds.with_quota_project(None)\n",
"bq = bigquery.Client(project=PROJECT, credentials=creds,\n",
" client_options={\"quota_project_id\": None})\n",
"\n",
"def q(sql: str) -> pd.DataFrame:\n",
" \"\"\"Ejecuta una query y devuelve dataframe. Inmutable.\"\"\"\n",
" return bq.query(sql).to_dataframe()\n",
"\n",
"bq.project"
]
},
{
"cell_type": "markdown",
"id": "979ef0d9",
"metadata": {},
"source": [
"## 1 · Volumen bruto\n",
"\n",
"Conteo de filas de cada tabla con sus distintos filtros."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c39eb1cd",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" tabla | \n",
" n | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" NAV sales_price (con deleted) | \n",
" 1626433 | \n",
"
\n",
" \n",
" | 1 | \n",
" NAV sales_price (vivos) | \n",
" 1626331 | \n",
"
\n",
" \n",
" | 2 | \n",
" TPV sale_prices (active=TRUE) | \n",
" 998620 | \n",
"
\n",
" \n",
" | 3 | \n",
" TPV sale_prices (todas) | \n",
" 85683989 | \n",
"
\n",
" \n",
" | 4 | \n",
" products (total) | \n",
" 2747387 | \n",
"
\n",
" \n",
" | 5 | \n",
" products (con nav_id) | \n",
" 2746630 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" tabla n\n",
"0 NAV sales_price (con deleted) 1626433\n",
"1 NAV sales_price (vivos) 1626331\n",
"2 TPV sale_prices (active=TRUE) 998620\n",
"3 TPV sale_prices (todas) 85683989\n",
"4 products (total) 2747387\n",
"5 products (con nav_id) 2746630"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q(f\"\"\"\n",
"SELECT 'NAV sales_price (vivos)' AS tabla, COUNT(*) AS n FROM {NAV} WHERE _fivetran_deleted IS NOT TRUE\n",
"UNION ALL SELECT 'NAV sales_price (con deleted)', COUNT(*) FROM {NAV}\n",
"UNION ALL SELECT 'TPV sale_prices (todas)', COUNT(*) FROM {TPV}\n",
"UNION ALL SELECT 'TPV sale_prices (active=TRUE)', COUNT(*) FROM {TPV} WHERE active\n",
"UNION ALL SELECT 'products (con nav_id)', COUNT(*) FROM {PRODUCTS} WHERE nav_id IS NOT NULL AND nav_id != ''\n",
"UNION ALL SELECT 'products (total)', COUNT(*) FROM {PRODUCTS}\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "0973c7c0",
"metadata": {},
"source": [
"**Lectura de los conteos:**\n",
"- TPV guarda histórico completo de precios; NAV reemplaza en sitio.\n",
"- Por eso TPV tiene ~86× más filas que NAV, pero sólo ~1M están activas y representan el estado vigente (comparable con los 1.6M de NAV).\n",
"- Casi todos los `products` tienen `nav_id` (≥99.9%) — el puente funciona.\n",
"- No son comparables fila-a-fila: hay que filtrar a lo vigente."
]
},
{
"cell_type": "markdown",
"id": "0d5e6eae",
"metadata": {},
"source": [
"## 2 · Granularidad NAV\n",
"\n",
"NAV usa `sales_type` + `sales_code` para distinguir listas de precio. Veamos cuántos registros hay por cada combinación:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "da84c08d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sales_type | \n",
" sales_code | \n",
" n | \n",
" n_items | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 0 | \n",
" 471 | \n",
" 9 | \n",
" 9 | \n",
"
\n",
" \n",
" | 1 | \n",
" 1 | \n",
" 471 | \n",
" 666483 | \n",
" 142054 | \n",
"
\n",
" \n",
" | 2 | \n",
" 1 | \n",
" 472 | \n",
" 283833 | \n",
" 62748 | \n",
"
\n",
" \n",
" | 3 | \n",
" 1 | \n",
" 473 | \n",
" 257973 | \n",
" 53527 | \n",
"
\n",
" \n",
" | 4 | \n",
" 1 | \n",
" 474 | \n",
" 169764 | \n",
" 33526 | \n",
"
\n",
" \n",
" | 5 | \n",
" 1 | \n",
" 475 | \n",
" 166903 | \n",
" 33505 | \n",
"
\n",
" \n",
" | 6 | \n",
" 1 | \n",
" 478 | \n",
" 80772 | \n",
" 46464 | \n",
"
\n",
" \n",
" | 7 | \n",
" 1 | \n",
" 477 | \n",
" 500 | \n",
" 500 | \n",
"
\n",
" \n",
" | 8 | \n",
" 1 | \n",
" 476 | \n",
" 91 | \n",
" 43 | \n",
"
\n",
" \n",
" | 9 | \n",
" 2 | \n",
" | \n",
" 3 | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sales_type sales_code n n_items\n",
"0 0 471 9 9\n",
"1 1 471 666483 142054\n",
"2 1 472 283833 62748\n",
"3 1 473 257973 53527\n",
"4 1 474 169764 33526\n",
"5 1 475 166903 33505\n",
"6 1 478 80772 46464\n",
"7 1 477 500 500\n",
"8 1 476 91 43\n",
"9 2 3 3"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q(f\"\"\"\n",
"SELECT sales_type, sales_code, COUNT(*) AS n, COUNT(DISTINCT item_no_) AS n_items\n",
"FROM {NAV}\n",
"WHERE _fivetran_deleted IS NOT TRUE\n",
"GROUP BY sales_type, sales_code\n",
"ORDER BY sales_type, n DESC\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "abc5d386",
"metadata": {},
"source": [
"**Lectura:**\n",
"- `sales_type=1` (Customer Price Group) = 99.99% de NAV, con 8 códigos (471–478).\n",
"- `sales_type=0` y `sales_type=2` son residuales (12 filas totales). Se ignoran."
]
},
{
"cell_type": "markdown",
"id": "38e9a417",
"metadata": {},
"source": [
"## 3 · Granularidad TPV\n",
"\n",
"TPV separa por `channel_id`. Contamos precios activos por canal:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1fe46b8d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" channel_id | \n",
" name | \n",
" n_prices_active | \n",
" n_products | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 1 | \n",
" aurgi.com | \n",
" 296357 | \n",
" 296212 | \n",
"
\n",
" \n",
" | 1 | \n",
" 3 | \n",
" autingo.es | \n",
" 296141 | \n",
" 296141 | \n",
"
\n",
" \n",
" | 2 | \n",
" 13 | \n",
" Miravia | \n",
" 164814 | \n",
" 164814 | \n",
"
\n",
" \n",
" | 3 | \n",
" 2 | \n",
" motortown.es | \n",
" 59612 | \n",
" 59612 | \n",
"
\n",
" \n",
" | 4 | \n",
" 12 | \n",
" canarias | \n",
" 45132 | \n",
" 45130 | \n",
"
\n",
" \n",
" | 5 | \n",
" 11 | \n",
" Amazon | \n",
" 44739 | \n",
" 44739 | \n",
"
\n",
" \n",
" | 6 | \n",
" 6 | \n",
" Eci Marketplace | \n",
" 39987 | \n",
" 39987 | \n",
"
\n",
" \n",
" | 7 | \n",
" 5 | \n",
" Centros MT | \n",
" 30544 | \n",
" 30544 | \n",
"
\n",
" \n",
" | 8 | \n",
" 10 | \n",
" Talleres Digitales | \n",
" 21294 | \n",
" 21294 | \n",
"
\n",
" \n",
" | 9 | \n",
" 15 | \n",
" Tiktok Shop | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 10 | \n",
" 7 | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 11 | \n",
" 4 | \n",
" Centros Aurgi | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 12 | \n",
" 14 | \n",
" Aliexpress | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
" | 13 | \n",
" 8 | \n",
" Eci Marketplace | \n",
" 0 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" channel_id name n_prices_active n_products\n",
"0 1 aurgi.com 296357 296212\n",
"1 3 autingo.es 296141 296141\n",
"2 13 Miravia 164814 164814\n",
"3 2 motortown.es 59612 59612\n",
"4 12 canarias 45132 45130\n",
"5 11 Amazon 44739 44739\n",
"6 6 Eci Marketplace 39987 39987\n",
"7 5 Centros MT 30544 30544\n",
"8 10 Talleres Digitales 21294 21294\n",
"9 15 Tiktok Shop 0 0\n",
"10 7 NaN 0 0\n",
"11 4 Centros Aurgi 0 0\n",
"12 14 Aliexpress 0 0\n",
"13 8 Eci Marketplace 0 0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q(f\"\"\"\n",
"SELECT c.id AS channel_id, c.name,\n",
" COUNT(sp.id) AS n_prices_active,\n",
" COUNT(DISTINCT sp.product_id) AS n_products\n",
"FROM {CHANNELS} c\n",
"LEFT JOIN {TPV} sp ON sp.channel_id = c.id AND sp.active = TRUE\n",
"GROUP BY c.id, c.name\n",
"ORDER BY n_prices_active DESC\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "7bf15cf0",
"metadata": {},
"source": [
"## 4 · Deducir el mapeo `NAV.sales_code` ↔ `TPV.channel_id`\n",
"\n",
"No existe tabla de equivalencia. Lo deducimos por **coincidencia empírica de precio**: para cada par `(sales_code, channel_id)` contamos cuántos productos tienen exactamente el mismo `unit_price` vigente."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "2654f253",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" sales_code | \n",
" channel_id | \n",
" n_match_prod | \n",
" n_same_price | \n",
" pct_same | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 471 | \n",
" 1 | \n",
" 103075 | \n",
" 99553 | \n",
" 96.6 | \n",
"
\n",
" \n",
" | 1 | \n",
" 472 | \n",
" 2 | \n",
" 57059 | \n",
" 57059 | \n",
" 100.0 | \n",
"
\n",
" \n",
" | 2 | \n",
" 478 | \n",
" 12 | \n",
" 41717 | \n",
" 41713 | \n",
" 100.0 | \n",
"
\n",
" \n",
" | 3 | \n",
" 472 | \n",
" 12 | \n",
" 41651 | \n",
" 40085 | \n",
" 96.2 | \n",
"
\n",
" \n",
" | 4 | \n",
" 478 | \n",
" 2 | \n",
" 41651 | \n",
" 40085 | \n",
" 96.2 | \n",
"
\n",
" \n",
" | 5 | \n",
" 474 | \n",
" 2 | \n",
" 30041 | \n",
" 24172 | \n",
" 80.5 | \n",
"
\n",
" \n",
" | 6 | \n",
" 475 | \n",
" 2 | \n",
" 30042 | \n",
" 24165 | \n",
" 80.4 | \n",
"
\n",
" \n",
" | 7 | \n",
" 473 | \n",
" 2 | \n",
" 48940 | \n",
" 17529 | \n",
" 35.8 | \n",
"
\n",
" \n",
" | 8 | \n",
" 475 | \n",
" 12 | \n",
" 20936 | \n",
" 15554 | \n",
" 74.3 | \n",
"
\n",
" \n",
" | 9 | \n",
" 474 | \n",
" 12 | \n",
" 20934 | \n",
" 15553 | \n",
" 74.3 | \n",
"
\n",
" \n",
" | 10 | \n",
" 471 | \n",
" 10 | \n",
" 15013 | \n",
" 12912 | \n",
" 86.0 | \n",
"
\n",
" \n",
" | 11 | \n",
" 473 | \n",
" 12 | \n",
" 34595 | \n",
" 9240 | \n",
" 26.7 | \n",
"
\n",
" \n",
" | 12 | \n",
" 471 | \n",
" 2 | \n",
" 47405 | \n",
" 4385 | \n",
" 9.3 | \n",
"
\n",
" \n",
" | 13 | \n",
" 472 | \n",
" 1 | \n",
" 47427 | \n",
" 4376 | \n",
" 9.2 | \n",
"
\n",
" \n",
" | 14 | \n",
" 471 | \n",
" 12 | \n",
" 34621 | \n",
" 3828 | \n",
" 11.1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" sales_code channel_id n_match_prod n_same_price pct_same\n",
"0 471 1 103075 99553 96.6\n",
"1 472 2 57059 57059 100.0\n",
"2 478 12 41717 41713 100.0\n",
"3 472 12 41651 40085 96.2\n",
"4 478 2 41651 40085 96.2\n",
"5 474 2 30041 24172 80.5\n",
"6 475 2 30042 24165 80.4\n",
"7 473 2 48940 17529 35.8\n",
"8 475 12 20936 15554 74.3\n",
"9 474 12 20934 15553 74.3\n",
"10 471 10 15013 12912 86.0\n",
"11 473 12 34595 9240 26.7\n",
"12 471 2 47405 4385 9.3\n",
"13 472 1 47427 4376 9.2\n",
"14 471 12 34621 3828 11.1"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q(f\"\"\"\n",
"WITH nav AS (\n",
" SELECT p.id AS product_id, sp.sales_code,\n",
" CAST(sp.unit_price AS FLOAT64) AS nav_price\n",
" FROM {NAV} sp\n",
" JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n",
" WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1\n",
" AND sp.starting_date <= CURRENT_DATETIME()\n",
" AND (sp.ending_date IS NULL\n",
" OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n",
" OR sp.ending_date >= CURRENT_DATETIME())\n",
" AND sp.variant_code = ''\n",
"),\n",
"tpv AS (\n",
" SELECT product_id, channel_id, unit_price AS tpv_price\n",
" FROM {TPV}\n",
" WHERE active = TRUE\n",
" AND starting_date <= CURRENT_DATE()\n",
" AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n",
")\n",
"SELECT nav.sales_code, tpv.channel_id,\n",
" COUNT(*) AS n_match_prod,\n",
" SUM(CASE WHEN ABS(nav.nav_price - tpv.tpv_price) < 0.01 THEN 1 ELSE 0 END) AS n_same_price,\n",
" ROUND(SUM(CASE WHEN ABS(nav.nav_price - tpv.tpv_price) < 0.01 THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) AS pct_same\n",
"FROM nav JOIN tpv USING(product_id)\n",
"GROUP BY nav.sales_code, tpv.channel_id\n",
"HAVING n_match_prod > 100\n",
"ORDER BY n_same_price DESC\n",
"LIMIT 15\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "c6d538cf",
"metadata": {},
"source": [
"### Mapeos deducidos (alta coincidencia)\n",
"\n",
"| NAV sales_code | TPV channel_id | Canal TPV | % coincidencia |\n",
"|---:|---:|---|---:|\n",
"| 471 | 1 | aurgi.com | ~96% |\n",
"| 472 | 2 | motortown.es | 100% |\n",
"| 478 | 12 | canarias | ~100% |\n",
"\n",
"Los demás `sales_code` (473, 474, 475, 476, 477) **no tienen canal TPV equivalente directo**. Son probables price-groups internos (B2B, flotas) que no se exponen en canales TPV."
]
},
{
"cell_type": "markdown",
"id": "3cceab83",
"metadata": {},
"source": [
"## 5 · Comparación detallada de los 3 mapeos fiables\n",
"\n",
"Para cada mapeo calculamos: filas vigentes en cada lado, intersección (`both`), diferencias simétricas (`only_nav`, `only_tpv`), coincidencia exacta de precio y estadísticos de la diferencia absoluta."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "c83090f0",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" mapping | \n",
" nav_rows | \n",
" tpv_rows | \n",
" both | \n",
" only_nav | \n",
" only_tpv | \n",
" same_price | \n",
" mean_diff | \n",
" p50 | \n",
" p95 | \n",
" max_diff | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 471 → ch1 | \n",
" 121631 | \n",
" 146654 | \n",
" 103075 | \n",
" 18559 | \n",
" 43581 | \n",
" 99553 | \n",
" 0.3 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 1476.27 | \n",
"
\n",
" \n",
" | 1 | \n",
" 472 → ch2 | \n",
" 62644 | \n",
" 57070 | \n",
" 57059 | \n",
" 5585 | \n",
" 11 | \n",
" 57059 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.00 | \n",
"
\n",
" \n",
" | 2 | \n",
" 478 → ch12 | \n",
" 46435 | \n",
" 41722 | \n",
" 41717 | \n",
" 4722 | \n",
" 9 | \n",
" 41713 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 0.0 | \n",
" 27.51 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" mapping nav_rows tpv_rows both only_nav only_tpv same_price \\\n",
"0 471 → ch1 121631 146654 103075 18559 43581 99553 \n",
"1 472 → ch2 62644 57070 57059 5585 11 57059 \n",
"2 478 → ch12 46435 41722 41717 4722 9 41713 \n",
"\n",
" mean_diff p50 p95 max_diff \n",
"0 0.3 0.0 0.0 1476.27 \n",
"1 0.0 0.0 0.0 0.00 \n",
"2 0.0 0.0 0.0 27.51 "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def compare(sales_code: str, channel_id: int) -> pd.DataFrame:\n",
" sql = f\"\"\"\n",
" WITH nav AS (\n",
" SELECT p.id AS product_id, CAST(sp.unit_price AS FLOAT64) AS nav_price\n",
" FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n",
" WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '{sales_code}'\n",
" AND sp.starting_date <= CURRENT_DATETIME()\n",
" AND (sp.ending_date IS NULL\n",
" OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n",
" OR sp.ending_date >= CURRENT_DATETIME())\n",
" AND sp.variant_code = ''\n",
" ),\n",
" tpv AS (\n",
" SELECT product_id, unit_price AS tpv_price FROM {TPV}\n",
" WHERE channel_id = {channel_id} AND active = TRUE\n",
" AND starting_date <= CURRENT_DATE()\n",
" AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n",
" )\n",
" SELECT\n",
" '{sales_code} \\u2192 ch{channel_id}' AS mapping,\n",
" (SELECT COUNT(*) FROM nav) AS nav_rows,\n",
" (SELECT COUNT(*) FROM tpv) AS tpv_rows,\n",
" (SELECT COUNT(*) FROM nav JOIN tpv USING(product_id)) AS both,\n",
" (SELECT COUNT(*) FROM nav LEFT JOIN tpv USING(product_id) WHERE tpv.product_id IS NULL) AS only_nav,\n",
" (SELECT COUNT(*) FROM tpv LEFT JOIN nav USING(product_id) WHERE nav.product_id IS NULL) AS only_tpv,\n",
" (SELECT COUNT(*) FROM nav JOIN tpv USING(product_id) WHERE ABS(nav_price - tpv_price) < 0.01) AS same_price,\n",
" (SELECT ROUND(AVG(ABS(nav_price - tpv_price)), 2) FROM nav JOIN tpv USING(product_id)) AS mean_diff,\n",
" (SELECT ROUND(APPROX_QUANTILES(ABS(nav_price - tpv_price), 100)[OFFSET(50)], 2) FROM nav JOIN tpv USING(product_id)) AS p50,\n",
" (SELECT ROUND(APPROX_QUANTILES(ABS(nav_price - tpv_price), 100)[OFFSET(95)], 2) FROM nav JOIN tpv USING(product_id)) AS p95,\n",
" (SELECT ROUND(MAX(ABS(nav_price - tpv_price)), 2) FROM nav JOIN tpv USING(product_id)) AS max_diff\n",
" \"\"\"\n",
" return q(sql)\n",
"\n",
"comparison = pd.concat([\n",
" compare(\"471\", 1), # aurgi.com\n",
" compare(\"472\", 2), # motortown.es\n",
" compare(\"478\", 12), # canarias\n",
"], ignore_index=True)\n",
"comparison"
]
},
{
"cell_type": "markdown",
"id": "06c2882f",
"metadata": {},
"source": [
"## 6 · Outliers de 471 → aurgi.com\n",
"\n",
"En 471→1 el ~3.4% de productos no coincide. Inspeccionamos los mayores outliers para ver si es ruido puntual o algo sistemático."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "37454278",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" nav_id | \n",
" normalized_ref | \n",
" nav_price | \n",
" tpv_price | \n",
" diff_eur | \n",
" nav_updated | \n",
" tpv_updated | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" MKT539258 | \n",
" NaN | \n",
" 79.17 | \n",
" 1555.44 | \n",
" 1476.27 | \n",
" 2023-12-20 | \n",
" 2026-01-27 18:24:16.236286+00:00 | \n",
"
\n",
" \n",
" | 1 | \n",
" MKT263951 | \n",
" NaN | \n",
" 612.07 | \n",
" 945.47 | \n",
" 333.40 | \n",
" 2025-10-15 | \n",
" 2026-04-11 03:42:31.985658+00:00 | \n",
"
\n",
" \n",
" | 2 | \n",
" MKT528522558 | \n",
" NaN | \n",
" 801.86 | \n",
" 502.90 | \n",
" -298.96 | \n",
" 2026-03-08 | \n",
" 2026-04-15 09:03:06.036752+00:00 | \n",
"
\n",
" \n",
" | 3 | \n",
" MKT1285247190 | \n",
" NaN | \n",
" 39.49 | \n",
" 280.10 | \n",
" 240.61 | \n",
" 2025-10-07 | \n",
" 2026-03-24 12:16:47.114948+00:00 | \n",
"
\n",
" \n",
" | 4 | \n",
" MKT248783 | \n",
" NaN | \n",
" 699.99 | \n",
" 923.83 | \n",
" 223.84 | \n",
" 2023-12-20 | \n",
" 2026-04-14 07:23:17.228074+00:00 | \n",
"
\n",
" \n",
" | 5 | \n",
" MKT235345 | \n",
" NaN | \n",
" 728.97 | \n",
" 517.64 | \n",
" -211.33 | \n",
" 2023-12-20 | \n",
" 2026-04-10 10:24:57.693899+00:00 | \n",
"
\n",
" \n",
" | 6 | \n",
" MKT1102764312 | \n",
" NaN | \n",
" 374.71 | \n",
" 189.00 | \n",
" -185.71 | \n",
" 2025-05-04 | \n",
" 2026-04-14 14:05:20.943072+00:00 | \n",
"
\n",
" \n",
" | 7 | \n",
" MKT312845 | \n",
" NaN | \n",
" 996.37 | \n",
" 826.18 | \n",
" -170.19 | \n",
" 2023-12-20 | \n",
" 2026-04-09 12:43:19.657585+00:00 | \n",
"
\n",
" \n",
" | 8 | \n",
" MKT1009692625 | \n",
" NaN | \n",
" 30.48 | \n",
" 192.72 | \n",
" 162.24 | \n",
" 2025-10-09 | \n",
" 2026-04-14 13:24:10.117913+00:00 | \n",
"
\n",
" \n",
" | 9 | \n",
" MKT1102764641 | \n",
" NaN | \n",
" 473.90 | \n",
" 629.00 | \n",
" 155.10 | \n",
" 2025-11-07 | \n",
" 2026-04-10 09:22:42.013122+00:00 | \n",
"
\n",
" \n",
" | 10 | \n",
" MKT235222 | \n",
" NaN | \n",
" 1155.85 | \n",
" 1013.29 | \n",
" -142.56 | \n",
" 2023-12-20 | \n",
" 2026-04-01 17:44:05.939763+00:00 | \n",
"
\n",
" \n",
" | 11 | \n",
" MKT119817 | \n",
" NaN | \n",
" 97.80 | \n",
" 227.93 | \n",
" 130.13 | \n",
" 2023-12-20 | \n",
" 2026-04-13 18:27:22.018325+00:00 | \n",
"
\n",
" \n",
" | 12 | \n",
" MKT1102764639 | \n",
" NaN | \n",
" 449.00 | \n",
" 579.00 | \n",
" 130.00 | \n",
" 2025-11-28 | \n",
" 2026-04-15 09:03:04.974830+00:00 | \n",
"
\n",
" \n",
" | 13 | \n",
" MKT119351 | \n",
" NaN | \n",
" 0.00 | \n",
" 125.90 | \n",
" 125.90 | \n",
" 2023-12-20 | \n",
" 2026-04-13 06:22:44.921118+00:00 | \n",
"
\n",
" \n",
" | 14 | \n",
" MKT1102764780 | \n",
" NaN | \n",
" 453.22 | \n",
" 327.90 | \n",
" -125.32 | \n",
" 2025-07-08 | \n",
" 2026-04-14 11:24:44.704392+00:00 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" nav_id normalized_ref nav_price tpv_price diff_eur nav_updated \\\n",
"0 MKT539258 NaN 79.17 1555.44 1476.27 2023-12-20 \n",
"1 MKT263951 NaN 612.07 945.47 333.40 2025-10-15 \n",
"2 MKT528522558 NaN 801.86 502.90 -298.96 2026-03-08 \n",
"3 MKT1285247190 NaN 39.49 280.10 240.61 2025-10-07 \n",
"4 MKT248783 NaN 699.99 923.83 223.84 2023-12-20 \n",
"5 MKT235345 NaN 728.97 517.64 -211.33 2023-12-20 \n",
"6 MKT1102764312 NaN 374.71 189.00 -185.71 2025-05-04 \n",
"7 MKT312845 NaN 996.37 826.18 -170.19 2023-12-20 \n",
"8 MKT1009692625 NaN 30.48 192.72 162.24 2025-10-09 \n",
"9 MKT1102764641 NaN 473.90 629.00 155.10 2025-11-07 \n",
"10 MKT235222 NaN 1155.85 1013.29 -142.56 2023-12-20 \n",
"11 MKT119817 NaN 97.80 227.93 130.13 2023-12-20 \n",
"12 MKT1102764639 NaN 449.00 579.00 130.00 2025-11-28 \n",
"13 MKT119351 NaN 0.00 125.90 125.90 2023-12-20 \n",
"14 MKT1102764780 NaN 453.22 327.90 -125.32 2025-07-08 \n",
"\n",
" tpv_updated \n",
"0 2026-01-27 18:24:16.236286+00:00 \n",
"1 2026-04-11 03:42:31.985658+00:00 \n",
"2 2026-04-15 09:03:06.036752+00:00 \n",
"3 2026-03-24 12:16:47.114948+00:00 \n",
"4 2026-04-14 07:23:17.228074+00:00 \n",
"5 2026-04-10 10:24:57.693899+00:00 \n",
"6 2026-04-14 14:05:20.943072+00:00 \n",
"7 2026-04-09 12:43:19.657585+00:00 \n",
"8 2026-04-14 13:24:10.117913+00:00 \n",
"9 2026-04-10 09:22:42.013122+00:00 \n",
"10 2026-04-01 17:44:05.939763+00:00 \n",
"11 2026-04-13 18:27:22.018325+00:00 \n",
"12 2026-04-15 09:03:04.974830+00:00 \n",
"13 2026-04-13 06:22:44.921118+00:00 \n",
"14 2026-04-14 11:24:44.704392+00:00 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"q(f\"\"\"\n",
"WITH nav AS (\n",
" SELECT p.id AS product_id, p.nav_id, p.normalized_ref,\n",
" CAST(sp.unit_price AS FLOAT64) AS nav_price,\n",
" sp.fecha_ultima_modificacion AS nav_updated\n",
" FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n",
" WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '471'\n",
" AND sp.starting_date <= CURRENT_DATETIME()\n",
" AND (sp.ending_date IS NULL OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n",
" OR sp.ending_date >= CURRENT_DATETIME())\n",
" AND sp.variant_code = ''\n",
"),\n",
"tpv AS (\n",
" SELECT product_id, unit_price AS tpv_price, updated_at AS tpv_updated\n",
" FROM {TPV}\n",
" WHERE channel_id = 1 AND active = TRUE\n",
" AND starting_date <= CURRENT_DATE()\n",
" AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n",
")\n",
"SELECT nav.nav_id, nav.normalized_ref,\n",
" ROUND(nav.nav_price, 2) AS nav_price,\n",
" ROUND(tpv.tpv_price, 2) AS tpv_price,\n",
" ROUND(tpv.tpv_price - nav.nav_price, 2) AS diff_eur,\n",
" nav.nav_updated, tpv.tpv_updated\n",
"FROM nav JOIN tpv USING(product_id)\n",
"WHERE ABS(nav.nav_price - tpv.tpv_price) >= 0.01\n",
"ORDER BY ABS(nav.nav_price - tpv.tpv_price) DESC\n",
"LIMIT 15\n",
"\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e9bc35d5",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" lt_10cent | \n",
" c10_to_1eur | \n",
" c1_to_10 | \n",
" c10_to_100 | \n",
" ge_100 | \n",
" total_diff | \n",
"
\n",
" \n",
" \n",
" \n",
" | 0 | \n",
" 119 | \n",
" 335 | \n",
" 2273 | \n",
" 773 | \n",
" 22 | \n",
" 3522 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" lt_10cent c10_to_1eur c1_to_10 c10_to_100 ge_100 total_diff\n",
"0 119 335 2273 773 22 3522"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Distribucion de la magnitud de los diffs en 471 -> aurgi.com\n",
"q(f\"\"\"\n",
"WITH nav AS (\n",
" SELECT p.id AS product_id, CAST(sp.unit_price AS FLOAT64) AS nav_price\n",
" FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n",
" WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '471'\n",
" AND sp.starting_date <= CURRENT_DATETIME()\n",
" AND (sp.ending_date IS NULL OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n",
" OR sp.ending_date >= CURRENT_DATETIME())\n",
" AND sp.variant_code = ''\n",
"),\n",
"tpv AS (\n",
" SELECT product_id, unit_price AS tpv_price FROM {TPV}\n",
" WHERE channel_id = 1 AND active = TRUE\n",
" AND starting_date <= CURRENT_DATE()\n",
" AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n",
"),\n",
"diffs AS (\n",
" SELECT ABS(nav.nav_price - tpv.tpv_price) AS d\n",
" FROM nav JOIN tpv USING(product_id)\n",
" WHERE ABS(nav.nav_price - tpv.tpv_price) >= 0.01\n",
")\n",
"SELECT\n",
" COUNTIF(d < 0.10) AS lt_10cent,\n",
" COUNTIF(d >= 0.10 AND d < 1.00) AS c10_to_1eur,\n",
" COUNTIF(d >= 1.00 AND d < 10.00) AS c1_to_10,\n",
" COUNTIF(d >= 10.00 AND d < 100.00) AS c10_to_100,\n",
" COUNTIF(d >= 100.00) AS ge_100,\n",
" COUNT(*) AS total_diff\n",
"FROM diffs\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "ee00a226",
"metadata": {},
"source": [
"## 7 · Conclusión\n",
"\n",
"### ¿Son las mismas tablas?\n",
"\n",
"**No exactamente, pero coinciden casi perfectamente donde se puede cruzar.**\n",
"\n",
"1. **Granularidad distinta.** NAV operacional (1.6M, un registro por precio vigente); TPV histórica (85M totales, ~1M activas). Comparación sólo tiene sentido filtrando a lo vigente en ambas.\n",
"\n",
"2. **Mapeo no documentado.** NAV usa `sales_type=1` + `sales_code ∈ {471..478}`; TPV usa `channel_id`. Deducidos empíricamente:\n",
" - **471 → aurgi.com (1)** — ~96% coincidencia\n",
" - **472 → motortown.es (2)** — 100% coincidencia\n",
" - **478 → canarias (12)** — ~100% coincidencia\n",
" - 473–477 → sin canal TPV (probable B2B/flotas/internos)\n",
"\n",
"3. **Cobertura:**\n",
" - 472 y 478 son réplicas directas (NAV → TPV al céntimo, ~11 extras en TPV y ~5k ausentes en TPV).\n",
" - 471 ↔ aurgi.com tiene desalineación: 43,581 productos sólo en TPV (catálogo e-commerce propio), 18,559 sólo en NAV, y ~3,500 con precio distinto.\n",
"\n",
"4. **Riesgo financiero:** canales centro (motortown/canarias) = **cero riesgo**. aurgi.com necesita proceso de reconciliación si NAV se considera verdad contable.\n",
"\n",
"### Siguientes pasos sugeridos\n",
"\n",
"- Pedir a negocio qué representan `sales_code` 473–477 y si deben tener canal TPV.\n",
"- Auditar outliers > 10 € en 471→1 (celdas anteriores).\n",
"- Decidir fuente de verdad por canal en el proyecto migración NAV↔TPV."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}