{
|
||
"cells": [
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": null,
|
||
"id": "38a86e03-762e-49e0-8794-407c39e78fc3",
|
||
"metadata": {},
|
||
"outputs": [],
|
||
"source": [
"# (initial cell generated by jupyter_write create; ignored)"
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "02637343",
|
||
"metadata": {},
|
||
"source": [
"# Sale price comparison: NAV vs TPV\n",
"\n",
"**Sources:**\n",
"- `autingo-159109.mssql2022_dbo.sales_price`: NAV (Navision)\n",
"- `autingo-159109.psql_dcpublic.sale_prices`: TPV (Datacentric/PostgreSQL)\n",
"\n",
"**Bridge:** `autingo-159109.psql_dcpublic.products.nav_id` ↔ `sales_price.item_no_`\n",
"\n",
"**Goal:** determine whether the two tables represent the same prices."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "f586f3bb",
|
||
"metadata": {},
|
||
"source": [
"## 0 · BigQuery setup\n",
"\n",
"With end-user ADC we do not have the `serviceusage.services.use` permission on `autingo-159109`. We force `quota_project_id=None` so that BigQuery bills the project that runs the query. Without this, some queries fail with `403 USER_PROJECT_DENIED`."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 2,
|
||
"id": "bbace39d",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"text/plain": [
|
||
"'autingo-159109'"
|
||
]
|
||
},
|
||
"execution_count": 2,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"import google.auth\n",
|
||
"from google.cloud import bigquery\n",
|
||
"import pandas as pd\n",
|
||
"\n",
|
||
"PROJECT = \"autingo-159109\"\n",
|
||
"NAV = f\"`{PROJECT}.mssql2022_dbo.sales_price`\"\n",
|
||
"TPV = f\"`{PROJECT}.psql_dcpublic.sale_prices`\"\n",
|
||
"PRODUCTS = f\"`{PROJECT}.psql_dcpublic.products`\"\n",
|
||
"CHANNELS = f\"`{PROJECT}.psql_dcpublic.channels`\"\n",
|
||
"\n",
|
||
"creds, _ = google.auth.default()\n",
"if hasattr(creds, \"with_quota_project\"):\n",
"    creds = creds.with_quota_project(None)\n",
"bq = bigquery.Client(project=PROJECT, credentials=creds,\n",
"                     client_options={\"quota_project_id\": None})\n",
"\n",
"def q(sql: str) -> pd.DataFrame:\n",
"    \"\"\"Run a query and return the result as a DataFrame. Immutable.\"\"\"\n",
"    return bq.query(sql).to_dataframe()\n",
"\n",
|
||
"bq.project"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "979ef0d9",
|
||
"metadata": {},
|
||
"source": [
"## 1 · Raw volume\n",
"\n",
"Row counts for each table under its different filters."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 3,
|
||
"id": "c39eb1cd",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
|
||
" warnings.warn(\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>tabla</th>\n",
|
||
" <th>n</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>NAV sales_price (con deleted)</td>\n",
|
||
" <td>1626433</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>NAV sales_price (vivos)</td>\n",
|
||
" <td>1626331</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>TPV sale_prices (active=TRUE)</td>\n",
|
||
" <td>998620</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>TPV sale_prices (todas)</td>\n",
|
||
" <td>85683989</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>products (total)</td>\n",
|
||
" <td>2747387</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>products (con nav_id)</td>\n",
|
||
" <td>2746630</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" tabla n\n",
|
||
"0 NAV sales_price (con deleted) 1626433\n",
|
||
"1 NAV sales_price (vivos) 1626331\n",
|
||
"2 TPV sale_prices (active=TRUE) 998620\n",
|
||
"3 TPV sale_prices (todas) 85683989\n",
|
||
"4 products (total) 2747387\n",
|
||
"5 products (con nav_id) 2746630"
|
||
]
|
||
},
|
||
"execution_count": 3,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"q(f\"\"\"\n",
|
||
"SELECT 'NAV sales_price (vivos)' AS tabla, COUNT(*) AS n FROM {NAV} WHERE _fivetran_deleted IS NOT TRUE\n",
|
||
"UNION ALL SELECT 'NAV sales_price (con deleted)', COUNT(*) FROM {NAV}\n",
|
||
"UNION ALL SELECT 'TPV sale_prices (todas)', COUNT(*) FROM {TPV}\n",
|
||
"UNION ALL SELECT 'TPV sale_prices (active=TRUE)', COUNT(*) FROM {TPV} WHERE active\n",
|
||
"UNION ALL SELECT 'products (con nav_id)', COUNT(*) FROM {PRODUCTS} WHERE nav_id IS NOT NULL AND nav_id != ''\n",
|
||
"UNION ALL SELECT 'products (total)', COUNT(*) FROM {PRODUCTS}\n",
|
||
"\"\"\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0973c7c0",
|
||
"metadata": {},
|
||
"source": [
"**Reading the counts:**\n",
"- TPV keeps the full price history; NAV overwrites prices in place.\n",
"- That is why TPV has ~53× as many rows as NAV (85.7M vs 1.6M), but only ~1M of them are active and represent the current state (comparable to NAV's 1.6M).\n",
"- Almost all `products` have a `nav_id` (≥99.9%), so the bridge works.\n",
"- The tables are not comparable row by row: both must be filtered down to the currently valid prices."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0d5e6eae",
|
||
"metadata": {},
|
||
"source": [
"## 2 · NAV granularity\n",
"\n",
"NAV uses `sales_type` + `sales_code` to distinguish price lists. Let's see how many records each combination has:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 4,
|
||
"id": "da84c08d",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
|
||
" warnings.warn(\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>sales_type</th>\n",
|
||
" <th>sales_code</th>\n",
|
||
" <th>n</th>\n",
|
||
" <th>n_items</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>0</td>\n",
|
||
" <td>471</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>9</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>471</td>\n",
|
||
" <td>666483</td>\n",
|
||
" <td>142054</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>472</td>\n",
|
||
" <td>283833</td>\n",
|
||
" <td>62748</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>473</td>\n",
|
||
" <td>257973</td>\n",
|
||
" <td>53527</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>474</td>\n",
|
||
" <td>169764</td>\n",
|
||
" <td>33526</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>475</td>\n",
|
||
" <td>166903</td>\n",
|
||
" <td>33505</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>478</td>\n",
|
||
" <td>80772</td>\n",
|
||
" <td>46464</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>477</td>\n",
|
||
" <td>500</td>\n",
|
||
" <td>500</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>476</td>\n",
|
||
" <td>91</td>\n",
|
||
" <td>43</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>9</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td></td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>3</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" sales_type sales_code n n_items\n",
|
||
"0 0 471 9 9\n",
|
||
"1 1 471 666483 142054\n",
|
||
"2 1 472 283833 62748\n",
|
||
"3 1 473 257973 53527\n",
|
||
"4 1 474 169764 33526\n",
|
||
"5 1 475 166903 33505\n",
|
||
"6 1 478 80772 46464\n",
|
||
"7 1 477 500 500\n",
|
||
"8 1 476 91 43\n",
|
||
"9 2 3 3"
|
||
]
|
||
},
|
||
"execution_count": 4,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"q(f\"\"\"\n",
|
||
"SELECT sales_type, sales_code, COUNT(*) AS n, COUNT(DISTINCT item_no_) AS n_items\n",
|
||
"FROM {NAV}\n",
|
||
"WHERE _fivetran_deleted IS NOT TRUE\n",
|
||
"GROUP BY sales_type, sales_code\n",
|
||
"ORDER BY sales_type, n DESC\n",
|
||
"\"\"\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "abc5d386",
|
||
"metadata": {},
|
||
"source": [
"**Reading:**\n",
"- `sales_type=1` (Customer Price Group) accounts for over 99.99% of NAV rows, across 8 codes (471–478).\n",
"- `sales_type=0` and `sales_type=2` are residual (12 rows in total) and are ignored."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "38e9a417",
|
||
"metadata": {},
|
||
"source": [
"## 3 · TPV granularity\n",
"\n",
"TPV splits prices by `channel_id`. We count active prices per channel:"
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 5,
|
||
"id": "1fe46b8d",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
|
||
" warnings.warn(\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>channel_id</th>\n",
|
||
" <th>name</th>\n",
|
||
" <th>n_prices_active</th>\n",
|
||
" <th>n_products</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>1</td>\n",
|
||
" <td>aurgi.com</td>\n",
|
||
" <td>296357</td>\n",
|
||
" <td>296212</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>3</td>\n",
|
||
" <td>autingo.es</td>\n",
|
||
" <td>296141</td>\n",
|
||
" <td>296141</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>13</td>\n",
|
||
" <td>Miravia</td>\n",
|
||
" <td>164814</td>\n",
|
||
" <td>164814</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>2</td>\n",
|
||
" <td>motortown.es</td>\n",
|
||
" <td>59612</td>\n",
|
||
" <td>59612</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>12</td>\n",
|
||
" <td>canarias</td>\n",
|
||
" <td>45132</td>\n",
|
||
" <td>45130</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>11</td>\n",
|
||
" <td>Amazon</td>\n",
|
||
" <td>44739</td>\n",
|
||
" <td>44739</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6</th>\n",
|
||
" <td>6</td>\n",
|
||
" <td>Eci Marketplace</td>\n",
|
||
" <td>39987</td>\n",
|
||
" <td>39987</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>5</td>\n",
|
||
" <td>Centros MT</td>\n",
|
||
" <td>30544</td>\n",
|
||
" <td>30544</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>10</td>\n",
|
||
" <td>Talleres Digitales</td>\n",
|
||
" <td>21294</td>\n",
|
||
" <td>21294</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>9</th>\n",
|
||
" <td>15</td>\n",
|
||
" <td>Tiktok Shop</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>7</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>11</th>\n",
|
||
" <td>4</td>\n",
|
||
" <td>Centros Aurgi</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>14</td>\n",
|
||
" <td>Aliexpress</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>13</th>\n",
|
||
" <td>8</td>\n",
|
||
" <td>Eci Marketplace</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" channel_id name n_prices_active n_products\n",
|
||
"0 1 aurgi.com 296357 296212\n",
|
||
"1 3 autingo.es 296141 296141\n",
|
||
"2 13 Miravia 164814 164814\n",
|
||
"3 2 motortown.es 59612 59612\n",
|
||
"4 12 canarias 45132 45130\n",
|
||
"5 11 Amazon 44739 44739\n",
|
||
"6 6 Eci Marketplace 39987 39987\n",
|
||
"7 5 Centros MT 30544 30544\n",
|
||
"8 10 Talleres Digitales 21294 21294\n",
|
||
"9 15 Tiktok Shop 0 0\n",
|
||
"10 7 NaN 0 0\n",
|
||
"11 4 Centros Aurgi 0 0\n",
|
||
"12 14 Aliexpress 0 0\n",
|
||
"13 8 Eci Marketplace 0 0"
|
||
]
|
||
},
|
||
"execution_count": 5,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"q(f\"\"\"\n",
|
||
"SELECT c.id AS channel_id, c.name,\n",
|
||
" COUNT(sp.id) AS n_prices_active,\n",
|
||
" COUNT(DISTINCT sp.product_id) AS n_products\n",
|
||
"FROM {CHANNELS} c\n",
|
||
"LEFT JOIN {TPV} sp ON sp.channel_id = c.id AND sp.active = TRUE\n",
|
||
"GROUP BY c.id, c.name\n",
|
||
"ORDER BY n_prices_active DESC\n",
|
||
"\"\"\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "7bf15cf0",
|
||
"metadata": {},
|
||
"source": [
"## 4 · Deducing the `NAV.sales_code` ↔ `TPV.channel_id` mapping\n",
"\n",
"There is no equivalence table. We deduce the mapping from **empirical price matches**: for each `(sales_code, channel_id)` pair we count how many products have the same current `unit_price` (to within 0.01)."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 6,
|
||
"id": "2654f253",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
|
||
" warnings.warn(\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>sales_code</th>\n",
|
||
" <th>channel_id</th>\n",
|
||
" <th>n_match_prod</th>\n",
|
||
" <th>n_same_price</th>\n",
|
||
" <th>pct_same</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>471</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>103075</td>\n",
|
||
" <td>99553</td>\n",
|
||
" <td>96.6</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>472</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>57059</td>\n",
|
||
" <td>57059</td>\n",
|
||
" <td>100.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>478</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>41717</td>\n",
|
||
" <td>41713</td>\n",
|
||
" <td>100.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>472</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>41651</td>\n",
|
||
" <td>40085</td>\n",
|
||
" <td>96.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>478</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>41651</td>\n",
|
||
" <td>40085</td>\n",
|
||
" <td>96.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>474</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>30041</td>\n",
|
||
" <td>24172</td>\n",
|
||
" <td>80.5</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6</th>\n",
|
||
" <td>475</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>30042</td>\n",
|
||
" <td>24165</td>\n",
|
||
" <td>80.4</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>473</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>48940</td>\n",
|
||
" <td>17529</td>\n",
|
||
" <td>35.8</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>475</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>20936</td>\n",
|
||
" <td>15554</td>\n",
|
||
" <td>74.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>9</th>\n",
|
||
" <td>474</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>20934</td>\n",
|
||
" <td>15553</td>\n",
|
||
" <td>74.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>10</th>\n",
|
||
" <td>471</td>\n",
|
||
" <td>10</td>\n",
|
||
" <td>15013</td>\n",
|
||
" <td>12912</td>\n",
|
||
" <td>86.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>11</th>\n",
|
||
" <td>473</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>34595</td>\n",
|
||
" <td>9240</td>\n",
|
||
" <td>26.7</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>12</th>\n",
|
||
" <td>471</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>47405</td>\n",
|
||
" <td>4385</td>\n",
|
||
" <td>9.3</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>13</th>\n",
|
||
" <td>472</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>47427</td>\n",
|
||
" <td>4376</td>\n",
|
||
" <td>9.2</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>14</th>\n",
|
||
" <td>471</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>34621</td>\n",
|
||
" <td>3828</td>\n",
|
||
" <td>11.1</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" sales_code channel_id n_match_prod n_same_price pct_same\n",
|
||
"0 471 1 103075 99553 96.6\n",
|
||
"1 472 2 57059 57059 100.0\n",
|
||
"2 478 12 41717 41713 100.0\n",
|
||
"3 472 12 41651 40085 96.2\n",
|
||
"4 478 2 41651 40085 96.2\n",
|
||
"5 474 2 30041 24172 80.5\n",
|
||
"6 475 2 30042 24165 80.4\n",
|
||
"7 473 2 48940 17529 35.8\n",
|
||
"8 475 12 20936 15554 74.3\n",
|
||
"9 474 12 20934 15553 74.3\n",
|
||
"10 471 10 15013 12912 86.0\n",
|
||
"11 473 12 34595 9240 26.7\n",
|
||
"12 471 2 47405 4385 9.3\n",
|
||
"13 472 1 47427 4376 9.2\n",
|
||
"14 471 12 34621 3828 11.1"
|
||
]
|
||
},
|
||
"execution_count": 6,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"q(f\"\"\"\n",
|
||
"WITH nav AS (\n",
|
||
" SELECT p.id AS product_id, sp.sales_code,\n",
|
||
" CAST(sp.unit_price AS FLOAT64) AS nav_price\n",
|
||
" FROM {NAV} sp\n",
|
||
" JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n",
|
||
" WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1\n",
|
||
" AND sp.starting_date <= CURRENT_DATETIME()\n",
|
||
" AND (sp.ending_date IS NULL\n",
|
||
" OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n",
|
||
" OR sp.ending_date >= CURRENT_DATETIME())\n",
|
||
" AND sp.variant_code = ''\n",
|
||
"),\n",
|
||
"tpv AS (\n",
|
||
" SELECT product_id, channel_id, unit_price AS tpv_price\n",
|
||
" FROM {TPV}\n",
|
||
" WHERE active = TRUE\n",
|
||
" AND starting_date <= CURRENT_DATE()\n",
|
||
" AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n",
|
||
")\n",
|
||
"SELECT nav.sales_code, tpv.channel_id,\n",
|
||
" COUNT(*) AS n_match_prod,\n",
|
||
" SUM(CASE WHEN ABS(nav.nav_price - tpv.tpv_price) < 0.01 THEN 1 ELSE 0 END) AS n_same_price,\n",
|
||
" ROUND(SUM(CASE WHEN ABS(nav.nav_price - tpv.tpv_price) < 0.01 THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) AS pct_same\n",
|
||
"FROM nav JOIN tpv USING(product_id)\n",
|
||
"GROUP BY nav.sales_code, tpv.channel_id\n",
|
||
"HAVING n_match_prod > 100\n",
|
||
"ORDER BY n_same_price DESC\n",
|
||
"LIMIT 15\n",
|
||
"\"\"\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "c6d538cf",
|
||
"metadata": {},
|
||
"source": [
"### Deduced mappings (high match rate)\n",
"\n",
"| NAV sales_code | TPV channel_id | TPV channel | % match |\n",
"|---:|---:|---|---:|\n",
"| 471 | 1 | aurgi.com | 96.6% |\n",
"| 472 | 2 | motortown.es | 100% |\n",
"| 478 | 12 | canarias | ~100% |\n",
"\n",
"The remaining `sales_code` values (473, 474, 475, 476, 477) **have no direct TPV channel equivalent**. They are probably internal price groups (B2B, fleets) that are not exposed in TPV channels."
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "3cceab83",
|
||
"metadata": {},
|
||
"source": [
"## 5 · Detailed comparison of the 3 reliable mappings\n",
"\n",
"For each mapping we compute: current rows on each side, the intersection (`both`), the two one-sided differences (`only_nav`, `only_tpv`), the number of matching prices (to within 0.01), and statistics of the absolute price difference."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 7,
|
||
"id": "c83090f0",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
|
||
" warnings.warn(\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>mapping</th>\n",
|
||
" <th>nav_rows</th>\n",
|
||
" <th>tpv_rows</th>\n",
|
||
" <th>both</th>\n",
|
||
" <th>only_nav</th>\n",
|
||
" <th>only_tpv</th>\n",
|
||
" <th>same_price</th>\n",
|
||
" <th>mean_diff</th>\n",
|
||
" <th>p50</th>\n",
|
||
" <th>p95</th>\n",
|
||
" <th>max_diff</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>471 → ch1</td>\n",
|
||
" <td>121631</td>\n",
|
||
" <td>146654</td>\n",
|
||
" <td>103075</td>\n",
|
||
" <td>18559</td>\n",
|
||
" <td>43581</td>\n",
|
||
" <td>99553</td>\n",
|
||
" <td>0.3</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>1476.27</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>472 → ch2</td>\n",
|
||
" <td>62644</td>\n",
|
||
" <td>57070</td>\n",
|
||
" <td>57059</td>\n",
|
||
" <td>5585</td>\n",
|
||
" <td>11</td>\n",
|
||
" <td>57059</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>478 → ch12</td>\n",
|
||
" <td>46435</td>\n",
|
||
" <td>41722</td>\n",
|
||
" <td>41717</td>\n",
|
||
" <td>4722</td>\n",
|
||
" <td>9</td>\n",
|
||
" <td>41713</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>27.51</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" mapping nav_rows tpv_rows both only_nav only_tpv same_price \\\n",
|
||
"0 471 → ch1 121631 146654 103075 18559 43581 99553 \n",
|
||
"1 472 → ch2 62644 57070 57059 5585 11 57059 \n",
|
||
"2 478 → ch12 46435 41722 41717 4722 9 41713 \n",
|
||
"\n",
|
||
" mean_diff p50 p95 max_diff \n",
|
||
"0 0.3 0.0 0.0 1476.27 \n",
|
||
"1 0.0 0.0 0.0 0.00 \n",
|
||
"2 0.0 0.0 0.0 27.51 "
|
||
]
|
||
},
|
||
"execution_count": 7,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
|
||
"def compare(sales_code: str, channel_id: int) -> pd.DataFrame:\n",
|
||
" sql = f\"\"\"\n",
|
||
" WITH nav AS (\n",
|
||
" SELECT p.id AS product_id, CAST(sp.unit_price AS FLOAT64) AS nav_price\n",
|
||
" FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n",
|
||
" WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '{sales_code}'\n",
|
||
" AND sp.starting_date <= CURRENT_DATETIME()\n",
|
||
" AND (sp.ending_date IS NULL\n",
|
||
" OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n",
|
||
" OR sp.ending_date >= CURRENT_DATETIME())\n",
|
||
" AND sp.variant_code = ''\n",
|
||
" ),\n",
|
||
" tpv AS (\n",
|
||
" SELECT product_id, unit_price AS tpv_price FROM {TPV}\n",
|
||
" WHERE channel_id = {channel_id} AND active = TRUE\n",
|
||
" AND starting_date <= CURRENT_DATE()\n",
|
||
" AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n",
|
||
" )\n",
|
||
" SELECT\n",
|
||
" '{sales_code} \\u2192 ch{channel_id}' AS mapping,\n",
|
||
" (SELECT COUNT(*) FROM nav) AS nav_rows,\n",
|
||
" (SELECT COUNT(*) FROM tpv) AS tpv_rows,\n",
|
||
" (SELECT COUNT(*) FROM nav JOIN tpv USING(product_id)) AS both,\n",
|
||
" (SELECT COUNT(*) FROM nav LEFT JOIN tpv USING(product_id) WHERE tpv.product_id IS NULL) AS only_nav,\n",
|
||
" (SELECT COUNT(*) FROM tpv LEFT JOIN nav USING(product_id) WHERE nav.product_id IS NULL) AS only_tpv,\n",
|
||
" (SELECT COUNT(*) FROM nav JOIN tpv USING(product_id) WHERE ABS(nav_price - tpv_price) < 0.01) AS same_price,\n",
|
||
" (SELECT ROUND(AVG(ABS(nav_price - tpv_price)), 2) FROM nav JOIN tpv USING(product_id)) AS mean_diff,\n",
|
||
" (SELECT ROUND(APPROX_QUANTILES(ABS(nav_price - tpv_price), 100)[OFFSET(50)], 2) FROM nav JOIN tpv USING(product_id)) AS p50,\n",
|
||
" (SELECT ROUND(APPROX_QUANTILES(ABS(nav_price - tpv_price), 100)[OFFSET(95)], 2) FROM nav JOIN tpv USING(product_id)) AS p95,\n",
|
||
" (SELECT ROUND(MAX(ABS(nav_price - tpv_price)), 2) FROM nav JOIN tpv USING(product_id)) AS max_diff\n",
|
||
" \"\"\"\n",
|
||
" return q(sql)\n",
|
||
"\n",
|
||
"comparison = pd.concat([\n",
|
||
" compare(\"471\", 1), # aurgi.com\n",
|
||
" compare(\"472\", 2), # motortown.es\n",
|
||
" compare(\"478\", 12), # canarias\n",
|
||
"], ignore_index=True)\n",
|
||
"comparison"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "06c2882f",
|
||
"metadata": {},
|
||
"source": [
"## 6 · Outliers in 471 → aurgi.com\n",
"\n",
"In 471→1, ~3.4% of products do not match. We inspect the largest outliers to see whether this is isolated noise or something systematic."
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 8,
|
||
"id": "37454278",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stderr",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
|
||
" warnings.warn(\n"
|
||
]
|
||
},
|
||
{
|
||
"data": {
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>nav_id</th>\n",
|
||
" <th>normalized_ref</th>\n",
|
||
" <th>nav_price</th>\n",
|
||
" <th>tpv_price</th>\n",
|
||
" <th>diff_eur</th>\n",
|
||
" <th>nav_updated</th>\n",
|
||
" <th>tpv_updated</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>MKT539258</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>79.17</td>\n",
|
||
" <td>1555.44</td>\n",
|
||
" <td>1476.27</td>\n",
|
||
" <td>2023-12-20</td>\n",
|
||
" <td>2026-01-27 18:24:16.236286+00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>MKT263951</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>612.07</td>\n",
|
||
" <td>945.47</td>\n",
|
||
" <td>333.40</td>\n",
|
||
" <td>2025-10-15</td>\n",
|
||
" <td>2026-04-11 03:42:31.985658+00:00</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>MKT528522558</td>\n",
" <td>NaN</td>\n",
" <td>801.86</td>\n",
" <td>502.90</td>\n",
" <td>-298.96</td>\n",
" <td>2026-03-08</td>\n",
" <td>2026-04-15 09:03:06.036752+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>MKT1285247190</td>\n",
" <td>NaN</td>\n",
" <td>39.49</td>\n",
" <td>280.10</td>\n",
" <td>240.61</td>\n",
" <td>2025-10-07</td>\n",
" <td>2026-03-24 12:16:47.114948+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>MKT248783</td>\n",
" <td>NaN</td>\n",
" <td>699.99</td>\n",
" <td>923.83</td>\n",
" <td>223.84</td>\n",
" <td>2023-12-20</td>\n",
" <td>2026-04-14 07:23:17.228074+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>MKT235345</td>\n",
" <td>NaN</td>\n",
" <td>728.97</td>\n",
" <td>517.64</td>\n",
" <td>-211.33</td>\n",
" <td>2023-12-20</td>\n",
" <td>2026-04-10 10:24:57.693899+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>MKT1102764312</td>\n",
" <td>NaN</td>\n",
" <td>374.71</td>\n",
" <td>189.00</td>\n",
" <td>-185.71</td>\n",
" <td>2025-05-04</td>\n",
" <td>2026-04-14 14:05:20.943072+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>MKT312845</td>\n",
" <td>NaN</td>\n",
" <td>996.37</td>\n",
" <td>826.18</td>\n",
" <td>-170.19</td>\n",
" <td>2023-12-20</td>\n",
" <td>2026-04-09 12:43:19.657585+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>MKT1009692625</td>\n",
" <td>NaN</td>\n",
" <td>30.48</td>\n",
" <td>192.72</td>\n",
" <td>162.24</td>\n",
" <td>2025-10-09</td>\n",
" <td>2026-04-14 13:24:10.117913+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>MKT1102764641</td>\n",
" <td>NaN</td>\n",
" <td>473.90</td>\n",
" <td>629.00</td>\n",
" <td>155.10</td>\n",
" <td>2025-11-07</td>\n",
" <td>2026-04-10 09:22:42.013122+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>MKT235222</td>\n",
" <td>NaN</td>\n",
" <td>1155.85</td>\n",
" <td>1013.29</td>\n",
" <td>-142.56</td>\n",
" <td>2023-12-20</td>\n",
" <td>2026-04-01 17:44:05.939763+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>MKT119817</td>\n",
" <td>NaN</td>\n",
" <td>97.80</td>\n",
" <td>227.93</td>\n",
" <td>130.13</td>\n",
" <td>2023-12-20</td>\n",
" <td>2026-04-13 18:27:22.018325+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>MKT1102764639</td>\n",
" <td>NaN</td>\n",
" <td>449.00</td>\n",
" <td>579.00</td>\n",
" <td>130.00</td>\n",
" <td>2025-11-28</td>\n",
" <td>2026-04-15 09:03:04.974830+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>MKT119351</td>\n",
" <td>NaN</td>\n",
" <td>0.00</td>\n",
" <td>125.90</td>\n",
" <td>125.90</td>\n",
" <td>2023-12-20</td>\n",
" <td>2026-04-13 06:22:44.921118+00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>MKT1102764780</td>\n",
" <td>NaN</td>\n",
" <td>453.22</td>\n",
" <td>327.90</td>\n",
" <td>-125.32</td>\n",
" <td>2025-07-08</td>\n",
" <td>2026-04-14 11:24:44.704392+00:00</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" nav_id normalized_ref nav_price tpv_price diff_eur nav_updated \\\n",
"0 MKT539258 NaN 79.17 1555.44 1476.27 2023-12-20 \n",
"1 MKT263951 NaN 612.07 945.47 333.40 2025-10-15 \n",
"2 MKT528522558 NaN 801.86 502.90 -298.96 2026-03-08 \n",
"3 MKT1285247190 NaN 39.49 280.10 240.61 2025-10-07 \n",
"4 MKT248783 NaN 699.99 923.83 223.84 2023-12-20 \n",
"5 MKT235345 NaN 728.97 517.64 -211.33 2023-12-20 \n",
"6 MKT1102764312 NaN 374.71 189.00 -185.71 2025-05-04 \n",
"7 MKT312845 NaN 996.37 826.18 -170.19 2023-12-20 \n",
"8 MKT1009692625 NaN 30.48 192.72 162.24 2025-10-09 \n",
"9 MKT1102764641 NaN 473.90 629.00 155.10 2025-11-07 \n",
"10 MKT235222 NaN 1155.85 1013.29 -142.56 2023-12-20 \n",
"11 MKT119817 NaN 97.80 227.93 130.13 2023-12-20 \n",
"12 MKT1102764639 NaN 449.00 579.00 130.00 2025-11-28 \n",
"13 MKT119351 NaN 0.00 125.90 125.90 2023-12-20 \n",
"14 MKT1102764780 NaN 453.22 327.90 -125.32 2025-07-08 \n",
"\n",
" tpv_updated \n",
"0 2026-01-27 18:24:16.236286+00:00 \n",
"1 2026-04-11 03:42:31.985658+00:00 \n",
"2 2026-04-15 09:03:06.036752+00:00 \n",
"3 2026-03-24 12:16:47.114948+00:00 \n",
"4 2026-04-14 07:23:17.228074+00:00 \n",
"5 2026-04-10 10:24:57.693899+00:00 \n",
"6 2026-04-14 14:05:20.943072+00:00 \n",
"7 2026-04-09 12:43:19.657585+00:00 \n",
"8 2026-04-14 13:24:10.117913+00:00 \n",
"9 2026-04-10 09:22:42.013122+00:00 \n",
"10 2026-04-01 17:44:05.939763+00:00 \n",
"11 2026-04-13 18:27:22.018325+00:00 \n",
"12 2026-04-15 09:03:04.974830+00:00 \n",
"13 2026-04-13 06:22:44.921118+00:00 \n",
"14 2026-04-14 11:24:44.704392+00:00 "
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Top 15 largest price differences for 471 -> aurgi.com (channel 1)\n",
"q(f\"\"\"\n",
"WITH nav AS (\n",
" SELECT p.id AS product_id, p.nav_id, p.normalized_ref,\n",
" CAST(sp.unit_price AS FLOAT64) AS nav_price,\n",
" sp.fecha_ultima_modificacion AS nav_updated\n",
" FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n",
" WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '471'\n",
" AND sp.starting_date <= CURRENT_DATETIME()\n",
" AND (sp.ending_date IS NULL OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n",
" OR sp.ending_date >= CURRENT_DATETIME())\n",
" AND sp.variant_code = ''\n",
"),\n",
"tpv AS (\n",
" SELECT product_id, unit_price AS tpv_price, updated_at AS tpv_updated\n",
" FROM {TPV}\n",
" WHERE channel_id = 1 AND active = TRUE\n",
" AND starting_date <= CURRENT_DATE()\n",
" AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n",
")\n",
"SELECT nav.nav_id, nav.normalized_ref,\n",
" ROUND(nav.nav_price, 2) AS nav_price,\n",
" ROUND(tpv.tpv_price, 2) AS tpv_price,\n",
" ROUND(tpv.tpv_price - nav.nav_price, 2) AS diff_eur,\n",
" nav.nav_updated, tpv.tpv_updated\n",
"FROM nav JOIN tpv USING(product_id)\n",
"WHERE ABS(nav.nav_price - tpv.tpv_price) >= 0.01\n",
"ORDER BY ABS(nav.nav_price - tpv.tpv_price) DESC\n",
"LIMIT 15\n",
"\"\"\")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "e9bc35d5",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n",
" warnings.warn(\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>lt_10cent</th>\n",
" <th>c10_to_1eur</th>\n",
" <th>c1_to_10</th>\n",
" <th>c10_to_100</th>\n",
" <th>ge_100</th>\n",
" <th>total_diff</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>119</td>\n",
" <td>335</td>\n",
" <td>2273</td>\n",
" <td>773</td>\n",
" <td>22</td>\n",
" <td>3522</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" lt_10cent c10_to_1eur c1_to_10 c10_to_100 ge_100 total_diff\n",
"0 119 335 2273 773 22 3522"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Distribution of price-difference magnitudes for 471 -> aurgi.com\n",
"q(f\"\"\"\n",
"WITH nav AS (\n",
" SELECT p.id AS product_id, CAST(sp.unit_price AS FLOAT64) AS nav_price\n",
" FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n",
" WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '471'\n",
" AND sp.starting_date <= CURRENT_DATETIME()\n",
" AND (sp.ending_date IS NULL OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n",
" OR sp.ending_date >= CURRENT_DATETIME())\n",
" AND sp.variant_code = ''\n",
"),\n",
"tpv AS (\n",
" SELECT product_id, unit_price AS tpv_price FROM {TPV}\n",
" WHERE channel_id = 1 AND active = TRUE\n",
" AND starting_date <= CURRENT_DATE()\n",
" AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n",
"),\n",
"diffs AS (\n",
" SELECT ABS(nav.nav_price - tpv.tpv_price) AS d\n",
" FROM nav JOIN tpv USING(product_id)\n",
" WHERE ABS(nav.nav_price - tpv.tpv_price) >= 0.01\n",
")\n",
"SELECT\n",
" COUNTIF(d < 0.10) AS lt_10cent,\n",
" COUNTIF(d >= 0.10 AND d < 1.00) AS c10_to_1eur,\n",
" COUNTIF(d >= 1.00 AND d < 10.00) AS c1_to_10,\n",
" COUNTIF(d >= 10.00 AND d < 100.00) AS c10_to_100,\n",
" COUNTIF(d >= 100.00) AS ge_100,\n",
" COUNT(*) AS total_diff\n",
"FROM diffs\n",
"\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "ee00a226",
"metadata": {},
"source": [
"## 7 · Conclusion\n",
"\n",
"### Are these the same tables?\n",
"\n",
"**Not exactly, but they match almost perfectly wherever they can be joined.**\n",
"\n",
"1. **Different granularity.** NAV is operational (1.6M rows, one record per price currently in force); TPV is historical (85M rows in total, ~1M active). The comparison only makes sense after filtering both tables to the prices currently in force.\n",
"\n",
"2. **Undocumented mapping.** NAV uses `sales_type=1` + `sales_code ∈ {471..478}`; TPV uses `channel_id`. The mapping was inferred empirically:\n",
"   - **471 → aurgi.com (1)**: ~96% match\n",
"   - **472 → motortown.es (2)**: 100% match\n",
"   - **478 → canarias (12)**: ~100% match\n",
"   - 473–477 → no TPV channel (probably B2B/fleets/internal)\n",
"\n",
"3. **Coverage:**\n",
"   - 472 and 478 are direct replicas (NAV → TPV, identical to the cent, with ~11 extra rows in TPV and ~5k missing from TPV).\n",
"   - 471 ↔ aurgi.com is misaligned: 43,581 products exist only in TPV (the e-commerce site's own catalog), 18,559 exist only in NAV, and ~3,500 have a different price (3,522 differ by at least €0.01 in the distribution above).\n",
"\n",
"4. **Financial risk:** the center channels (motortown/canarias) carry **zero risk**. aurgi.com needs a reconciliation process if NAV is treated as the accounting source of truth.\n",
"\n",
"### Suggested next steps\n",
"\n",
"- Ask the business team what `sales_code` 473–477 represent and whether they should have a TPV channel.\n",
"- Audit the outliers above €10 in 471→1 (795 rows in the distribution above: 773 between €10 and €100, 22 at €100 or more).\n",
"- Decide the source of truth per channel in the NAV↔TPV migration project."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}