{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "38a86e03-762e-49e0-8794-407c39e78fc3", "metadata": {}, "outputs": [], "source": [ "# (celda inicial generada por jupyter_write create — se ignora)" ] }, { "cell_type": "markdown", "id": "02637343", "metadata": {}, "source": [ "# Comparación precios de venta NAV vs TPV\n", "\n", "**Fuentes:**\n", "- `autingo-159109.mssql2022_dbo.sales_price` — NAV (Navision)\n", "- `autingo-159109.psql_dcpublic.sale_prices` — TPV (Datacentric/PostgreSQL)\n", "\n", "**Puente:** `autingo-159109.psql_dcpublic.products.nav_id` ↔ `sales_price.item_no_`\n", "\n", "**Objetivo:** determinar si las dos tablas representan los mismos precios." ] }, { "cell_type": "markdown", "id": "f586f3bb", "metadata": {}, "source": [ "## 0 · Setup BigQuery\n", "\n", "Con ADC de end-user no tenemos `serviceusage.services.use` en `autingo-159109`. Forzamos `quota_project_id=None` para que BQ cobre al proyecto de la query. Sin esto algunas queries fallan con `403 USER_PROJECT_DENIED`." ] }, { "cell_type": "code", "execution_count": 2, "id": "bbace39d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'autingo-159109'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import google.auth\n", "from google.cloud import bigquery\n", "import pandas as pd\n", "\n", "PROJECT = \"autingo-159109\"\n", "NAV = f\"`{PROJECT}.mssql2022_dbo.sales_price`\"\n", "TPV = f\"`{PROJECT}.psql_dcpublic.sale_prices`\"\n", "PRODUCTS = f\"`{PROJECT}.psql_dcpublic.products`\"\n", "CHANNELS = f\"`{PROJECT}.psql_dcpublic.channels`\"\n", "\n", "creds, _ = google.auth.default()\n", "if hasattr(creds, \"with_quota_project\"):\n", " creds = creds.with_quota_project(None)\n", "bq = bigquery.Client(project=PROJECT, credentials=creds,\n", " client_options={\"quota_project_id\": None})\n", "\n", "def q(sql: str) -> pd.DataFrame:\n", " \"\"\"Ejecuta una query y devuelve dataframe. Inmutable.\"\"\"\n", " return bq.query(sql).to_dataframe()\n", "\n", "bq.project" ] }, { "cell_type": "markdown", "id": "979ef0d9", "metadata": {}, "source": [ "## 1 · Volumen bruto\n", "\n", "Conteo de filas de cada tabla con sus distintos filtros." ] }, { "cell_type": "code", "execution_count": 3, "id": "c39eb1cd", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
tablan
0NAV  sales_price (con deleted)1626433
1NAV  sales_price (vivos)1626331
2TPV  sale_prices (active=TRUE)998620
3TPV  sale_prices (todas)85683989
4products (total)2747387
5products (con nav_id)2746630
\n", "
" ], "text/plain": [ " tabla n\n", "0 NAV sales_price (con deleted) 1626433\n", "1 NAV sales_price (vivos) 1626331\n", "2 TPV sale_prices (active=TRUE) 998620\n", "3 TPV sale_prices (todas) 85683989\n", "4 products (total) 2747387\n", "5 products (con nav_id) 2746630" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q(f\"\"\"\n", "SELECT 'NAV sales_price (vivos)' AS tabla, COUNT(*) AS n FROM {NAV} WHERE _fivetran_deleted IS NOT TRUE\n", "UNION ALL SELECT 'NAV sales_price (con deleted)', COUNT(*) FROM {NAV}\n", "UNION ALL SELECT 'TPV sale_prices (todas)', COUNT(*) FROM {TPV}\n", "UNION ALL SELECT 'TPV sale_prices (active=TRUE)', COUNT(*) FROM {TPV} WHERE active\n", "UNION ALL SELECT 'products (con nav_id)', COUNT(*) FROM {PRODUCTS} WHERE nav_id IS NOT NULL AND nav_id != ''\n", "UNION ALL SELECT 'products (total)', COUNT(*) FROM {PRODUCTS}\n", "\"\"\")" ] }, { "cell_type": "markdown", "id": "0973c7c0", "metadata": {}, "source": [ "**Lectura de los conteos:**\n", "- TPV guarda histórico completo de precios; NAV reemplaza en sitio.\n", "- Por eso TPV tiene ~86× más filas que NAV, pero sólo ~1M están activas y representan el estado vigente (comparable con los 1.6M de NAV).\n", "- Casi todos los `products` tienen `nav_id` (≥99.9%) — el puente funciona.\n", "- No son comparables fila-a-fila: hay que filtrar a lo vigente." ] }, { "cell_type": "markdown", "id": "0d5e6eae", "metadata": {}, "source": [ "## 2 · Granularidad NAV\n", "\n", "NAV usa `sales_type` + `sales_code` para distinguir listas de precio. Veamos cuántos registros hay por cada combinación:" ] }, { "cell_type": "code", "execution_count": 4, "id": "da84c08d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sales_typesales_codenn_items
0047199
11471666483142054
2147228383362748
3147325797353527
4147416976433526
5147516690333505
614788077246464
71477500500
814769143
9233
\n", "
" ], "text/plain": [ " sales_type sales_code n n_items\n", "0 0 471 9 9\n", "1 1 471 666483 142054\n", "2 1 472 283833 62748\n", "3 1 473 257973 53527\n", "4 1 474 169764 33526\n", "5 1 475 166903 33505\n", "6 1 478 80772 46464\n", "7 1 477 500 500\n", "8 1 476 91 43\n", "9 2 3 3" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q(f\"\"\"\n", "SELECT sales_type, sales_code, COUNT(*) AS n, COUNT(DISTINCT item_no_) AS n_items\n", "FROM {NAV}\n", "WHERE _fivetran_deleted IS NOT TRUE\n", "GROUP BY sales_type, sales_code\n", "ORDER BY sales_type, n DESC\n", "\"\"\")" ] }, { "cell_type": "markdown", "id": "abc5d386", "metadata": {}, "source": [ "**Lectura:**\n", "- `sales_type=1` (Customer Price Group) = 99.99% de NAV, con 8 códigos (471–478).\n", "- `sales_type=0` y `sales_type=2` son residuales (12 filas totales). Se ignoran." ] }, { "cell_type": "markdown", "id": "38e9a417", "metadata": {}, "source": [ "## 3 · Granularidad TPV\n", "\n", "TPV separa por `channel_id`. Contamos precios activos por canal:" ] }, { "cell_type": "code", "execution_count": 5, "id": "1fe46b8d", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
channel_idnamen_prices_activen_products
01aurgi.com296357296212
13autingo.es296141296141
213Miravia164814164814
32motortown.es5961259612
412canarias4513245130
511Amazon4473944739
66Eci Marketplace3998739987
75Centros MT3054430544
810Talleres Digitales2129421294
915Tiktok Shop00
107NaN00
114Centros Aurgi00
1214Aliexpress00
138Eci Marketplace00
\n", "
" ], "text/plain": [ " channel_id name n_prices_active n_products\n", "0 1 aurgi.com 296357 296212\n", "1 3 autingo.es 296141 296141\n", "2 13 Miravia 164814 164814\n", "3 2 motortown.es 59612 59612\n", "4 12 canarias 45132 45130\n", "5 11 Amazon 44739 44739\n", "6 6 Eci Marketplace 39987 39987\n", "7 5 Centros MT 30544 30544\n", "8 10 Talleres Digitales 21294 21294\n", "9 15 Tiktok Shop 0 0\n", "10 7 NaN 0 0\n", "11 4 Centros Aurgi 0 0\n", "12 14 Aliexpress 0 0\n", "13 8 Eci Marketplace 0 0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q(f\"\"\"\n", "SELECT c.id AS channel_id, c.name,\n", " COUNT(sp.id) AS n_prices_active,\n", " COUNT(DISTINCT sp.product_id) AS n_products\n", "FROM {CHANNELS} c\n", "LEFT JOIN {TPV} sp ON sp.channel_id = c.id AND sp.active = TRUE\n", "GROUP BY c.id, c.name\n", "ORDER BY n_prices_active DESC\n", "\"\"\")" ] }, { "cell_type": "markdown", "id": "7bf15cf0", "metadata": {}, "source": [ "## 4 · Deducir el mapeo `NAV.sales_code` ↔ `TPV.channel_id`\n", "\n", "No existe tabla de equivalencia. Lo deducimos por **coincidencia empírica de precio**: para cada par `(sales_code, channel_id)` contamos cuántos productos tienen exactamente el mismo `unit_price` vigente." ] }, { "cell_type": "code", "execution_count": 6, "id": "2654f253", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sales_codechannel_idn_match_prodn_same_pricepct_same
047111030759955396.6
147225705957059100.0
2478124171741713100.0
347212416514008596.2
44782416514008596.2
54742300412417280.5
64752300422416580.4
74732489401752935.8
847512209361555474.3
947412209341555374.3
1047110150131291286.0
114731234595924026.7
1247124740543859.3
1347214742743769.2
144711234621382811.1
\n", "
" ], "text/plain": [ " sales_code channel_id n_match_prod n_same_price pct_same\n", "0 471 1 103075 99553 96.6\n", "1 472 2 57059 57059 100.0\n", "2 478 12 41717 41713 100.0\n", "3 472 12 41651 40085 96.2\n", "4 478 2 41651 40085 96.2\n", "5 474 2 30041 24172 80.5\n", "6 475 2 30042 24165 80.4\n", "7 473 2 48940 17529 35.8\n", "8 475 12 20936 15554 74.3\n", "9 474 12 20934 15553 74.3\n", "10 471 10 15013 12912 86.0\n", "11 473 12 34595 9240 26.7\n", "12 471 2 47405 4385 9.3\n", "13 472 1 47427 4376 9.2\n", "14 471 12 34621 3828 11.1" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q(f\"\"\"\n", "WITH nav AS (\n", " SELECT p.id AS product_id, sp.sales_code,\n", " CAST(sp.unit_price AS FLOAT64) AS nav_price\n", " FROM {NAV} sp\n", " JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n", " WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1\n", " AND sp.starting_date <= CURRENT_DATETIME()\n", " AND (sp.ending_date IS NULL\n", " OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n", " OR sp.ending_date >= CURRENT_DATETIME())\n", " AND sp.variant_code = ''\n", "),\n", "tpv AS (\n", " SELECT product_id, channel_id, unit_price AS tpv_price\n", " FROM {TPV}\n", " WHERE active = TRUE\n", " AND starting_date <= CURRENT_DATE()\n", " AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n", ")\n", "SELECT nav.sales_code, tpv.channel_id,\n", " COUNT(*) AS n_match_prod,\n", " SUM(CASE WHEN ABS(nav.nav_price - tpv.tpv_price) < 0.01 THEN 1 ELSE 0 END) AS n_same_price,\n", " ROUND(SUM(CASE WHEN ABS(nav.nav_price - tpv.tpv_price) < 0.01 THEN 1 ELSE 0 END) / COUNT(*) * 100, 1) AS pct_same\n", "FROM nav JOIN tpv USING(product_id)\n", "GROUP BY nav.sales_code, tpv.channel_id\n", "HAVING n_match_prod > 100\n", "ORDER BY n_same_price DESC\n", "LIMIT 15\n", "\"\"\")" ] }, { "cell_type": "markdown", "id": "c6d538cf", "metadata": {}, "source": [ "### Mapeos deducidos (alta coincidencia)\n", "\n", "| NAV sales_code | TPV channel_id | Canal TPV | % coincidencia |\n", "|---:|---:|---|---:|\n", "| 471 | 1 | aurgi.com | ~96% |\n", "| 472 | 2 | motortown.es | 100% |\n", "| 478 | 12 | canarias | ~100% |\n", "\n", "Los demás `sales_code` (473, 474, 475, 476, 477) **no tienen canal TPV equivalente directo**. Son probables price-groups internos (B2B, flotas) que no se exponen en canales TPV." ] }, { "cell_type": "markdown", "id": "3cceab83", "metadata": {}, "source": [ "## 5 · Comparación detallada de los 3 mapeos fiables\n", "\n", "Para cada mapeo calculamos: filas vigentes en cada lado, intersección (`both`), diferencias simétricas (`only_nav`, `only_tpv`), coincidencia exacta de precio y estadísticos de la diferencia absoluta." ] }, { "cell_type": "code", "execution_count": 7, "id": "c83090f0", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
mappingnav_rowstpv_rowsbothonly_navonly_tpvsame_pricemean_diffp50p95max_diff
0471 → ch11216311466541030751855943581995530.30.00.01476.27
1472 → ch2626445707057059558511570590.00.00.00.00
2478 → ch1246435417224171747229417130.00.00.027.51
\n", "
" ], "text/plain": [ " mapping nav_rows tpv_rows both only_nav only_tpv same_price \\\n", "0 471 → ch1 121631 146654 103075 18559 43581 99553 \n", "1 472 → ch2 62644 57070 57059 5585 11 57059 \n", "2 478 → ch12 46435 41722 41717 4722 9 41713 \n", "\n", " mean_diff p50 p95 max_diff \n", "0 0.3 0.0 0.0 1476.27 \n", "1 0.0 0.0 0.0 0.00 \n", "2 0.0 0.0 0.0 27.51 " ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def compare(sales_code: str, channel_id: int) -> pd.DataFrame:\n", " sql = f\"\"\"\n", " WITH nav AS (\n", " SELECT p.id AS product_id, CAST(sp.unit_price AS FLOAT64) AS nav_price\n", " FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n", " WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '{sales_code}'\n", " AND sp.starting_date <= CURRENT_DATETIME()\n", " AND (sp.ending_date IS NULL\n", " OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n", " OR sp.ending_date >= CURRENT_DATETIME())\n", " AND sp.variant_code = ''\n", " ),\n", " tpv AS (\n", " SELECT product_id, unit_price AS tpv_price FROM {TPV}\n", " WHERE channel_id = {channel_id} AND active = TRUE\n", " AND starting_date <= CURRENT_DATE()\n", " AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n", " )\n", " SELECT\n", " '{sales_code} \\u2192 ch{channel_id}' AS mapping,\n", " (SELECT COUNT(*) FROM nav) AS nav_rows,\n", " (SELECT COUNT(*) FROM tpv) AS tpv_rows,\n", " (SELECT COUNT(*) FROM nav JOIN tpv USING(product_id)) AS both,\n", " (SELECT COUNT(*) FROM nav LEFT JOIN tpv USING(product_id) WHERE tpv.product_id IS NULL) AS only_nav,\n", " (SELECT COUNT(*) FROM tpv LEFT JOIN nav USING(product_id) WHERE nav.product_id IS NULL) AS only_tpv,\n", " (SELECT COUNT(*) FROM nav JOIN tpv USING(product_id) WHERE ABS(nav_price - tpv_price) < 0.01) AS same_price,\n", " (SELECT ROUND(AVG(ABS(nav_price - tpv_price)), 2) FROM nav JOIN tpv USING(product_id)) AS mean_diff,\n", " (SELECT ROUND(APPROX_QUANTILES(ABS(nav_price - tpv_price), 100)[OFFSET(50)], 2) FROM nav JOIN tpv USING(product_id)) AS p50,\n", " (SELECT ROUND(APPROX_QUANTILES(ABS(nav_price - tpv_price), 100)[OFFSET(95)], 2) FROM nav JOIN tpv USING(product_id)) AS p95,\n", " (SELECT ROUND(MAX(ABS(nav_price - tpv_price)), 2) FROM nav JOIN tpv USING(product_id)) AS max_diff\n", " \"\"\"\n", " return q(sql)\n", "\n", "comparison = pd.concat([\n", " compare(\"471\", 1), # aurgi.com\n", " compare(\"472\", 2), # motortown.es\n", " compare(\"478\", 12), # canarias\n", "], ignore_index=True)\n", "comparison" ] }, { "cell_type": "markdown", "id": "06c2882f", "metadata": {}, "source": [ "## 6 · Outliers de 471 → aurgi.com\n", "\n", "En 471→1 el ~3.4% de productos no coincide. Inspeccionamos los mayores outliers para ver si es ruido puntual o algo sistemático." ] }, { "cell_type": "code", "execution_count": 8, "id": "37454278", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nav_idnormalized_refnav_pricetpv_pricediff_eurnav_updatedtpv_updated
0MKT539258NaN79.171555.441476.272023-12-202026-01-27 18:24:16.236286+00:00
1MKT263951NaN612.07945.47333.402025-10-152026-04-11 03:42:31.985658+00:00
2MKT528522558NaN801.86502.90-298.962026-03-082026-04-15 09:03:06.036752+00:00
3MKT1285247190NaN39.49280.10240.612025-10-072026-03-24 12:16:47.114948+00:00
4MKT248783NaN699.99923.83223.842023-12-202026-04-14 07:23:17.228074+00:00
5MKT235345NaN728.97517.64-211.332023-12-202026-04-10 10:24:57.693899+00:00
6MKT1102764312NaN374.71189.00-185.712025-05-042026-04-14 14:05:20.943072+00:00
7MKT312845NaN996.37826.18-170.192023-12-202026-04-09 12:43:19.657585+00:00
8MKT1009692625NaN30.48192.72162.242025-10-092026-04-14 13:24:10.117913+00:00
9MKT1102764641NaN473.90629.00155.102025-11-072026-04-10 09:22:42.013122+00:00
10MKT235222NaN1155.851013.29-142.562023-12-202026-04-01 17:44:05.939763+00:00
11MKT119817NaN97.80227.93130.132023-12-202026-04-13 18:27:22.018325+00:00
12MKT1102764639NaN449.00579.00130.002025-11-282026-04-15 09:03:04.974830+00:00
13MKT119351NaN0.00125.90125.902023-12-202026-04-13 06:22:44.921118+00:00
14MKT1102764780NaN453.22327.90-125.322025-07-082026-04-14 11:24:44.704392+00:00
\n", "
" ], "text/plain": [ " nav_id normalized_ref nav_price tpv_price diff_eur nav_updated \\\n", "0 MKT539258 NaN 79.17 1555.44 1476.27 2023-12-20 \n", "1 MKT263951 NaN 612.07 945.47 333.40 2025-10-15 \n", "2 MKT528522558 NaN 801.86 502.90 -298.96 2026-03-08 \n", "3 MKT1285247190 NaN 39.49 280.10 240.61 2025-10-07 \n", "4 MKT248783 NaN 699.99 923.83 223.84 2023-12-20 \n", "5 MKT235345 NaN 728.97 517.64 -211.33 2023-12-20 \n", "6 MKT1102764312 NaN 374.71 189.00 -185.71 2025-05-04 \n", "7 MKT312845 NaN 996.37 826.18 -170.19 2023-12-20 \n", "8 MKT1009692625 NaN 30.48 192.72 162.24 2025-10-09 \n", "9 MKT1102764641 NaN 473.90 629.00 155.10 2025-11-07 \n", "10 MKT235222 NaN 1155.85 1013.29 -142.56 2023-12-20 \n", "11 MKT119817 NaN 97.80 227.93 130.13 2023-12-20 \n", "12 MKT1102764639 NaN 449.00 579.00 130.00 2025-11-28 \n", "13 MKT119351 NaN 0.00 125.90 125.90 2023-12-20 \n", "14 MKT1102764780 NaN 453.22 327.90 -125.32 2025-07-08 \n", "\n", " tpv_updated \n", "0 2026-01-27 18:24:16.236286+00:00 \n", "1 2026-04-11 03:42:31.985658+00:00 \n", "2 2026-04-15 09:03:06.036752+00:00 \n", "3 2026-03-24 12:16:47.114948+00:00 \n", "4 2026-04-14 07:23:17.228074+00:00 \n", "5 2026-04-10 10:24:57.693899+00:00 \n", "6 2026-04-14 14:05:20.943072+00:00 \n", "7 2026-04-09 12:43:19.657585+00:00 \n", "8 2026-04-14 13:24:10.117913+00:00 \n", "9 2026-04-10 09:22:42.013122+00:00 \n", "10 2026-04-01 17:44:05.939763+00:00 \n", "11 2026-04-13 18:27:22.018325+00:00 \n", "12 2026-04-15 09:03:04.974830+00:00 \n", "13 2026-04-13 06:22:44.921118+00:00 \n", "14 2026-04-14 11:24:44.704392+00:00 " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "q(f\"\"\"\n", "WITH nav AS (\n", " SELECT p.id AS product_id, p.nav_id, p.normalized_ref,\n", " CAST(sp.unit_price AS FLOAT64) AS nav_price,\n", " sp.fecha_ultima_modificacion AS nav_updated\n", " FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n", " WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '471'\n", " AND sp.starting_date <= CURRENT_DATETIME()\n", " AND (sp.ending_date IS NULL OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n", " OR sp.ending_date >= CURRENT_DATETIME())\n", " AND sp.variant_code = ''\n", "),\n", "tpv AS (\n", " SELECT product_id, unit_price AS tpv_price, updated_at AS tpv_updated\n", " FROM {TPV}\n", " WHERE channel_id = 1 AND active = TRUE\n", " AND starting_date <= CURRENT_DATE()\n", " AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n", ")\n", "SELECT nav.nav_id, nav.normalized_ref,\n", " ROUND(nav.nav_price, 2) AS nav_price,\n", " ROUND(tpv.tpv_price, 2) AS tpv_price,\n", " ROUND(tpv.tpv_price - nav.nav_price, 2) AS diff_eur,\n", " nav.nav_updated, tpv.tpv_updated\n", "FROM nav JOIN tpv USING(product_id)\n", "WHERE ABS(nav.nav_price - tpv.tpv_price) >= 0.01\n", "ORDER BY ABS(nav.nav_price - tpv.tpv_price) DESC\n", "LIMIT 15\n", "\"\"\")" ] }, { "cell_type": "code", "execution_count": 9, "id": "e9bc35d5", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/egutierrez/fn_registry/projects/aurgi/analysis/sale_prices_comprobation/.venv/lib/python3.13/site-packages/google/cloud/bigquery/table.py:2086: UserWarning: BigQuery Storage module not found, fetch data with the REST endpoint instead.\n", " warnings.warn(\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
lt_10centc10_to_1eurc1_to_10c10_to_100ge_100total_diff
01193352273773223522
\n", "
" ], "text/plain": [ " lt_10cent c10_to_1eur c1_to_10 c10_to_100 ge_100 total_diff\n", "0 119 335 2273 773 22 3522" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Distribucion de la magnitud de los diffs en 471 -> aurgi.com\n", "q(f\"\"\"\n", "WITH nav AS (\n", " SELECT p.id AS product_id, CAST(sp.unit_price AS FLOAT64) AS nav_price\n", " FROM {NAV} sp JOIN {PRODUCTS} p ON p.nav_id = sp.item_no_\n", " WHERE sp._fivetran_deleted IS NOT TRUE AND sp.sales_type = 1 AND sp.sales_code = '471'\n", " AND sp.starting_date <= CURRENT_DATETIME()\n", " AND (sp.ending_date IS NULL OR sp.ending_date = DATETIME '0001-01-01 00:00:00'\n", " OR sp.ending_date >= CURRENT_DATETIME())\n", " AND sp.variant_code = ''\n", "),\n", "tpv AS (\n", " SELECT product_id, unit_price AS tpv_price FROM {TPV}\n", " WHERE channel_id = 1 AND active = TRUE\n", " AND starting_date <= CURRENT_DATE()\n", " AND (ending_date IS NULL OR ending_date >= CURRENT_DATE())\n", "),\n", "diffs AS (\n", " SELECT ABS(nav.nav_price - tpv.tpv_price) AS d\n", " FROM nav JOIN tpv USING(product_id)\n", " WHERE ABS(nav.nav_price - tpv.tpv_price) >= 0.01\n", ")\n", "SELECT\n", " COUNTIF(d < 0.10) AS lt_10cent,\n", " COUNTIF(d >= 0.10 AND d < 1.00) AS c10_to_1eur,\n", " COUNTIF(d >= 1.00 AND d < 10.00) AS c1_to_10,\n", " COUNTIF(d >= 10.00 AND d < 100.00) AS c10_to_100,\n", " COUNTIF(d >= 100.00) AS ge_100,\n", " COUNT(*) AS total_diff\n", "FROM diffs\n", "\"\"\")" ] }, { "cell_type": "markdown", "id": "ee00a226", "metadata": {}, "source": [ "## 7 · Conclusión\n", "\n", "### ¿Son las mismas tablas?\n", "\n", "**No exactamente, pero coinciden casi perfectamente donde se puede cruzar.**\n", "\n", "1. **Granularidad distinta.** NAV operacional (1.6M, un registro por precio vigente); TPV histórica (85M totales, ~1M activas). Comparación sólo tiene sentido filtrando a lo vigente en ambas.\n", "\n", "2. **Mapeo no documentado.** NAV usa `sales_type=1` + `sales_code ∈ {471..478}`; TPV usa `channel_id`. Deducidos empíricamente:\n", " - **471 → aurgi.com (1)** — ~96% coincidencia\n", " - **472 → motortown.es (2)** — 100% coincidencia\n", " - **478 → canarias (12)** — ~100% coincidencia\n", " - 473–477 → sin canal TPV (probable B2B/flotas/internos)\n", "\n", "3. **Cobertura:**\n", " - 472 y 478 son réplicas directas (NAV → TPV al céntimo, ~11 extras en TPV y ~5k ausentes en TPV).\n", " - 471 ↔ aurgi.com tiene desalineación: 43,581 productos sólo en TPV (catálogo e-commerce propio), 18,559 sólo en NAV, y ~3,500 con precio distinto.\n", "\n", "4. **Riesgo financiero:** canales centro (motortown/canarias) = **cero riesgo**. aurgi.com necesita proceso de reconciliación si NAV se considera verdad contable.\n", "\n", "### Siguientes pasos sugeridos\n", "\n", "- Pedir a negocio qué representan `sales_code` 473–477 y si deben tener canal TPV.\n", "- Auditar outliers > 10 € en 471→1 (celdas anteriores).\n", "- Decidir fuente de verdad por canal en el proyecto migración NAV↔TPV." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 5 }