{ "cells": [ { "cell_type": "markdown", "id": "a0b1c2d3", "metadata": {}, "source": [ "# 03 - Combined EDA: Happy Robot + Call Transactions\n", "\n", "Cross-analysis of the calls handled by the AI agent (Happy Robot) and the call center transactions.\n", "\n", "**Tables:**\n", "- `happy_robot_publicpublic.run_events` (72 rows) - AI agent events for auto glass (lunas) operations\n", "- `psql_dcpublic.call_transactions` (24.7M rows) - call center transactions\n", "\n", "**Join bridges:**\n", "1. Phone: `source` <-> `telefonos` / `orig` (normalizing +34, spaces, asterisks)\n", "2. License plate: `license_plate` <-> `cliente`\n", "3. Time window: `time` <-> `date_time` (same calendar day in the joins; minute-level proximity is inspected in section 8)\n", "4. Campaign filter: Mutua campaigns in call_transactions" ] }, { "cell_type": "markdown", "id": "b1c2d3e4", "metadata": {}, "source": [ "## 1. Setup" ] }, { "cell_type": "code", "execution_count": null, "id": "c2d3e4f5", "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas as pd\n", "import plotly.express as px\n", "import plotly.graph_objects as go\n", "from dotenv import load_dotenv\n", "load_dotenv()\n", "\n", "import sys\n", "sys.path.insert(0, os.path.join(os.environ[\"FN_REGISTRY_ROOT\"], \"python\", \"functions\"))\n", "from metabase.client import MetabaseClient\n", "\n", "client = MetabaseClient(os.environ[\"METABASE_URL\"], os.environ[\"METABASE_API_KEY\"])\n", "\n", "def query_to_df(sql: str) -> pd.DataFrame:\n", "    \"\"\"Execute SQL against BigQuery via Metabase API, return DataFrame.\"\"\"\n", "    result = client.request('POST', '/api/dataset', json={\n", "        'database': 6, 'type': 'native', 'native': {'query': sql}\n", "    })\n", "    cols = [c['name'] for c in result['data']['cols']]\n", "    return pd.DataFrame(result['data']['rows'], columns=cols)" ] }, { "cell_type": "markdown", "id": "d3e4f5a6", "metadata": {}, "source": [ "## 2. 
Loading run_events" ] }, { "cell_type": "code", "execution_count": null, "id": "e4f5a6b7", "metadata": {}, "outputs": [], "source": [ "df_re = query_to_df(\"\"\"\n", "SELECT *\n", "FROM `happy_robot_publicpublic.run_events`\n", "ORDER BY time\n", "\"\"\")\n", "\n", "print(f\"run_events: {len(df_re)} rows, {len(df_re.columns)} columns\")\n", "df_re.dtypes" ] }, { "cell_type": "code", "execution_count": null, "id": "f5a6b7c8", "metadata": {}, "outputs": [], "source": [ "# Parse datetimes and numeric columns\n", "df_re['time'] = pd.to_datetime(df_re['time'])\n", "if 'duration' in df_re.columns:\n", "    df_re['duration'] = pd.to_numeric(df_re['duration'], errors='coerce')\n", "\n", "print(f\"Time range: {df_re['time'].min()} -> {df_re['time'].max()}\")\n", "print(f\"Classifications: {df_re['classification'].value_counts().to_dict()}\")\n", "df_re.head(10)" ] }, { "cell_type": "code", "execution_count": null, "id": "a6b7c8d9", "metadata": {}, "outputs": [], "source": [ "# Summary statistics for run_events\n", "print(\"Available columns:\", list(df_re.columns))\n", "print(f\"\\nUnique sources (phones): {df_re['source'].nunique()}\")\n", "print(f\"Unique license plates: {df_re['license_plate'].nunique()}\")\n", "print(f\"Nulls in source: {df_re['source'].isna().sum()}\")\n", "print(f\"Nulls in license_plate: {df_re['license_plate'].isna().sum()}\")\n", "df_re.describe(include='all')" ] }, { "cell_type": "markdown", "id": "b7c8d9e0", "metadata": {}, "source": [ "## 3. 
Identify Mutua campaigns in call_transactions" ] }, { "cell_type": "code", "execution_count": null, "id": "c8d9e0f1", "metadata": {}, "outputs": [], "source": [ "df_mutua_campaigns = query_to_df(\"\"\"\n", "SELECT\n", "    campaign_name,\n", "    COUNT(*) as total_calls,\n", "    MIN(date_time) as first_call,\n", "    MAX(date_time) as last_call\n", "FROM `psql_dcpublic.call_transactions`\n", "WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "GROUP BY campaign_name\n", "ORDER BY total_calls DESC\n", "\"\"\")\n", "\n", "print(\"Mutua campaigns found:\")\n", "df_mutua_campaigns" ] }, { "cell_type": "code", "execution_count": null, "id": "d9e0f1a2", "metadata": {}, "outputs": [], "source": [ "# Recent volume of Mutua campaigns (since April 2026, the Happy Robot period)\n", "df_mutua_recent = query_to_df(\"\"\"\n", "SELECT\n", "    campaign_name,\n", "    DATE(date_time) as dia,\n", "    COUNT(*) as calls\n", "FROM `psql_dcpublic.call_transactions`\n", "WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "  AND date_time >= '2026-04-01'\n", "GROUP BY campaign_name, DATE(date_time)\n", "ORDER BY dia\n", "\"\"\")\n", "\n", "df_mutua_recent['calls'] = pd.to_numeric(df_mutua_recent['calls'])\n", "\n", "fig = px.bar(df_mutua_recent, x='dia', y='calls', color='campaign_name',\n", "             title='Mutua calls per day (since April 2026)',\n", "             labels={'calls': 'Calls', 'dia': 'Date', 'campaign_name': 'Campaign'})\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "e0f1a2b3", "metadata": {}, "source": [ "## 4. 
Join by phone\n", "\n", "Normalization:\n", "- `run_events.source`: format `+34623122972` -> extract `623122972`\n", "- `call_transactions.telefonos`: format ` 623122972` (leading space) -> strip\n", "- `call_transactions.orig`: format ` *34623122972` -> strip `*34`\n", "\n", "Join: normalized phone + same day" ] }, { "cell_type": "code", "execution_count": null, "id": "f1a2b3c4", "metadata": {}, "outputs": [], "source": [ "df_phone_join = query_to_df(\"\"\"\n", "WITH re AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(source, '+34', ''), ' ', '') AS phone_clean\n", "    FROM `happy_robot_publicpublic.run_events`\n", "),\n", "ct AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(REPLACE(TRIM(telefonos), '*34', ''), '+34', ''), ' ', '') AS phone_clean\n", "    FROM `psql_dcpublic.call_transactions`\n", "    WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "      AND date_time >= '2026-04-01'\n", ")\n", "SELECT\n", "    re.id AS re_id,\n", "    re.run_id,\n", "    re.time AS re_time,\n", "    re.classification,\n", "    ct.id AS ct_id,\n", "    ct.date_time AS ct_time,\n", "    ct.campaign_name,\n", "    ct.description AS ct_resolution,\n", "    ct.agente,\n", "    ct.t_convers\n", "FROM re\n", "JOIN ct ON re.phone_clean = ct.phone_clean\n", "  AND DATE(re.time) = DATE(ct.date_time)\n", "ORDER BY re.time\n", "\"\"\")\n", "\n", "print(f\"Phone + same-day matches: {len(df_phone_join)}\")\n", "print(f\"Distinct run_events with a match: {df_phone_join['re_id'].nunique()}\")\n", "df_phone_join.head(20)" ] }, { "cell_type": "code", "execution_count": null, "id": "a2b3c4d5", "metadata": {}, "outputs": [], "source": [ "# Also try the join via the orig field (format *34...)\n", "df_phone_join_orig = query_to_df(\"\"\"\n", "WITH re AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(source, '+34', ''), ' ', '') AS phone_clean\n", "    FROM `happy_robot_publicpublic.run_events`\n", "),\n", "ct AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(REPLACE(TRIM(orig), '*34', ''), '+34', ''), ' ', '') AS phone_clean\n", "    FROM 
`psql_dcpublic.call_transactions`\n", "    WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "      AND date_time >= '2026-04-01'\n", ")\n", "SELECT\n", "    re.id AS re_id,\n", "    re.run_id,\n", "    re.time AS re_time,\n", "    re.classification,\n", "    ct.id AS ct_id,\n", "    ct.date_time AS ct_time,\n", "    ct.campaign_name,\n", "    ct.description AS ct_resolution,\n", "    ct.agente,\n", "    ct.t_convers\n", "FROM re\n", "JOIN ct ON re.phone_clean = ct.phone_clean\n", "  AND DATE(re.time) = DATE(ct.date_time)\n", "ORDER BY re.time\n", "\"\"\")\n", "\n", "print(f\"Matches via orig + same day: {len(df_phone_join_orig)}\")\n", "print(f\"Distinct run_events with a match (orig): {df_phone_join_orig['re_id'].nunique()}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "b3c4d5e6", "metadata": {}, "outputs": [], "source": [ "# Combine both phone joins (telefonos + orig), dedup by (re_id, ct_id) pair\n", "df_phone_all = pd.concat([df_phone_join, df_phone_join_orig]).drop_duplicates(subset=['re_id', 'ct_id'])\n", "print(f\"Combined phone matches (dedup): {len(df_phone_all)}\")\n", "print(f\"Distinct run_events with a phone match: {df_phone_all['re_id'].nunique()} of {len(df_re)} total\")" ] }, { "cell_type": "markdown", "id": "c4d5e6f7", "metadata": {}, "source": [ "## 5. 
Join by license plate\n", "\n", "- `run_events.license_plate` <-> `call_transactions.cliente`\n", "- Both are license plates (e.g. `0000MHG`, `9928MKH`)\n", "- Direct join + same day + Mutua campaigns" ] }, { "cell_type": "code", "execution_count": null, "id": "d5e6f7a8", "metadata": {}, "outputs": [], "source": [ "df_plate_join = query_to_df(\"\"\"\n", "WITH re AS (\n", "    SELECT *\n", "    FROM `happy_robot_publicpublic.run_events`\n", "    WHERE license_plate IS NOT NULL AND TRIM(license_plate) != ''\n", "),\n", "ct AS (\n", "    SELECT *\n", "    FROM `psql_dcpublic.call_transactions`\n", "    WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "      AND date_time >= '2026-04-01'\n", "      AND cliente IS NOT NULL AND TRIM(cliente) != ''\n", ")\n", "SELECT\n", "    re.id AS re_id,\n", "    re.run_id,\n", "    re.time AS re_time,\n", "    re.license_plate,\n", "    re.classification,\n", "    ct.id AS ct_id,\n", "    ct.date_time AS ct_time,\n", "    ct.campaign_name,\n", "    ct.cliente,\n", "    ct.description AS ct_resolution,\n", "    ct.agente,\n", "    ct.t_convers\n", "FROM re\n", "JOIN ct ON UPPER(TRIM(re.license_plate)) = UPPER(TRIM(ct.cliente))\n", "  AND DATE(re.time) = DATE(ct.date_time)\n", "ORDER BY re.time\n", "\"\"\")\n", "\n", "print(f\"License plate + same-day matches: {len(df_plate_join)}\")\n", "print(f\"Distinct run_events with a license plate match: {df_plate_join['re_id'].nunique()}\")\n", "df_plate_join.head(20)" ] }, { "cell_type": "code", "execution_count": null, "id": "e6f7a8b9", "metadata": {}, "outputs": [], "source": [ "# Plates without a match: which license_plate values do not appear in cliente?\n", "df_no_match_plates = query_to_df(\"\"\"\n", "WITH re AS (\n", "    SELECT DISTINCT UPPER(TRIM(license_plate)) AS plate\n", "    FROM `happy_robot_publicpublic.run_events`\n", "    WHERE license_plate IS NOT NULL AND TRIM(license_plate) != ''\n", "),\n", "ct AS (\n", "    SELECT DISTINCT UPPER(TRIM(cliente)) AS plate\n", "    FROM `psql_dcpublic.call_transactions`\n", "    WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "      AND 
date_time >= '2026-04-01'\n", "      AND cliente IS NOT NULL AND TRIM(cliente) != ''\n", ")\n", "SELECT re.plate,\n", "    CASE WHEN ct.plate IS NOT NULL THEN 'SI' ELSE 'NO' END AS existe_en_ct\n", "FROM re\n", "LEFT JOIN ct ON re.plate = ct.plate\n", "ORDER BY existe_en_ct, re.plate\n", "\"\"\")\n", "\n", "print(\"License plates from run_events and whether they exist in call_transactions (Mutua, April onward):\")\n", "df_no_match_plates" ] }, { "cell_type": "markdown", "id": "f7a8b9c0", "metadata": {}, "source": [ "## 6. Coverage analysis\n", "\n", "- What % of run_events have a match in call_transactions?\n", "- What % of recent Mutua call_transactions have a match in run_events?" ] }, { "cell_type": "code", "execution_count": null, "id": "a8b9c0d1", "metadata": {}, "outputs": [], "source": [ "# Coverage combining both joins (phone + license plate)\n", "re_ids_phone = set(df_phone_all['re_id'].unique())\n", "re_ids_plate = set(df_plate_join['re_id'].unique())\n", "re_ids_any = re_ids_phone | re_ids_plate\n", "re_ids_both = re_ids_phone & re_ids_plate\n", "\n", "total_re = len(df_re)\n", "\n", "print(\"=== run_events coverage ===\")\n", "print(f\"Total run_events: {total_re}\")\n", "print(f\"Phone match: {len(re_ids_phone)} ({100*len(re_ids_phone)/total_re:.1f}%)\")\n", "print(f\"License plate match: {len(re_ids_plate)} ({100*len(re_ids_plate)/total_re:.1f}%)\")\n", "print(f\"Match by either: {len(re_ids_any)} ({100*len(re_ids_any)/total_re:.1f}%)\")\n", "print(f\"Match by both: {len(re_ids_both)} ({100*len(re_ids_both)/total_re:.1f}%)\")\n", "print(f\"No match: {total_re - len(re_ids_any)} ({100*(total_re - len(re_ids_any))/total_re:.1f}%)\")" ] }, { "cell_type": "code", "execution_count": null, "id": "b9c0d1e2", "metadata": {}, "outputs": [], "source": [ "# Reverse coverage: what % of recent Mutua call_transactions have a match\n", "df_ct_coverage = query_to_df(\"\"\"\n", "WITH re_phones AS (\n", "    SELECT DISTINCT REPLACE(REPLACE(source, '+34', ''), ' ', '') AS 
phone_clean\n", "    FROM `happy_robot_publicpublic.run_events`\n", "    WHERE source IS NOT NULL\n", "),\n", "re_plates AS (\n", "    SELECT DISTINCT UPPER(TRIM(license_plate)) AS plate\n", "    FROM `happy_robot_publicpublic.run_events`\n", "    WHERE license_plate IS NOT NULL AND TRIM(license_plate) != ''\n", "),\n", "ct AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(REPLACE(TRIM(telefonos), '*34', ''), '+34', ''), ' ', '') AS phone_clean,\n", "        UPPER(TRIM(cliente)) AS plate_clean\n", "    FROM `psql_dcpublic.call_transactions`\n", "    WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "      AND date_time >= '2026-04-01'\n", ")\n", "SELECT\n", "    COUNT(*) AS total_ct_mutua,\n", "    COUNTIF(rp.phone_clean IS NOT NULL OR rpl.plate IS NOT NULL) AS matched,\n", "    COUNTIF(rp.phone_clean IS NULL AND rpl.plate IS NULL) AS unmatched\n", "FROM ct\n", "LEFT JOIN re_phones rp ON ct.phone_clean = rp.phone_clean\n", "LEFT JOIN re_plates rpl ON ct.plate_clean = rpl.plate\n", "\"\"\")\n", "\n", "print(\"=== Mutua call_transactions coverage (April onward) ===\")\n", "df_ct_coverage" ] }, { "cell_type": "code", "execution_count": null, "id": "c0d1e2f3", "metadata": {}, "outputs": [], "source": [ "# Coverage visualization\n", "coverage_data = pd.DataFrame({\n", "    'Metodo': ['Phone only', 'License plate only', 'Both', 'No match'],\n", "    'Count': [\n", "        len(re_ids_phone - re_ids_plate),\n", "        len(re_ids_plate - re_ids_phone),\n", "        len(re_ids_both),\n", "        total_re - len(re_ids_any)\n", "    ]\n", "})\n", "\n", "fig = px.pie(coverage_data, values='Count', names='Metodo',\n", "             title=f'run_events coverage ({total_re} total)',\n", "             color_discrete_sequence=px.colors.qualitative.Set2)\n", "fig.show()" ] }, { "cell_type": "markdown", "id": "d1e2f3a4", "metadata": {}, "source": [ "## 7. Comparison of outcomes\n", "\n", "For matched pairs: compare the Happy Robot classification against the call center resolution.\n", "Are AI-handled calls resolved differently?" 
] }, { "cell_type": "code", "execution_count": null, "id": "e2f3a4b5", "metadata": {}, "outputs": [], "source": [ "# Fuller join using both bridges for the outcome analysis\n", "df_matched = query_to_df(\"\"\"\n", "WITH re AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(source, '+34', ''), ' ', '') AS phone_clean\n", "    FROM `happy_robot_publicpublic.run_events`\n", "),\n", "ct AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(REPLACE(TRIM(telefonos), '*34', ''), '+34', ''), ' ', '') AS phone_clean,\n", "        UPPER(TRIM(cliente)) AS plate_clean\n", "    FROM `psql_dcpublic.call_transactions`\n", "    WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "      AND date_time >= '2026-04-01'\n", ")\n", "SELECT DISTINCT\n", "    re.id AS re_id,\n", "    re.run_id,\n", "    re.time AS re_time,\n", "    re.classification AS hr_classification,\n", "    re.source AS hr_phone,\n", "    re.license_plate AS hr_plate,\n", "    ct.id AS ct_id,\n", "    ct.date_time AS ct_time,\n", "    ct.campaign_name,\n", "    ct.description AS ct_resolution,\n", "    ct.agente AS ct_agente,\n", "    ct.t_convers AS ct_duration,\n", "    CASE\n", "        WHEN re.phone_clean = ct.phone_clean THEN 'telefono'\n", "        WHEN UPPER(TRIM(re.license_plate)) = ct.plate_clean THEN 'matricula'\n", "    END AS join_method\n", "FROM re\n", "JOIN ct ON (\n", "        (re.phone_clean = ct.phone_clean)\n", "        OR (UPPER(TRIM(re.license_plate)) = ct.plate_clean AND re.license_plate IS NOT NULL AND TRIM(re.license_plate) != '')\n", "    )\n", "    AND DATE(re.time) = DATE(ct.date_time)\n", "ORDER BY re.time\n", "\"\"\")\n", "\n", "print(f\"Matched pairs (dedup): {len(df_matched)}\")\n", "print(f\"\\nHappy Robot classifications in matches:\")\n", "print(df_matched['hr_classification'].value_counts())\n", "print(f\"\\nCall center resolutions in matches:\")\n", "print(df_matched['ct_resolution'].value_counts())" ] }, { "cell_type": "code", "execution_count": null, "id": "f3a4b5c6", "metadata": {}, "outputs": [], "source": [ "# Confusion matrix: HR classification vs CT resolution\n", "if 
len(df_matched) > 0:\n", "    confusion = pd.crosstab(\n", "        df_matched['hr_classification'],\n", "        df_matched['ct_resolution'],\n", "        margins=True\n", "    )\n", "    print(\"Matrix: HR classification (rows) vs CT resolution (columns)\")\n", "    display(confusion)\n", "else:\n", "    print(\"No matched pairs to compare\")" ] }, { "cell_type": "code", "execution_count": null, "id": "a4b5c6d7", "metadata": {}, "outputs": [], "source": [ "# Heatmap of the comparison\n", "if len(df_matched) > 0:\n", "    cross = pd.crosstab(df_matched['hr_classification'], df_matched['ct_resolution'])\n", "    fig = px.imshow(cross,\n", "                    labels=dict(x='Call Center Resolution', y='Happy Robot Classification', color='Count'),\n", "                    title='Happy Robot Classification vs Call Center Resolution',\n", "                    text_auto=True,\n", "                    color_continuous_scale='Blues')\n", "    fig.update_layout(width=900, height=500)\n", "    fig.show()" ] }, { "cell_type": "markdown", "id": "b5c6d7e8", "metadata": {}, "source": [ "## 8. Timeline of a call\n", "\n", "For one matched pair, reconstruct the full lifecycle:\n", "when it entered call_transactions, when Happy Robot processed it, and what the outcome was in each system." 
] }, { "cell_type": "code", "execution_count": null, "id": "c6d7e8f9", "metadata": {}, "outputs": [], "source": [ "# Select a sample pair (the first matched row)\n", "if len(df_matched) > 0:\n", "    sample = df_matched.iloc[0]\n", "    print(f\"=== Call timeline ===\")\n", "    print(f\"run_event ID: {sample['re_id']}\")\n", "    print(f\"call_transaction ID: {sample['ct_id']}\")\n", "    print(f\"Join method: {sample['join_method']}\")\n", "    print(f\"\")\n", "    print(f\"--- Happy Robot ---\")\n", "    print(f\"  Time: {sample['re_time']}\")\n", "    print(f\"  Phone: {sample['hr_phone']}\")\n", "    print(f\"  License plate: {sample['hr_plate']}\")\n", "    print(f\"  Classification: {sample['hr_classification']}\")\n", "    print(f\"  Run ID: {sample['run_id']}\")\n", "    print(f\"\")\n", "    print(f\"--- Call Center ---\")\n", "    print(f\"  Time: {sample['ct_time']}\")\n", "    print(f\"  Campaign: {sample['campaign_name']}\")\n", "    print(f\"  Resolution: {sample['ct_resolution']}\")\n", "    print(f\"  Agent: {sample['ct_agente']}\")\n", "    print(f\"  Conversation duration: {sample['ct_duration']}s\")\n", "else:\n", "    print(\"No matched pairs\")" ] }, { "cell_type": "code", "execution_count": null, "id": "d7e8f9a0", "metadata": {}, "outputs": [], "source": [ "# Full detail of the selected run_event from the database\n", "if len(df_matched) > 0:\n", "    sample_re_id = df_matched.iloc[0]['re_id']\n", "    df_re_detail = query_to_df(f\"\"\"\n", "        SELECT *\n", "        FROM `happy_robot_publicpublic.run_events`\n", "        WHERE id = '{sample_re_id}'\n", "    \"\"\")\n", "    print(\"Full run_event detail:\")\n", "    for col in df_re_detail.columns:\n", "        print(f\"  {col}: {df_re_detail.iloc[0][col]}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "e8f9a0b1", "metadata": {}, "outputs": [], "source": [ "# Visual timeline: all matches in chronological order\n", "if len(df_matched) > 0:\n", "    df_timeline = df_matched.copy()\n", "    df_timeline['re_time'] = 
pd.to_datetime(df_timeline['re_time'])\n", "    df_timeline['ct_time'] = pd.to_datetime(df_timeline['ct_time'])\n", "    df_timeline['diff_minutes'] = (df_timeline['re_time'] - df_timeline['ct_time']).dt.total_seconds() / 60\n", "\n", "    fig = px.scatter(df_timeline, x='re_time', y='diff_minutes',\n", "                     color='hr_classification',\n", "                     hover_data=['re_id', 'ct_id', 'hr_classification', 'ct_resolution'],\n", "                     title='Time difference: Happy Robot vs Call Center (minutes)',\n", "                     labels={'re_time': 'Happy Robot time', 'diff_minutes': 'Difference (min)'})\n", "    fig.add_hline(y=0, line_dash='dash', line_color='gray')\n", "    fig.show()\n", "\n", "    print(f\"\\nMean time difference: {df_timeline['diff_minutes'].mean():.1f} min\")\n", "    print(f\"Median time difference: {df_timeline['diff_minutes'].median():.1f} min\")" ] }, { "cell_type": "markdown", "id": "f9a0b1c2", "metadata": {}, "source": [ "## 9. Cross metrics\n", "\n", "For matched calls:\n", "- Duration comparison: `run_events.duration` vs `call_transactions.t_convers`\n", "- Success/resolution rate in each system" ] }, { "cell_type": "code", "execution_count": null, "id": "a0b1c2d3a", "metadata": {}, "outputs": [], "source": [ "# Compare durations where both are available\n", "df_durations = query_to_df(\"\"\"\n", "WITH re AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(source, '+34', ''), ' ', '') AS phone_clean\n", "    FROM `happy_robot_publicpublic.run_events`\n", "),\n", "ct AS (\n", "    SELECT *,\n", "        REPLACE(REPLACE(REPLACE(TRIM(telefonos), '*34', ''), '+34', ''), ' ', '') AS phone_clean\n", "    FROM `psql_dcpublic.call_transactions`\n", "    WHERE LOWER(campaign_name) LIKE '%mutua%'\n", "      AND date_time >= '2026-04-01'\n", ")\n", "SELECT\n", "    re.id AS re_id,\n", "    re.duration AS hr_duration,\n", "    ct.t_convers AS ct_duration,\n", "    re.classification AS hr_classification,\n", "    ct.description AS ct_resolution\n", "FROM re\n", "JOIN ct ON re.phone_clean = ct.phone_clean\n", "  AND DATE(re.time) 
= DATE(ct.date_time)\n", "WHERE re.duration IS NOT NULL AND ct.t_convers IS NOT NULL\n", "ORDER BY re.time\n", "\"\"\")\n", "\n", "if len(df_durations) > 0:\n", "    df_durations['hr_duration'] = pd.to_numeric(df_durations['hr_duration'], errors='coerce')\n", "    df_durations['ct_duration'] = pd.to_numeric(df_durations['ct_duration'], errors='coerce')\n", "\n", "    print(f\"Pairs with a duration in both systems: {len(df_durations)}\")\n", "    print(f\"\\nHappy Robot duration (s):\")\n", "    print(df_durations['hr_duration'].describe())\n", "    print(f\"\\nCall Center duration (s):\")\n", "    print(df_durations['ct_duration'].describe())\n", "else:\n", "    print(\"No pairs with a duration in both systems\")" ] }, { "cell_type": "code", "execution_count": null, "id": "b1c2d3e4a", "metadata": {}, "outputs": [], "source": [ "# Scatter: HR duration vs CT duration\n", "if len(df_durations) > 0 and df_durations['hr_duration'].notna().sum() > 0:\n", "    fig = px.scatter(df_durations, x='hr_duration', y='ct_duration',\n", "                     color='hr_classification',\n", "                     hover_data=['re_id', 'ct_resolution'],\n", "                     title='Duration: Happy Robot vs Call Center (seconds)',\n", "                     labels={'hr_duration': 'Happy Robot (s)', 'ct_duration': 'Call Center (s)'})\n", "    # Diagonal line (equal duration)\n", "    max_dur = max(df_durations['hr_duration'].max(), df_durations['ct_duration'].max())\n", "    fig.add_trace(go.Scatter(x=[0, max_dur], y=[0, max_dur],\n", "                             mode='lines', line=dict(dash='dash', color='gray'),\n", "                             name='Equal duration'))\n", "    fig.show()\n", "else:\n", "    print(\"Insufficient duration data for scatter\")" ] }, { "cell_type": "code", "execution_count": null, "id": "c2d3e4f5a", "metadata": {}, "outputs": [], "source": [ "# Outcome rates per system\n", "if len(df_matched) > 0:\n", "    print(\"=== Resolution rates in matched pairs ===\")\n", "    print(f\"\\nHappy Robot - classifications:\")\n", "    hr_counts = df_matched['hr_classification'].value_counts()\n", "    for cls, count in 
hr_counts.items():\n", "        print(f\"  {cls}: {count} ({100*count/len(df_matched):.1f}%)\")\n", "\n", "    print(f\"\\nCall Center - resolutions:\")\n", "    ct_counts = df_matched['ct_resolution'].value_counts()\n", "    for res, count in ct_counts.items():\n", "        print(f\"  {res}: {count} ({100*count/len(df_matched):.1f}%)\")\n", "else:\n", "    print(\"No matched pairs\")" ] }, { "cell_type": "code", "execution_count": null, "id": "d3e4f5a6a", "metadata": {}, "outputs": [], "source": [ "# Side-by-side distribution of classifications and resolutions\n", "if len(df_matched) > 0:\n", "    fig = go.Figure()\n", "\n", "    hr_vc = df_matched['hr_classification'].value_counts()\n", "    ct_vc = df_matched['ct_resolution'].value_counts().head(10)\n", "\n", "    fig.add_trace(go.Bar(name='Happy Robot', x=hr_vc.index, y=hr_vc.values))\n", "\n", "    fig.update_layout(\n", "        title='Distribution of classifications - Happy Robot (matched pairs)',\n", "        xaxis_title='Classification', yaxis_title='Count',\n", "        barmode='group'\n", "    )\n", "    fig.show()\n", "\n", "    fig2 = go.Figure()\n", "    fig2.add_trace(go.Bar(name='Call Center', x=ct_vc.index, y=ct_vc.values,\n", "                          marker_color='coral'))\n", "    fig2.update_layout(\n", "        title='Distribution of resolutions - Call Center (matched pairs, top 10)',\n", "        xaxis_title='Resolution', yaxis_title='Count'\n", "    )\n", "    fig2.show()" ] }, { "cell_type": "markdown", "id": "e4f5a6b7a", "metadata": {}, "source": [ "## 10. Summary and findings" ] }, { "cell_type": "code", "execution_count": null, "id": "f5a6b7c8a", "metadata": {}, "outputs": [], "source": [ "# Final quantitative summary\n", "print(\"=\"*60)\n", "print(\"SUMMARY: Combined EDA Happy Robot + Call Transactions\")\n", "print(\"=\"*60)\n", "print(f\"\\n1. VOLUMES\")\n", "print(f\"   run_events (Happy Robot): {total_re} rows\")\n", "print(f\"   call_transactions Mutua (April onward): see section 3\")\n", "print(f\"\")\n", "print(f\"2. 
COVERAGE\")\n", "print(f\"   Phone match: {len(re_ids_phone)} run_events ({100*len(re_ids_phone)/total_re:.1f}%)\")\n", "print(f\"   License plate match: {len(re_ids_plate)} run_events ({100*len(re_ids_plate)/total_re:.1f}%)\")\n", "print(f\"   Match by any method: {len(re_ids_any)} run_events ({100*len(re_ids_any)/total_re:.1f}%)\")\n", "print(f\"   No match: {total_re - len(re_ids_any)} run_events ({100*(total_re - len(re_ids_any))/total_re:.1f}%)\")\n", "print(f\"\")\n", "print(f\"3. JOIN BRIDGES\")\n", "print(f\"   Phone: source(+34...) <-> telefonos/orig, normalized, same day\")\n", "print(f\"   License plate: license_plate <-> cliente, same day\")\n", "print(f\"   Filter: Mutua campaigns only in call_transactions\")\n", "print(f\"\")\n", "print(f\"4. MATCHED PAIRS: {len(df_matched)}\")\n", "if len(df_matched) > 0:\n", "    print(f\"   HR classifications: {df_matched['hr_classification'].value_counts().to_dict()}\")\n", "    print(f\"   CT resolutions (top 5): {df_matched['ct_resolution'].value_counts().head(5).to_dict()}\")" ] }, { "cell_type": "markdown", "id": "a6b7c8d9a", "metadata": {}, "source": [ "### Key findings\n", "\n", "**To be filled in after execution:**\n", "\n", "1. **Match rate**: X% of Happy Robot calls are found in call_transactions\n", "2. **Most effective join method**: phone vs license plate\n", "3. **Consistency of outcomes**: how the AI agent's classifications compare with the human resolutions\n", "4. **Time difference**: how much time passes between the record in each system\n", "5. **Duration comparison**: are the AI agent's calls shorter or longer than the human ones?\n", "6. 
**Relevant campaigns**: which specific Mutua campaigns overlap with Happy Robot\n", "\n", "### Next steps\n", "\n", "- Widen the time window if the match rate is low (try +/- 1 day instead of same day)\n", "- Analyze the calls without a match: why do they not appear in call_transactions?\n", "- Compare against non-Mutua campaigns in case some calls are misclassified\n", "- Assess whether Happy Robot reduces the call center load or generates duplicate calls" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.5" } }, "nbformat": 4, "nbformat_minor": 5 }