---
name: fuzzy_merge_adaptive
kind: function
lang: py
domain: datascience
version: "1.0.0"
purity: pure
signature: "def fuzzy_merge_adaptive(left: list[dict], right: list[dict], left_key: str, right_key: str, thresholds: list[int] | None = None, how: str = 'left') -> list[dict]"
description: "Adaptive fuzzy join between two lists of dicts using rapidfuzz.token_sort_ratio. Tries thresholds from highest to lowest and assigns the highest one met. Supports how='left' (all items from left) and how='inner' (only items with a match). Colliding fields receive _left/_right suffixes."
tags: [fuzzy, matching, join, merge, rapidfuzz, string-similarity, datascience]
params:
  - name: left
    desc: List of dicts (left side of the join).
  - name: right
    desc: List of dicts (right side of the join).
  - name: left_key
    desc: Key in the left dicts used for string matching.
  - name: right_key
    desc: Key in the right dicts used for string matching.
  - name: thresholds
    desc: List of integer thresholds to try in descending order. Default [90,80,70,60,50].
  - name: how
    desc: "'left' includes all items from left; 'inner' includes only those with a match."
output: "List of merged dicts with the fields of left + right (_left/_right suffixes on collision) + fuzzy_match (str|None), match_score (int), threshold_used (int|None)."
uses_functions: []
uses_types: []
returns: []
returns_optional: false
error_type: ""
imports: ["rapidfuzz"]
tested: true
tests:
  - "left join con typo"
  - "inner join excluye sin match"
  - "left join sin match devuelve none"
  - "threshold adaptativo"
  - "colision de claves usa sufijos"
test_file_path: "python/functions/datascience/tests/test_fuzzy_merge_adaptive.py"
file_path: "python/functions/datascience/fuzzy_merge_adaptive.py"
source_repo: "internal:footprint_aurgi"
source_license: "internal-aurgi"
source_file: "fuzzy_joins/fuzzy_en_batches.py"
---

## Example

```python
from fuzzy_merge_adaptive import fuzzy_merge_adaptive

left = [{"name": "Madrid"}, {"name": "Barclona"}]
right = [{"name": "Madrid", "cp": "28"}, {"name": "Barcelona", "cp": "08"}]

result = fuzzy_merge_adaptive(left, right, left_key="name", right_key="name")
# result[1]["fuzzy_match"] == "Barcelona", result[1]["match_score"] >= 80
```

## Notes

Migrated from thefuzz to rapidfuzz (largely compatible API, faster). No pandas: the merge is implemented manually via a dict lookup on right_key. Thresholds are tried from highest to lowest; the first one met is assigned to threshold_used.
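
The threshold selection and suffix handling described above can be sketched roughly as follows. This is not the actual implementation: `token_sort_score`, `pick_threshold` and `merge_sketch` are hypothetical names, and the scorer is a standard-library `difflib` stand-in so the sketch runs without dependencies (the real function uses rapidfuzz's `token_sort_ratio`, whose scores may differ slightly). Reporting the best score even when no threshold is met is also an assumption.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def token_sort_score(a: str, b: str) -> int:
    """Stand-in scorer (assumption): sort the tokens, then compare with difflib."""
    a_sorted = " ".join(sorted(a.lower().split()))
    b_sorted = " ".join(sorted(b.lower().split()))
    return round(100 * SequenceMatcher(None, a_sorted, b_sorted).ratio())


def pick_threshold(score: int, thresholds: list[int]) -> int | None:
    """Return the highest threshold that the score meets, or None if none is met."""
    for t in sorted(thresholds, reverse=True):
        if score >= t:
            return t
    return None


def merge_sketch(left, right, left_key, right_key, thresholds=None, how="left"):
    """Hypothetical sketch of the adaptive fuzzy join, not the real function."""
    thresholds = thresholds or [90, 80, 70, 60, 50]
    merged = []
    for l_row in left:
        # Score every right-hand candidate and keep the best one.
        scored = [(token_sort_score(str(l_row[left_key]), str(r[right_key])), r)
                  for r in right]
        score, best = max(scored, key=lambda pair: pair[0], default=(0, None))
        used = pick_threshold(score, thresholds)
        if used is None:
            if how == "inner":
                continue  # inner join drops rows without a match
            best = None   # left join keeps the row with empty match fields
        r_row = best if best is not None else {}
        out = {}
        # Keys present on both sides get _left/_right suffixes.
        for key, value in l_row.items():
            out[f"{key}_left" if key in r_row else key] = value
        for key, value in r_row.items():
            out[f"{key}_right" if key in l_row else key] = value
        out["fuzzy_match"] = best[right_key] if best is not None else None
        out["match_score"] = score  # assumption: best score is kept even without a match
        out["threshold_used"] = used
        merged.append(out)
    return merged
```

With the data from the example, the "Barclona" row would come back with `name_left`, `name_right` and `cp` keys, because `name` exists on both sides.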