chore: Python dependencies, sources manifest, extraction rules, and extract-source command
Updates pyproject.toml with new dependencies (pdfplumber, python-docx, ebooklib, openpyxl, etc.). Updates sources.yaml with functions extracted from external repos. Improves the extraction rules in sources.md. Adds the Claude extract-source command for the extraction workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -18,6 +18,71 @@
# 5. Update this manifest with the extracted functions

repos:
  - repo: https://github.com/VectifyAI/PageIndex
    license: MIT
    cloned_dir: PageIndex
    extracted:
      # Pure — tree manipulation (8)
      - id: flatten_tree_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: tree_to_flat_list_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: get_leaf_nodes_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: write_node_ids_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: list_to_tree_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: remove_tree_fields_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: format_tree_structure_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: create_node_mapping_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      # Pure — text/JSON extraction (2)
      - id: extract_json_from_llm_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: parse_page_range_py_core
        source_file: pageindex/retrieve.py
        date: 2026-04-05
      # Pure — markdown parsing (2)
      - id: extract_markdown_headers_py_core
        source_file: pageindex/page_index_md.py
        date: 2026-04-05
      - id: build_tree_from_headers_py_core
        source_file: pageindex/page_index_md.py
        date: 2026-04-05
      # Pure — pagination/chunking (2)
      - id: page_list_to_groups_py_core
        source_file: pageindex/page_index.py
        date: 2026-04-05
      - id: calculate_page_offset_py_core
        source_file: pageindex/page_index.py
        date: 2026-04-05
      # Impure — LLM wrappers (2)
      - id: llm_completion_retry_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: llm_acompletion_retry_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      # Impure — PDF extraction (2)
      - id: extract_pdf_text_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05
      - id: get_pdf_page_tokens_py_core
        source_file: pageindex/utils.py
        date: 2026-04-05

  - repo: https://gitea-dgg044oo04woo4ggcsws4gk0.organic-machine.com/Bl4cksmith/Frontend_Library
    license: MIT
    cloned_dir: frontend_library
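
Every entry in the hunk above has the same three keys (`id`, `source_file`, `date`), so a small consistency check is easy to sketch. The snippet below is only an illustration and not part of this commit: `check_manifest` and `REQUIRED_KEYS` are hypothetical names, and the input is assumed to be the manifest already parsed into plain dicts and lists, as a YAML loader would typically return it. Dates are kept as strings here for simplicity.

```python
from collections import Counter

# Keys every extracted entry in the hunk carries (assumed to be mandatory).
REQUIRED_KEYS = ("id", "source_file", "date")


def check_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems found in a parsed manifest."""
    problems = []
    seen = Counter()
    for repo in manifest.get("repos", []):
        name = repo.get("cloned_dir", repo.get("repo", "<unknown>"))
        # A repo may have no "extracted" list yet (e.g. freshly registered).
        for entry in repo.get("extracted") or []:
            missing = [key for key in REQUIRED_KEYS if key not in entry]
            if missing:
                label = entry.get("id", "<no id>")
                problems.append(f"{name}: {label} is missing {', '.join(missing)}")
            if "id" in entry:
                seen[entry["id"]] += 1
    # Report ids that appear more than once across all repos.
    problems.extend(f"duplicate id: {i}" for i, n in seen.items() if n > 1)
    return problems


# Hand-written stand-in for two repos from the manifest, already parsed.
sample = {
    "repos": [
        {
            "repo": "https://github.com/VectifyAI/PageIndex",
            "license": "MIT",
            "cloned_dir": "PageIndex",
            "extracted": [
                {"id": "flatten_tree_py_core",
                 "source_file": "pageindex/utils.py",
                 "date": "2026-04-05"},
                {"id": "parse_page_range_py_core",
                 "source_file": "pageindex/retrieve.py",
                 "date": "2026-04-05"},
            ],
        },
        {"license": "MIT", "cloned_dir": "frontend_library"},
    ]
}

print(check_manifest(sample))
```

A check along these lines could run in CI or as part of the extraction workflow, so that a missing key or a reused `id` is caught before the manifest is committed.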