fn_registry/dev/functions_to_implement/00_README.md
egutierrez a9cd28b010 chore: add dev/ directory with issues and implemented functions
Tracking of completed issues (jupyter tools) and implemented functions (design specs already resolved).
2026-04-05 18:19:36 +02:00

Functions to implement from OpenViking

Source: sources/OpenViking (ByteDance). Main license: AGPL-3.0 (NOT permissive). License for bot/: MIT. License for examples/: Apache 2.0.

Strategy: Do not copy code. Reimplement the generic, useful functionality from scratch, documented here as independent specs. The resulting functions will be original, with no dependency on the AGPL code.

Functions

| # | File | Domain | Functions |
|---|------|--------|-----------|
| 01 | parse_markdown.md | core | extract_frontmatter, find_headings, smart_split_content, estimate_token_count, sanitize_for_path |
| 02 | parse_pdf_to_markdown.md | core | pdf_to_markdown, extract_pdf_bookmarks, detect_headings_by_font, format_table_to_markdown |
| 03 | parse_html_to_markdown.md | core | html_to_markdown, detect_url_type, fetch_and_parse_url, convert_github_to_raw_url |
| 04 | parse_docx_to_markdown.md | core | docx_to_markdown |
| 05 | parse_excel_to_markdown.md | core | excel_to_markdown |
| 06 | parse_epub_to_markdown.md | core | epub_to_markdown |
| 07 | directory_scanner.md | infra | scan_directory |
| 08 | circuit_breaker.md | core | CircuitBreaker (class) |
| 09 | retry_with_classify.md | core | classify_api_error, compute_backoff_delay, retry_sync, retry_async |
| 10 | hotness_score.md | datascience | hotness_score |
| 11 | time_utils.md | core | parse_iso_datetime, format_iso8601, format_simplified |
| 12 | safe_extract_zip.md | infra | safe_extract_zip, normalize_zip_filenames |
| 13 | envelope_encryption.md | cybersecurity | envelope_encrypt, envelope_decrypt |
| 14 | parse_code_ast.md | core | parse_code_ast (tree-sitter multi-language) |
| 15 | git_url_parser.md | core | parse_git_url, is_git_repo_url, validate_git_ssh_uri |
| 16 | media_strategy.md | core | calculate_media_strategy |
| 17 | parser_registry.md | core | ParserRegistry (extensible pattern) |
| 18 | read_file_with_encoding.md | infra | read_file_with_encoding |
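
To give a sense of how small and self-contained each entry in the table above is meant to be, here is a minimal sketch of two functions from spec 09. Only the names `compute_backoff_delay` and `retry_sync` come from the table; the signatures, defaults, and the `is_retryable` and `sleep` parameters are assumptions made for illustration and are not taken from the spec or from OpenViking.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def compute_backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff: base * 2**attempt, never above `cap`.

    The signature and defaults are hypothetical; the spec may differ.
    """
    return min(cap, base * (2 ** attempt))


def retry_sync(
    fn: Callable[[], T],
    max_attempts: int = 3,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,  # hypothetical hook
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests do not wait
) -> T:
    """Call `fn` until it succeeds, a non-retryable error occurs,
    or `max_attempts` calls have been made."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            is_last = attempt == max_attempts - 1
            if is_last or not is_retryable(exc):
                raise
            sleep(compute_backoff_delay(attempt))
    raise AssertionError("unreachable")
```

In this sketch the `is_retryable` hook is where a classifier such as `classify_api_error` could plug in, so that only transient errors are retried.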

Types

| # | File | Domain | Types |
|---|------|--------|-------|
| 19 | type_parse_result.md | core | NodeType (sum), ResourceNode (product), ParseResult (product) |
| 20 | type_classified_file.md | infra | ClassifiedFile (product), DirectoryScanResult (product) |
| 21 | type_code_entity.md | core | CodeEntity (product) |
| 22 | type_message.md | core | TextPart, ContextPart, ToolPart, Part (sum), Message (product) |
| 23 | type_retrieval.md | core | ContextType (sum), TypedQuery, QueryPlan, MatchedContext, ScoreDistribution, QueryResult, FindResult |
| 24 | type_memory.md | core | FieldType (sum), MergeOp (sum), MemoryField, MemoryTypeSchema, MemoryData |
| 25 | type_context.md | core | ResourceContentType (sum), ContextLevel (sum), Context (product) |
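
The "(sum)" and "(product)" annotations above distinguish types that are one of several alternatives from types that bundle several fields together. A minimal sketch of how spec 19 might look in Python follows, with the sum type as an `Enum` and the product types as dataclasses. The type names come from the table; every member and field shown here is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class NodeType(Enum):
    """Sum type: a node is exactly one of these kinds (members are hypothetical)."""
    ROOT = "root"
    SECTION = "section"


@dataclass
class ResourceNode:
    """Product type: every node carries all of these fields (fields are hypothetical)."""
    type: NodeType
    title: str = ""
    content: str = ""
    children: List["ResourceNode"] = field(default_factory=list)


@dataclass
class ParseResult:
    """Product type wrapping the parsed tree plus metadata (fields are hypothetical)."""
    root: ResourceNode
    source_format: str
    warnings: List[str] = field(default_factory=list)


def count_nodes(node: ResourceNode) -> int:
    """Count a node and all of its descendants."""
    return 1 + sum(count_nodes(child) for child in node.children)
```

With this shape, any parser in the first table could return a `ParseResult` regardless of the input format.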

Identified gaps (own specs)

Functions missing from the registry that are needed to complete its ingestion, operations, and automation capabilities. Original design, with no external source.

Functions

| # | File | Domain | Functions |
|---|------|--------|-----------|
| 26 | validation_schemas.md | core | validate_json_schema, validate_struct_fields, coerce_types |
| 27 | tabular_transforms.md | datascience/core | pivot, melt, join_by_key, aggregate_by_group |
| 28 | serialization_format.md | core | to_csv, from_csv, to_jsonl, from_jsonl, render_template, generate_html_report |
| 29 | http_client.md | infra | http_get_json, http_post_json, http_download_file |
| 30 | scheduling.md | core/infra | parse_cron_expr, next_cron_time, cron_ticker |
| 31 | cache_persistent.md | infra/core | cache_to_sqlite, cache_to_file, cache_decorator |
| 32 | diff_merge.md | datascience | diff_entities, diff_relations, detect_drift, merge_graphs |
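
As an example of the scope intended for these gap specs, the JSONL pair from spec 28 could be sketched as below. The names `to_jsonl` and `from_jsonl` come from the table; the string-in, string-out signatures are an assumption, and the actual spec may work on file paths or streams instead.

```python
import json
from typing import Any, Dict, Iterable, List


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize records as JSON Lines: one JSON object per line.

    json.dumps escapes newlines inside strings, so each record
    stays on a single physical line.
    """
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def from_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse JSON Lines text, skipping blank lines."""
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

A round trip through both functions should return the original records, which makes the pair easy to test in isolation.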

Types

| # | File | Domain | Types |
|---|------|--------|-------|
| 30 | scheduling.md | core | CronSchedule (product) |