Tracking of completed issues (jupyter tools) and implemented functions (design specs already resolved).
Functions to implement from OpenViking
Source: sources/OpenViking (ByteDance)
Main license: AGPL-3.0 (NOT permissive)
bot/ license: MIT | examples/ license: Apache 2.0
Strategy: Do not copy code. Reimplement the generic, useful functionality from scratch, documented here as independent specs. The resulting functions will be original, with no dependency on the AGPL code.
Functions
| # | File | Domain | Functions |
|---|---|---|---|
| 01 | parse_markdown.md | core | extract_frontmatter, find_headings, smart_split_content, estimate_token_count, sanitize_for_path |
| 02 | parse_pdf_to_markdown.md | core | pdf_to_markdown, extract_pdf_bookmarks, detect_headings_by_font, format_table_to_markdown |
| 03 | parse_html_to_markdown.md | core | html_to_markdown, detect_url_type, fetch_and_parse_url, convert_github_to_raw_url |
| 04 | parse_docx_to_markdown.md | core | docx_to_markdown |
| 05 | parse_excel_to_markdown.md | core | excel_to_markdown |
| 06 | parse_epub_to_markdown.md | core | epub_to_markdown |
| 07 | directory_scanner.md | infra | scan_directory |
| 08 | circuit_breaker.md | core | CircuitBreaker (class) |
| 09 | retry_with_classify.md | core | classify_api_error, compute_backoff_delay, retry_sync, retry_async |
| 10 | hotness_score.md | datascience | hotness_score |
| 11 | time_utils.md | core | parse_iso_datetime, format_iso8601, format_simplified |
| 12 | safe_extract_zip.md | infra | safe_extract_zip, normalize_zip_filenames |
| 13 | envelope_encryption.md | cybersecurity | envelope_encrypt, envelope_decrypt |
| 14 | parse_code_ast.md | core | parse_code_ast (tree-sitter multi-language) |
| 15 | git_url_parser.md | core | parse_git_url, is_git_repo_url, validate_git_ssh_uri |
| 16 | media_strategy.md | core | calculate_media_strategy |
| 17 | parser_registry.md | core | ParserRegistry (extensible pattern) |
| 18 | read_file_with_encoding.md | infra | read_file_with_encoding |
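To make one row of the table concrete, here is a minimal sketch of what `compute_backoff_delay` and `retry_sync` (spec 09) could look like. It assumes exponential backoff with full jitter. The parameter names (`base`, `cap`, `jitter`, `rng`, `is_retryable`, `sleep`) and their defaults are hypothetical and are not taken from the spec file.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def compute_backoff_delay(
    attempt: int,
    base: float = 0.5,          # hypothetical default, in seconds
    cap: float = 30.0,          # hypothetical upper bound, in seconds
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> float:
    """Return base * 2**attempt capped at `cap`, optionally with full jitter."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        source = rng or random
        # Full jitter: pick uniformly between 0 and the capped delay.
        return source.uniform(0.0, delay)
    return delay


def retry_sync(
    fn: Callable[[], T],
    max_attempts: int = 3,
    is_retryable: Callable[[Exception], bool] = lambda exc: True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn` until it succeeds, the error is not retryable, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            sleep(compute_backoff_delay(attempt))
    raise AssertionError("unreachable")
```

In this sketch `is_retryable` is the hook where something like `classify_api_error` would plug in, and `sleep` is injectable so the retry loop can be tested without waiting.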
Types
| # | File | Domain | Types |
|---|---|---|---|
| 19 | type_parse_result.md | core | NodeType (sum), ResourceNode (product), ParseResult (product) |
| 20 | type_classified_file.md | infra | ClassifiedFile (product), DirectoryScanResult (product) |
| 21 | type_code_entity.md | core | CodeEntity (product) |
| 22 | type_message.md | core | TextPart, ContextPart, ToolPart, Part (sum), Message (product) |
| 23 | type_retrieval.md | core | ContextType (sum), TypedQuery, QueryPlan, MatchedContext, ScoreDistribution, QueryResult, FindResult |
| 24 | type_memory.md | core | FieldType (sum), MergeOp (sum), MemoryField, MemoryTypeSchema, MemoryData |
| 25 | type_context.md | core | ResourceContentType (sum), ContextLevel (sum), Context (product) |
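The "(sum)" and "(product)" labels above can be illustrated with a short sketch based on row 22. A product type bundles several fields together, while a sum type is exactly one of several alternatives. The type names come from the table; every field (`text`, `uri`, `tool_name`, `output`, `role`, `parts`) and the `render_part` helper are hypothetical and only serve the example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class TextPart:
    text: str  # hypothetical field


@dataclass(frozen=True)
class ContextPart:
    uri: str  # hypothetical field


@dataclass(frozen=True)
class ToolPart:
    tool_name: str  # hypothetical field
    output: str     # hypothetical field


# Sum type: a Part is exactly one of the three alternatives.
Part = Union[TextPart, ContextPart, ToolPart]


@dataclass(frozen=True)
class Message:
    # Product type: a Message carries all of its fields at once.
    role: str
    parts: List[Part] = field(default_factory=list)


def render_part(part: Part) -> str:
    """Hypothetical helper: handle each alternative of the sum type."""
    if isinstance(part, TextPart):
        return part.text
    if isinstance(part, ContextPart):
        return f"[context: {part.uri}]"
    if isinstance(part, ToolPart):
        return f"[tool {part.tool_name}: {part.output}]"
    raise TypeError(f"unknown part type: {type(part).__name__}")
```

Modelling `Part` as a union keeps the set of alternatives closed, so a function like `render_part` can fail loudly when it meets a variant it does not know.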
Identified gaps (own specs)
Functions missing from the registry that are needed to complete ingestion, operations, and automation capabilities. Original design, with no external source.
Functions
| # | File | Domain | Functions |
|---|---|---|---|
| 26 | validation_schemas.md | core | validate_json_schema, validate_struct_fields, coerce_types |
| 27 | tabular_transforms.md | datascience/core | pivot, melt, join_by_key, aggregate_by_group |
| 28 | serialization_format.md | core | to_csv, from_csv, to_jsonl, from_jsonl, render_template, generate_html_report |
| 29 | http_client.md | infra | http_get_json, http_post_json, http_download_file |
| 30 | scheduling.md | core/infra | parse_cron_expr, next_cron_time, cron_ticker |
| 31 | cache_persistent.md | infra/core | cache_to_sqlite, cache_to_file, cache_decorator |
| 32 | diff_merge.md | datascience | diff_entities, diff_relations, detect_drift, merge_graphs |
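As an example of the scope these gap specs are aiming at, the following is a minimal sketch of the `to_jsonl` / `from_jsonl` pair from row 28. The signatures are assumptions: the spec may well work on file paths or streams instead of in-memory strings.

```python
import json
from typing import Any, Dict, Iterable, List


def to_jsonl(records: Iterable[Dict[str, Any]]) -> str:
    """Serialize each record as one JSON object per line."""
    # json.dumps escapes newlines inside strings, so one record is always one line.
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


def from_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse JSON Lines text, skipping blank lines."""
    # Split on "\n" only: str.splitlines() would also break on characters such
    # as U+2028, which may appear unescaped inside a JSON string.
    return [json.loads(line) for line in text.split("\n") if line.strip()]
```

A round trip through both functions should return the original records, which makes the pair easy to property-test.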
Types
| # | File | Domain | Types |
|---|---|---|---|
| 30 | scheduling.md | core | CronSchedule (product) |