chore: add dev/ directory with issues and implemented functions

Tracks completed issues (jupyter tools) and implemented functions (design specs already resolved).
2026-04-05 18:19:36 +02:00
parent 806c819cf7
commit a9cd28b010
64 changed files with 3680 additions and 0 deletions
@@ -0,0 +1,67 @@
# Functions to implement from OpenViking
Source: `sources/OpenViking` (ByteDance)
Main license: **AGPL-3.0** (NOT permissive)
bot/ license: **MIT** | examples/ license: **Apache 2.0**
**Strategy:** Do not copy code. Reimplement the generic, useful functionality
from scratch, documented here as independent specs. The resulting
functions will be original, with no dependency on the AGPL code.
## Functions
| # | File | Domain | Functions |
|---|---------|---------|-----------|
| 01 | parse_markdown.md | core | extract_frontmatter, find_headings, smart_split_content, estimate_token_count, sanitize_for_path |
| 02 | parse_pdf_to_markdown.md | core | pdf_to_markdown, extract_pdf_bookmarks, detect_headings_by_font, format_table_to_markdown |
| 03 | parse_html_to_markdown.md | core | html_to_markdown, detect_url_type, fetch_and_parse_url, convert_github_to_raw_url |
| 04 | parse_docx_to_markdown.md | core | docx_to_markdown |
| 05 | parse_excel_to_markdown.md | core | excel_to_markdown |
| 06 | parse_epub_to_markdown.md | core | epub_to_markdown |
| 07 | directory_scanner.md | infra | scan_directory |
| 08 | circuit_breaker.md | core | CircuitBreaker (class) |
| 09 | retry_with_classify.md | core | classify_api_error, compute_backoff_delay, retry_sync, retry_async |
| 10 | hotness_score.md | datascience | hotness_score |
| 11 | time_utils.md | core | parse_iso_datetime, format_iso8601, format_simplified |
| 12 | safe_extract_zip.md | infra | safe_extract_zip, normalize_zip_filenames |
| 13 | envelope_encryption.md | cybersecurity | envelope_encrypt, envelope_decrypt |
| 14 | parse_code_ast.md | core | parse_code_ast (tree-sitter multi-language) |
| 15 | git_url_parser.md | core | parse_git_url, is_git_repo_url, validate_git_ssh_uri |
| 16 | media_strategy.md | core | calculate_media_strategy |
| 17 | parser_registry.md | core | ParserRegistry (extensible pattern) |
| 18 | read_file_with_encoding.md | infra | read_file_with_encoding |
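
As an illustration of what one of these specs might cover, here is a minimal sketch of `compute_backoff_delay` (row 09), assuming capped exponential backoff with optional full jitter. The signature, the parameter names (`base`, `cap`, `jitter`, `rng`) and the default values are assumptions made for this example; they are not taken from `retry_with_classify.md` or from OpenViking.

```python
import random


def compute_backoff_delay(attempt, base=0.5, cap=30.0, jitter=False, rng=random.random):
    """Return the wait in seconds before retry number `attempt` (0-based).

    Hypothetical signature: the real spec may choose other names and defaults.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    # Clamp the exponent so that very large attempt numbers cannot overflow
    # the int-to-float conversion before the cap is applied.
    exponent = min(attempt, 32)
    delay = min(cap, base * (2 ** exponent))
    if jitter:
        # "Full jitter": draw uniformly from [0, delay) to spread out
        # clients that would otherwise retry at the same instant.
        delay = rng() * delay
    return delay


if __name__ == "__main__":
    for attempt in range(5):
        print(attempt, compute_backoff_delay(attempt))
```

Passing `rng` explicitly keeps the function deterministic under test, while the default uses `random.random`.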
## Types
| # | File | Domain | Types |
|---|---------|---------|-------|
| 19 | type_parse_result.md | core | NodeType (sum), ResourceNode (product), ParseResult (product) |
| 20 | type_classified_file.md | infra | ClassifiedFile (product), DirectoryScanResult (product) |
| 21 | type_code_entity.md | core | CodeEntity (product) |
| 22 | type_message.md | core | TextPart, ContextPart, ToolPart, Part (sum), Message (product) |
| 23 | type_retrieval.md | core | ContextType (sum), TypedQuery, QueryPlan, MatchedContext, ScoreDistribution, QueryResult, FindResult |
| 24 | type_memory.md | core | FieldType (sum), MergeOp (sum), MemoryField, MemoryTypeSchema, MemoryData |
| 25 | type_context.md | core | ResourceContentType (sum), ContextLevel (sum), Context (product) |
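
The `(sum)` and `(product)` annotations in the table can be read as follows: a sum type is exactly one of several alternatives, while a product type bundles several fields together. The sketch below shows one way to express that in Python for `Part` and `Message` (row 22). All field names (`text`, `tool_name`, `output`, `role`, `parts`) are hypothetical, `ContextPart` is omitted for brevity, and the `render` helper exists only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass(frozen=True)
class TextPart:
    text: str  # hypothetical field


@dataclass(frozen=True)
class ToolPart:
    tool_name: str  # hypothetical field
    output: str     # hypothetical field


# Sum type: a Part is exactly one of the listed alternatives.
Part = Union[TextPart, ToolPart]


@dataclass
class Message:
    """Product type: a Message carries all of its fields at once."""
    role: str
    parts: List[Part] = field(default_factory=list)


def render(part: Part) -> str:
    """Illustrative helper: handle each alternative of the sum type."""
    if isinstance(part, TextPart):
        return part.text
    if isinstance(part, ToolPart):
        return f"[{part.tool_name}] {part.output}"
    raise TypeError(f"unsupported part: {type(part).__name__}")


if __name__ == "__main__":
    msg = Message(role="assistant", parts=[TextPart("done"), ToolPart("grep", "3 matches")])
    print([render(p) for p in msg.parts])
```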
## Identified gaps (own specs)
Functions missing from the registry that are needed to complete the ingestion,
operations, and automation capabilities. Original design, no external source.
### Functions
| # | File | Domain | Functions |
|---|---------|---------|-----------|
| 26 | validation_schemas.md | core | validate_json_schema, validate_struct_fields, coerce_types |
| 27 | tabular_transforms.md | datascience/core | pivot, melt, join_by_key, aggregate_by_group |
| 28 | serialization_format.md | core | to_csv, from_csv, to_jsonl, from_jsonl, render_template, generate_html_report |
| 29 | http_client.md | infra | http_get_json, http_post_json, http_download_file |
| 30 | scheduling.md | core/infra | parse_cron_expr, next_cron_time, cron_ticker |
| 31 | cache_persistent.md | infra/core | cache_to_sqlite, cache_to_file, cache_decorator |
| 32 | diff_merge.md | datascience | diff_entities, diff_relations, detect_drift, merge_graphs |
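
For a sense of scale, the `to_jsonl` / `from_jsonl` pair (row 28) could be as small as the sketch below, which assumes both functions work on in-memory strings. The signatures are assumptions made for this example; `serialization_format.md` may instead define them over files or streams.

```python
import json


def to_jsonl(records):
    """Serialize an iterable of JSON-compatible objects, one per line."""
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def from_jsonl(text):
    """Parse JSON Lines text back into a list, skipping blank lines.

    Splits on "\\n" only: str.splitlines() would also break on characters
    such as U+2028, which may appear unescaped inside JSON strings when
    ensure_ascii=False is used.
    """
    return [json.loads(line) for line in text.split("\n") if line.strip()]


if __name__ == "__main__":
    rows = [{"id": 1, "name": "año"}, {"id": 2, "tags": ["a", "b"]}]
    print(from_jsonl(to_jsonl(rows)) == rows)
```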
### Types
| # | File | Domain | Types |
|---|---------|---------|-------|
| 30 | scheduling.md | core | CronSchedule (product) |