fn_registry/python/functions/cybersecurity/detect_web_tech_test.py
egutierrez 935008ec3f feat(recon): network reconnaissance + services + web fingerprint group
Adds the `recon` capability group (cybersecurity + pipelines domain, Python),
with the OSINT archiving policy and the parent page docs/capabilities/recon.md.

Lookups and probing (CLI wrappers):
- whois_lookup, rdap_lookup, dns_records, ping_host, traceroute_host, nmap_scan
- save_scan_to_osint (common sink) + recon_osint (one-shot scan + archiving pipeline)

Native port/service scanning (stdlib, no nmap or sudo):
- scan_tcp_ports: concurrent TCP connect scan (open/closed/filtered)
- grab_service_banner: banner grab + identification of the actual service/version
- identify_port_service: pure, port -> expected IANA service (~120 ports)
- scan_port_services: one-shot pipeline (scan -> identify + banner per open port)
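
For illustration only, a concurrent TCP connect scan of the kind described above could be sketched with the standard library as follows. The names `probe_port` and `connect_scan`, the default timeout, and the exact rule for mapping errors to `closed` versus `filtered` are assumptions made for this sketch; they are not taken from `scan_tcp_ports` itself.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def probe_port(host, port, timeout=1.0):
    """Classify one TCP port with a full connect() attempt (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer actively refused the connection: nothing is listening.
        return "closed"
    except OSError:
        # Timeout, unreachable host, or any other socket error: no clear
        # answer came back, so this sketch reports the port as filtered.
        return "filtered"


def connect_scan(host, ports, timeout=1.0, workers=32):
    """Probe several ports concurrently and return {port: state}."""
    ports = list(ports)
    if not ports:
        return {}
    with ThreadPoolExecutor(max_workers=min(workers, len(ports))) as pool:
        states = pool.map(lambda p: probe_port(host, p, timeout), ports)
        return dict(zip(ports, states))
```

A connect scan completes the TCP handshake, so it needs no raw sockets and therefore no elevated privileges, which is consistent with the "no nmap or sudo" note above.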

Web technology fingerprinting (Wappalyzer-style), pure/impure pattern:
- fetch_http_fingerprint: stdlib GET, collects headers/html/cookies (names only)
- detect_web_tech: pure, matches ~50 regex signatures -> technologies by category
- fingerprint_web_stack: one-shot pipeline url -> technologies
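
As a rough sketch of the pure half of that split, regex signature matching over already-fetched data might look like the code below. The `SIGNATURES` table, its three entries, and the function name `match_signatures` are invented for this example and cover far less than the ~50 signatures mentioned; the real `detect_web_tech` may be structured quite differently.

```python
import re

# Hypothetical signature table: (name, category, source, header key, pattern).
# Group 1 of each pattern captures the version when one is present.
SIGNATURES = [
    ("nginx", "web-server", "header", "server",
     re.compile(r"nginx(?:/([\d.]+))?", re.I)),
    ("Apache", "web-server", "header", "server",
     re.compile(r"apache(?:/([\d.]+))?", re.I)),
    ("WordPress", "cms", "html", None,
     re.compile(r'content="WordPress ?([\d.]+)?', re.I)),
]


def match_signatures(headers, html=""):
    """Pure function: no I/O, same input always yields the same output."""
    lowered = {key.lower(): value for key, value in headers.items()}
    found = []
    for name, category, source, key, pattern in SIGNATURES:
        text = lowered.get(key, "") if source == "header" else html
        match = pattern.search(text)
        if match:
            found.append(
                {"name": name, "category": category, "version": match.group(1)}
            )
    return found
```

Because the matcher takes plain dicts and strings, it can be tested without any network access, while the impure fetch step is kept in a separate function.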

All of them return a dict {status} and never raise. Tests: 43 green, no external network access.
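
One way to get the "return a dict with a status, never raise" convention is a small decorator, sketched below. The decorator name `never_raises`, the `"ok"`/`"error"` status values, and the `error` key are assumptions for this illustration; the commit only states that every function returns a dict carrying `status`.

```python
import functools


def never_raises(func):
    """Wrap func so failures come back as a status dict instead of an exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            payload = func(*args, **kwargs)
        except Exception as exc:
            # Assumed error shape: status plus a short human-readable message.
            return {"status": "error", "error": f"{type(exc).__name__}: {exc}"}
        return {"status": "ok", **payload}
    return wrapper


@never_raises
def parse_port(text):
    """Hypothetical example function: parse and validate a TCP port number."""
    port = int(text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return {"port": port}
```

Callers can then branch on `result["status"]` rather than wrapping every call in `try`/`except`, which keeps pipeline code linear.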

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:12:07 +02:00

109 lines
3.4 KiB
Python

"""Tests for detect_web_tech (Wappalyzer-style web technology detection)."""
from detect_web_tech import detect_web_tech


def _names(result):
    # Set of detected technology names, for membership checks.
    return {t["name"] for t in result["technologies"]}


def _by_name(result, name):
    # First detected technology with the given name, or None if absent.
    for t in result["technologies"]:
        if t["name"] == name:
            return t
    return None


def test_nginx_por_header_con_version():
    result = detect_web_tech({"server": "nginx/1.24.0"})
    assert "nginx" in _names(result)
    nginx = _by_name(result, "nginx")
    assert nginx["version"] == "1.24.0"
    assert nginx["category"] == "web-server"
    assert nginx["confidence"] == "high"


def test_wordpress_por_html_y_meta_implica_php():
    html = (
        '<html><head>'
        '<meta name="generator" content="WordPress 6.4">'
        '</head><body>'
        '<link href="/wp-content/themes/x/style.css">'
        '</body></html>'
    )
    result = detect_web_tech({}, html=html)
    names = _names(result)
    assert "WordPress" in names
    assert "PHP" in names  # implied
    wp = _by_name(result, "WordPress")
    assert wp["version"] == "6.4"
    assert wp["confidence"] == "high"
    php = _by_name(result, "PHP")
    assert php["confidence"] == "medium"
    assert "implied by WordPress" in php["evidence"]


def test_php_por_cookie():
    result = detect_web_tech({}, cookies=["PHPSESSID"])
    php = _by_name(result, "PHP")
    assert php is not None
    assert php["category"] == "programming-language"
    assert php["confidence"] == "high"
    assert "PHPSESSID" in php["evidence"]


def test_cloudflare_por_header():
    result = detect_web_tech({"server": "cloudflare", "cf-ray": "8a1b2c3d4e5f-MAD"})
    cf = _by_name(result, "Cloudflare")
    assert cf is not None
    assert cf["category"] == "cdn"
    assert cf["confidence"] == "high"


def test_entrada_vacia():
    result = detect_web_tech({})
    assert result["technologies"] == []
    assert result["by_category"] == {}
    assert result["count"] == 0


def test_entrada_vacia_explicita_headers_y_html():
    result = detect_web_tech({}, html="", cookies=None, final_url="")
    assert result["count"] == 0
    assert result["technologies"] == []


def test_determinismo():
    headers = {"server": "nginx/1.24.0", "x-powered-by": "PHP/8.2"}
    html = '<meta name="generator" content="WordPress 6.4">wp-content'
    a = detect_web_tech(headers, html=html, cookies=["PHPSESSID"])
    b = detect_web_tech(headers, html=html, cookies=["PHPSESSID"])
    assert a == b


def test_count_y_by_category_consistentes():
    headers = {"server": "nginx/1.24.0"}
    html = '<meta name="generator" content="WordPress 6.4">wp-content'
    result = detect_web_tech(headers, html=html)
    assert result["count"] == len(result["technologies"])
    total_in_categories = sum(len(v) for v in result["by_category"].values())
    assert total_in_categories == result["count"]
    assert "nginx" in result["by_category"]["web-server"]
    assert "WordPress" in result["by_category"]["cms"]


def test_headers_claves_mayusculas_se_normalizan():
    result = detect_web_tech({"Server": "Apache/2.4.57"})
    apache = _by_name(result, "Apache")
    assert apache is not None
    assert apache["version"] == "2.4.57"


def test_jquery_por_script_src_es_medium():
    html = '<script src="/static/jquery-3.6.0.min.js"></script>'
    result = detect_web_tech({}, html=html)
    jq = _by_name(result, "jQuery")
    assert jq is not None
    assert jq["confidence"] == "medium"
    assert jq["version"] == "3.6.0"