leanderkretschmer e3d8a2fb64 added compatibility layer		2026-03-13 21:23:02 +01:00
ai-service	added compatibility layer	2026-03-13 21:23:02 +01:00
backend	added compatibility layer	2026-03-13 21:23:02 +01:00
.gitignore	Initial WHAgent platform with WhatsApp, AI pipeline and admin dashboard	2026-03-13 21:10:42 +01:00
docker-compose.gpu.yml	added compatibility layer	2026-03-13 21:23:02 +01:00
docker-compose.yml	added compatibility layer	2026-03-13 21:23:02 +01:00
MODELS.md	added compatibility layer	2026-03-13 21:23:02 +01:00
README.md	added compatibility layer	2026-03-13 21:23:02 +01:00
start-local.sh	added compatibility layer	2026-03-13 21:23:02 +01:00

WHAgent

Multi-agent WhatsApp platform with an admin dashboard, node-based integrations, an Ollama LLM, STT, OCR, TTS, and a persistent knowledge base.

Features

Admin dashboard at http://localhost:3000 for creating agents
Any number of nodes per agent (Redmine, calendar, mail, custom)
WhatsApp connection per agent via QR code login
Agent chats in WhatsApp (including group and one-to-one chats)
Redmine status changes via text commands such as "ticket #45878 auf angelegt stellen" (set ticket #45878 to "angelegt", i.e. created)
Reminders with a destination, travel-time calculation, and dynamic recalculation
Location recalibration when the position changes by more than 100 m within 5 minutes
STT for voice messages
OCR for images
TTS for spoken replies (voice notes)
Persistent knowledge base (memories, logs, reminders) in Postgres
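
The location rule in the feature list (more than 100 m of movement within 5 minutes) can be illustrated with a small distance check. This is a minimal sketch, not the project's actual implementation: the names Fix, haversine_m and needs_recalculation are hypothetical, and the haversine formula is an assumed choice for measuring distance between two GPS fixes.

```python
import math
from dataclasses import dataclass


@dataclass
class Fix:
    """A single GPS fix (hypothetical structure, for illustration only)."""
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees
    timestamp: float  # seconds since epoch


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two fixes in metres."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def needs_recalculation(prev: Fix, curr: Fix,
                        min_distance_m: float = 100.0,
                        window_s: float = 300.0) -> bool:
    """True if the position moved more than min_distance_m within window_s seconds."""
    elapsed = curr.timestamp - prev.timestamp
    return 0 <= elapsed <= window_s and haversine_m(prev, curr) > min_distance_m
```

A reminder scheduler could call needs_recalculation on each incoming location update and recompute the travel time only when it returns True, which avoids recalculating on GPS jitter.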

Technical architecture

backend: Node.js, Express, whatsapp-web.js, scheduler, REST API, dashboard hosting
ai-service: FastAPI with Faster-Whisper (STT), EasyOCR + Tesseract (OCR), Coqui TTS (TTS)
ollama: LLM inference
postgres: persistence for agents, nodes, knowledge base, flow events

GPU model recommendation (RTX 5070 12GB, German)

LLM: qwen2.5:7b-instruct-q4_K_M (Ollama, robust comprehension and responses)
STT: large-v3 via Faster-Whisper (strong on German, GPU-friendly)
OCR: EasyOCR de,en + Tesseract deu+eng
TTS: tts_models/de/thorsten/tacotron2-DCA

This selection is sized for 12 GB of VRAM, provided the workloads run mostly one after another rather than concurrently.

Start

./start-local.sh

Then:

Open the dashboard: http://localhost:3000
Create an agent
Connect the agent
Scan the QR code
Optionally configure a Redmine node (JSON with baseUrl, apiKey, statusMap)
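
To make the Redmine node more concrete, the sketch below shows one possible shape for the node JSON and how a text command could be turned into a Redmine REST call. Only the keys baseUrl, apiKey and statusMap come from this README. The example URL, the status names and IDs, the assumption that statusMap maps names to Redmine status IDs, and the helper names parse_status_command and build_status_request are all hypothetical. The request is only built, not sent.

```python
import json
import re
import urllib.request

# Hypothetical node config. Status names and IDs are placeholders;
# real IDs depend on the Redmine instance.
NODE_CONFIG = json.loads("""
{
  "baseUrl": "https://redmine.example.com",
  "apiKey": "changeme",
  "statusMap": {"angelegt": 1, "in bearbeitung": 2, "erledigt": 5}
}
""")

# Matches commands like "ticket #45878 auf angelegt stellen".
COMMAND_RE = re.compile(r"ticket\s+#(\d+)\s+auf\s+(.+?)\s+stellen", re.IGNORECASE)


def parse_status_command(text):
    """Return (issue_id, status_name) or None if the text is not a status command."""
    match = COMMAND_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip().lower()


def build_status_request(config, issue_id, status_name):
    """Build (but do not send) a PUT request that updates the issue status."""
    status_id = config["statusMap"][status_name]  # KeyError for unknown names
    body = json.dumps({"issue": {"status_id": status_id}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{config['baseUrl'].rstrip('/')}/issues/{issue_id}.json",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Redmine-API-Key": config["apiKey"],
        },
    )
```

Passing the returned request to urllib.request.urlopen would perform the update. Keeping parsing and request construction separate makes both steps easy to test without a live Redmine server.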

Compatibility mode (MacBook / without CUDA)

start-local.sh automatically detects NVIDIA/CUDA.
With CUDA: starts Compose with docker-compose.gpu.yml and uses the GPU models.
Without CUDA: starts the standard Compose setup in CPU fallback mode.
The AI service automatically selects:
- STT GPU: large-v3 (float16)
- STT CPU: distil-large-v3 (int8)
The backend automatically selects the Ollama model:
- GPU: qwen2.5:7b-instruct-q4_K_M
- CPU: qwen2.5:3b-instruct-q4_K_M
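
The selection logic above can be summarized as a lookup keyed on GPU availability. The following is a rough sketch only: how start-local.sh, the AI service and the backend actually detect CUDA is not documented here, so probing for nvidia-smi is an assumption, and has_nvidia_gpu, select_profile and MODEL_PROFILES are hypothetical names. The model names and compute types are taken from the list above.

```python
import shutil
import subprocess

# Model choices per mode, as listed in this README.
MODEL_PROFILES = {
    "gpu": {
        "stt_model": "large-v3",
        "stt_compute_type": "float16",
        "llm_model": "qwen2.5:7b-instruct-q4_K_M",
    },
    "cpu": {
        "stt_model": "distil-large-v3",
        "stt_compute_type": "int8",
        "llm_model": "qwen2.5:3b-instruct-q4_K_M",
    },
}


def has_nvidia_gpu() -> bool:
    """Heuristic (assumption): nvidia-smi is on PATH and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, timeout=10)
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0


def select_profile(gpu_available: bool) -> dict:
    """Return the model profile for GPU mode or the CPU fallback."""
    return MODEL_PROFILES["gpu" if gpu_available else "cpu"]
```

Calling select_profile(has_nvidia_gpu()) yields the GPU profile on a CUDA machine and the CPU fallback elsewhere, for example on a MacBook.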

Notes on WhatsApp calls

WhatsApp Web libraries do not allow reliably initiating classic phone calls. The project instead implements a voice mode based on voice notes (TTS output + STT input) that supports the same conversational flow.
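
The voice mode described above amounts to a three-stage round trip: transcribe the incoming voice note, generate a reply, and synthesize it back to audio. The sketch below shows that flow with injected callables. The function handle_voice_note and its parameters are hypothetical and do not reflect the project's actual interfaces; in the real system the stages would presumably be backed by Faster-Whisper, Ollama and Coqui TTS.

```python
from typing import Callable


def handle_voice_note(
    audio: bytes,
    transcribe: Callable[[bytes], str],     # STT stage
    generate_reply: Callable[[str], str],   # LLM stage
    synthesize: Callable[[str], bytes],     # TTS stage
) -> bytes:
    """Turn an incoming voice note into a spoken reply (audio bytes)."""
    text = transcribe(audio)
    reply = generate_reply(text)
    return synthesize(reply)


# Usage with stand-in stages, so the flow can be tried without any models.
reply_audio = handle_voice_note(
    b"fake-audio",
    transcribe=lambda audio: "hello",
    generate_reply=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode("utf-8"),
)
```

Injecting the stages as callables keeps the flow independent of any particular model, so the GPU and CPU variants can share the same handler.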