HAL 9000
See, hear, think, speak, act — all from your machine.

About this build
A cross-platform, open-source multimodal AI agent. It sees through your webcam, hears your voice, thinks with an LLM (GPT-4o, Claude, Gemini, or a local model via Ollama), speaks in a cloned voice, and acts through 43 OS-level tools. Free mode uses Ollama, so no API keys are required. Works on macOS, Windows, and Linux.
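The agent hands the LLM's output off to OS-level tools. Below is a minimal sketch of how that hand-off could work. It assumes the model replies with a JSON object that names a tool and its arguments. The registry, `tool`, `dispatch`, `run_turn`, and the two sample tools are hypothetical and do not come from this project's code. The stubbed `think` function stands in for whichever LLM backend is configured.

```python
import json
import time
from typing import Callable, Dict

# Hypothetical tool registry: maps a tool name to a Python callable.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Decorator that registers a function as an agent tool under `name`."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register


@tool("get_time")
def get_time() -> str:
    # Example OS-level tool: report the local wall-clock time.
    return time.strftime("%H:%M")


@tool("echo")
def echo(text: str) -> str:
    # Trivial tool that returns its input, useful for testing the loop.
    return text


def dispatch(llm_reply: str) -> str:
    """Parse a JSON tool call like {"tool": "echo", "args": {"text": "hi"}} and run it."""
    call = json.loads(llm_reply)
    name = call.get("tool")
    fn = TOOLS.get(name)
    if fn is None:
        return f"unknown tool: {name}"
    return fn(**call.get("args", {}))


def run_turn(transcript: str, think: Callable[[str], str]) -> str:
    """One hypothetical hear -> think -> act step; `think` is any LLM backend."""
    reply = think(transcript)
    return dispatch(reply)


# Usage with a stubbed "LLM" that always asks to echo the user's words.
fake_llm = lambda text: json.dumps({"tool": "echo", "args": {"text": text}})
print(run_turn("open the pod bay doors", fake_llm))  # prints "open the pod bay doors"
```

A name-to-callable registry keeps the LLM backend interchangeable. GPT-4o, Claude, Gemini, or Ollama only needs to produce the same tool-call shape, and the dispatch code does not change.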
Built with
- Python
- FastAPI
- OpenCV
- PyAudio
- Ollama
- Claude
- GPT-4o
- Gemini