⚙️ Cogs Settings
Configure AI services, voice, and API settings
PersonaPlex is a full-duplex voice AI (ASR+LLM+TTS in one, 70ms latency). Kokoro is high-quality local TTS. Browser Web Speech works on all devices.
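The trade-offs above suggest a simple fallback order between the three options. The following is a minimal sketch of that idea, not how Cogs actually selects an engine: the function name `pick_voice_engine`, its parameters, and the returned labels are all hypothetical, and it assumes Kokoro is only chosen when a GPU is present.

```python
def pick_voice_engine(has_gpu: bool, want_full_duplex: bool = True) -> str:
    """Return a hypothetical engine label based on the descriptions above."""
    if has_gpu and want_full_duplex:
        # PersonaPlex: ASR + LLM + TTS in one model
        return "personaplex"
    if has_gpu:
        # Kokoro: high-quality local TTS (GPU assumed for this sketch)
        return "kokoro"
    # Browser Web Speech: works on all devices, so it is the last resort
    return "web-speech"


if __name__ == "__main__":
    print(pick_voice_engine(has_gpu=True))
    print(pick_voice_engine(has_gpu=True, want_full_duplex=False))
    print(pick_voice_engine(has_gpu=False))
```

Keeping the browser option as the final branch reflects the one property stated above: it is the only choice that works on every device.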
Natural F0
Warm conversational
Natural F1
Clear professional
Natural F2
Friendly casual
Natural M2
Clear articulate
Variable F0
Expressive female
Variable F1
Dynamic female
Variable M0
Expressive male
PersonaPlex 7B — full-duplex voice AI. Handles speech recognition, thinking, and speaking simultaneously. 70ms latency, natural interruptions. Requires GPU.
Bella
Warm female (default)
Jessica
Professional female
27 high-quality neural voices that run locally on the GPU, with zero API cost
Available voices depend on your browser/OS
Lessac
Natural male (default)
HFC Male
High-fidelity male
HFC Female
High-fidelity female
Northern Male
Northern accent
Southern Female
Southern accent
Fast but robotic sounding
Rachel
Calm, professional female
Bella
Soft, gentle female
Josh
Deep, authoritative male
Requires ElevenLabs API key
Clone a voice from an audio sample using ElevenLabs. Record or upload a voice sample (at least 30 seconds recommended).
— or record directly —
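The 30-second recommendation for voice samples can be checked before uploading. This is a rough sketch that assumes the sample is an uncompressed WAV file, which the page does not specify; the helper names (`wav_duration_seconds`, `is_long_enough`, `make_silence`) are hypothetical and use only Python's standard `wave` module.

```python
import io
import wave

RECOMMENDED_MIN_SECONDS = 30.0  # taken from the recommendation above


def wav_duration_seconds(source) -> float:
    """Duration of an uncompressed WAV, given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(source, minimum: float = RECOMMENDED_MIN_SECONDS) -> bool:
    """True when the sample meets the recommended minimum length."""
    return wav_duration_seconds(source) >= minimum


def make_silence(seconds: float, rate: int = 8000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(is_long_enough(make_silence(5)))
    print(is_long_enough(make_silence(30)))
```

Compressed formats such as MP3 would need a different reader; the check here only covers what `wave` can parse.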
This voice is used when Cogs speaks to you on the phone via Twilio. Changes are saved to your profile.
Optional instructions that shape how Cogs responds to you (applies to chat, phone, and SMS)
Changes apply instantly on the main face view
Live Preview
◉
Classic
Enhanced 2D canvas face
⚙
Robot
3D metallic robot head
👤
Human
Realistic 3D avatar
🎭
VRM
VTuber avatar (VRoid)
▭
Visor
Screen-face style
◆
Angular
Geometric box head
●
Rounded
Smooth sphere head
👩
Brunette
Cartoon style (4.7MB)
👨
Male Realistic
Photo-realistic (12MB)
👧
Female Realistic
Photo-realistic (14MB)
👦
Male Casual
Cartoon style (4.6MB)
👩
Female Casual
Cartoon style (4.2MB)
TalkingHead-compatible GLB with ARKit + Oculus viseme blend shapes. Use Avaturn or VRoid Studio to create custom avatars.
Robot and Human styles require WebGL. Changes apply on next page load.
Local runs on the GPU server (zero API cost); Cloud requires API keys
Running on GPU server via vLLM (4x L40S)
Dialog Service (Chat) :8093
Vision Service (Face/Emotion) :8085
Perception (Whisper STT) :8086
TTS Service (Speech) :8087
Memory (Indexer/pgvector) :8011
Context Service (Weather/Location) :8097
Learner (Content Ingestion) :8016
Observer (Passive Learning) :8017
Summarizer (Memory Consolidation) :8012
Emotional State (CESS) :8102
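The service and port pairs listed above can be kept in one lookup table so that client code never hard-codes a port. This is a sketch only: the short keys, the `service_url` helper, the `localhost` default, and the `http` scheme are assumptions for illustration, and only the port numbers come from the list.

```python
# Port numbers are copied from the service list; the short keys are hypothetical.
SERVICE_PORTS = {
    "dialog": 8093,           # Dialog Service (Chat)
    "vision": 8085,           # Vision Service (Face/Emotion)
    "perception": 8086,       # Perception (Whisper STT)
    "tts": 8087,              # TTS Service (Speech)
    "memory": 8011,           # Memory (Indexer/pgvector)
    "context": 8097,          # Context Service (Weather/Location)
    "learner": 8016,          # Learner (Content Ingestion)
    "observer": 8017,         # Observer (Passive Learning)
    "summarizer": 8012,       # Summarizer (Memory Consolidation)
    "emotional_state": 8102,  # Emotional State (CESS)
}


def service_url(name: str, host: str = "localhost", scheme: str = "http") -> str:
    """Build a base URL for a service; raises KeyError for unknown names."""
    port = SERVICE_PORTS[name]
    return f"{scheme}://{host}:{port}"


if __name__ == "__main__":
    for name in SERVICE_PORTS:
        print(f"{name:16s} {service_url(name)}")
```

Raising `KeyError` on an unknown name is deliberate here: a typo in a service name fails immediately rather than producing a URL with a wrong port.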