Open-source virtual teammates that take voice and video calls, and let you interrupt, redirect, or pause them mid-task without restarting.
Hop on a call with one. Send a follow-up text. Drop them a calendar invite. They remember who you are next time, what you talked about last week, and what they promised to do about it.
Most agents stop the moment you talk. They make you wait for a tool call to finish, then re-explain when you change your mind. Unity's teammates stay listening through everything (chat, voice, phone, video, screen-share) and treat your interjections, corrections, and questions as first-class inputs rather than interruptions to recover from. Whether the assistant is researching flights, drafting an email, or sitting on a live call with a vendor, you can ask "how's it going?", say "actually do X instead", or pause for ten minutes, all without losing context.
It's built around long-lived state, not one-shot conversations. Contacts, projects, files, knowledge, and follow-ups persist as queryable structure, so a teammate remembers who Sarah is, what the Henderson project is about, and what they committed to on your behalf last Wednesday, regardless of which channel you raised it on.
Start here: console.unify.ai (try a teammate in 60 seconds) • Overview • Quickstart • ARCHITECTURE.md
You ▸ "Find me flights to Tokyo for next month."
Unity ▸ (starts searching)
You ▸ "Actually, also check trains to Osaka."
Unity ▸ (adjusts the in-flight search; doesn't restart)
You ▸ "Pause that, something urgent."
Unity ▸ (freezes exactly where it is)
... five minutes later ...
You ▸ "OK, resume. How's it going?"
Unity ▸ (picks up where it left off, gives you a status update)
Unity ▸ (on a live phone call with a vendor)
You ▸ (in a side chat) "Don't agree to anything over $5k."
Unity ▸ (the constraint reaches the call mid-conversation)
Unity ▸ Three tasks running at once.
[0] research_flights  ─────────────  in progress
[1] draft_summary     ─────────────  in progress
[2] find_restaurants  ─────────────  starting
Each one independently inspectable, steerable, and pausable.
| Takes calls like a person | Live voice, phone, and video calls, with screen-share and webcam frames streamed to the assistant in real time. Not a tool that initiates a call; a participant in the conversation. |
| Interruptible mid-task | Every operation can be paused, resumed, redirected, or queried while it's running, including operations nested inside other operations, all the way down. |
| Plans in code, not tool-by-tool | Multi-step work becomes one coherent program with variables, loops, and control flow, instead of a noisy chain of one-tool-at-a-time decisions. |
| One identity across every channel | Chat, SMS, email, phone, voice, and video all feed the same persistent memory. The assistant remembers who Sarah is whether she texted, called, or mailed you. |
| Structured memory, not transcript soup | Contacts, knowledge, tasks, files, and procedures live in typed, queryable tables, distilled from your conversations every fifty messages. |
| Learns reusable functions, not just markdown | After a successful trajectory, the assistant can save executable Python (with metadata and a venv), so the next session can compose it into a plan rather than re-derive it. |
| Concurrent work, independently steerable | Multiple actions can run at once. Pause one, redirect another, ask a third for a status update, all without affecting the rest. |
| Schedules and triggers in plain English | "Every Monday at 9, summarize my unread emails" or "Ping me whenever Alice emails about invoices." Recurring jobs and event triggers are described in natural language and executed by the same agent loop, and they can graduate into stored functions after enough successful runs. |
| Local-first, fully open | Runtime, persistence backend, LLM client, and Python SDK are all open-source and run locally with one Docker command. Hosted backend optional. |
There are two paths, depending on whether you want to meet a teammate or run the whole stack yourself.
The lowest-friction path is the hosted product at console.unify.ai. Sign in with Google, get matched with a teammate, and start chatting in about a minute. No install, no Docker, no API keys to manage. Voice, video, telephony, and integrations are all turn-key.
Run the whole stack on your own machine. Runtime, persistence backend, LLM client, and Python SDK are all open-source; see Self-host below.
No signup required. The local installer auto-generates a synthetic API key for the bundled Orchestra and wires everything together. The only key you bring is one LLM provider key (OpenAI or Anthropic).
By default, Unity's open-core install is fully local: the runtime, the LLM client, and the persistence backend (Orchestra, via Docker) all run on your machine. The hosted product at console.unify.ai is optional; Unity does not depend on it for any local feature.
Prerequisites:
- Python 3.12+ (the installer will fetch it with uv if needed)
- Docker (runs the local Orchestra backend)
- PortAudio for audio support
  - macOS: brew install portaudio
  - Ubuntu/Debian: sudo apt-get install portaudio19-dev python3-dev
- One LLM provider key: OpenAI or Anthropic are the simplest paths
Install:
curl -fsSL https://raw.githubusercontent.com/unifyai/unity/main/scripts/install.sh | bash
The installer clones unity, unify, unillm, and orchestra as siblings under ~/.unity/, installs dependencies, creates a unity CLI shim in ~/.local/bin/, boots a local Orchestra in Docker, generates a local API key for the bundled Orchestra, and wires ORCHESTRA_URL and that auto-generated key into ~/.unity/unity/.env. No Unify account or external signup is required.
Add one model provider key to ~/.unity/unity/.env:
OPENAI_API_KEY=sk-...
# or
ANTHROPIC_API_KEY=...
Run the sandbox:
unity --project_name Sandbox --overwrite
At the configuration prompt:
| Option | What it gives you |
|---|---|
| 1 | Top-level orchestration only; useful for isolating the conversation layer |
| 2 | The full runtime: orchestration + planning + simulated managers |
| 3 | Option 2 plus desktop/browser control through agent-service |
If you're evaluating Unity as a runtime, start with option 2.
> msg Hey, can you help me organize my upcoming week?
> sms I need to reschedule my meeting with Sarah to Thursday
> email Project Update | Here are the Q3 numbers you asked for...
Other unity subcommands: unity setup, unity status, unity stop, unity restart, unity help.
Skip the local Orchestra (point at your own deployment)
curl -fsSL https://raw.githubusercontent.com/unifyai/unity/main/scripts/install.sh | bash -s -- --skip-setup
That leaves the code installed but doesn't spin up Orchestra. You'll need to point Unity at your own Orchestra deployment (or another team's shared one) via ORCHESTRA_URL and a matching API key in ~/.unity/unity/.env.
Manual install (no installer script)
git clone https://github.com/unifyai/unity.git ~/.unity/unity
git clone https://github.com/unifyai/unify.git ~/.unity/unify
git clone https://github.com/unifyai/unillm.git ~/.unity/unillm
git clone https://github.com/unifyai/orchestra.git ~/.unity/orchestra
cd ~/.unity/unity
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
cd ~/.unity/orchestra
poetry install
ORCHESTRA_INACTIVITY_TIMEOUT_SECONDS=0 scripts/local.sh start
# Copy the ORCHESTRA_URL and UNIFY_KEY it prints into ~/.unity/unity/.env
The installer copies .env.example to .env (intentionally minimal). For voice mode, live calls, hosted comms, LiveKit, Tavily, or visual caching, see .env.advanced.example and sandboxes/conversation_manager/README.md.
Unity follows the interaction-model / background-model split recently articulated by Thinking Machines, implemented at the harness level against any LLM you already use.
A persistent interaction loop (the ConversationManager) stays present with the user across every medium. When work needs deeper reasoning than the conversation can produce instantly, it dispatches a background reasoner (the Actor), which writes Python plans over a back office of typed state managers. Crucially, every operation in the system returns a live, steerable handle, and those handles nest. A correction the user makes in chat propagates down through the dispatched action, into whatever manager call is currently running.
flowchart TB
classDef interaction fill:#fce7f3,stroke:#be185d,stroke-width:2px,color:#1f2937
classDef actor fill:#bbf7d0,stroke:#15803d,stroke-width:2px,color:#1f2937
classDef neutral fill:#f9fafb,stroke:#9ca3af,stroke-width:1px,color:#374151
classDef accent fill:#1f2937,stroke:#000,stroke-width:1px,color:#fef3c7
User(["User"]):::neutral
Mediums["chat · voice / phone · video / screen-share · email · SMS"]:::neutral
Broker["Event Broker"]:::accent
CM["<b>ConversationManager</b> · interaction loop (always present)<br/>per-handle steering tools: pause · resume · interject · stop · ask"]:::interaction
Actor["<b>Actor</b> · background reasoner<br/>writes Python that composes primitives.*"]:::actor
BackOffice["<b>The Back Office</b> · typed state managers, English-language APIs<br/>Contacts · Knowledge · Tasks · Transcripts · Files · Images · Web · Secrets · Functions · Guidance"]:::neutral
User ==> Mediums ==> Broker ==> CM
CM ==>|"act(...)"| Actor
Actor ==>|"primitives.*"| BackOffice
BackOffice -.->|"SteerableToolHandle"| Actor
Actor -.->|"SteerableToolHandle + notifications"| CM
CM -.->|"streamed responses"| User
Solid arrows are dispatch flow. Dotted arrows are the steering bus: every level returns the same SteerableToolHandle type, so steering signals propagate down through the call stack while results and notifications propagate up.
This is the demo no other framework can run. The user's mid-flight redirect doesn't abort the run, doesn't append a second prompt, and doesn't wait for the next tool boundary; it propagates through the live nested call stack as a typed signal.
sequenceDiagram
autonumber
actor User
participant CM as ConversationManager
participant Ax as Actor
participant TM as TranscriptManager
User->>CM: "find when Sarah last mentioned Berlin"
CM->>Ax: act(prompt)
activate Ax
Ax-->>CM: handle_A (SteerableToolHandle)
Note over CM: handle_A stored in<br/>in_flight_actions
Ax->>TM: transcripts.ask(...)
activate TM
TM-->>Ax: handle_B (nested SteerableToolHandle)
User->>CM: "actually include emails too"
Note over CM: slow brain wakes,<br/>picks the steering tool<br/>for handle_A
CM->>Ax: handle_A.interject("...also emails")
Ax->>TM: handle_B.interject("...also emails")
TM-->>Ax: refined results
deactivate TM
Ax-->>CM: notification (intermediate progress)
CM-->>User: "scanning emails too..."
Ax-->>CM: handle_A.result
deactivate Ax
CM-->>User: final answer
The clearest way to see what's distinctive about Unity is to draw the same diagram for adjacent projects, using the same visual language. Pink means persistent supervising loop (only Unity has one).
OpenClaw: channel-first dispatcher + single Pi agent loop
flowchart TB
classDef agent fill:#bbf7d0,stroke:#15803d,stroke-width:2px,color:#1f2937
classDef neutral fill:#f9fafb,stroke:#9ca3af,stroke-width:1px,color:#374151
classDef dispatch fill:#fed7aa,stroke:#c2410c,stroke-width:2px,color:#1f2937
User(["User"]):::neutral
Channels["Telegram · Discord · Slack · SMS · Nodes (devices)"]:::neutral
Gateway["<b>Gateway daemon</b> · dispatcher<br/>per-session lane (1 active run); steer = abort + redeliver"]:::dispatch
PiAgent["<b>Pi embedded agent</b> · single tool-calling loop<br/>no supervising loop runs in parallel"]:::agent
Tools["<b>Tools</b> · core + plugin + MCP bridge<br/>core (web · exec · sessions_spawn) · voice-call plugin (discrete actions: initiate · speak · end) · mcporter → MCP servers"]:::neutral
State["<b>State</b> · local-first artefacts<br/>JSONL sessions · workspace files (SKILL.md · SOUL.md · AGENTS.md) · memory plugin (one slot at a time)"]:::neutral
User ==> Channels ==> Gateway
Gateway ==>|"start / abort run"| PiAgent
PiAgent ==> Tools
PiAgent <==> State
OpenClaw is a local-first control plane with a wide channel matrix and a plugin marketplace. The Gateway dispatches runs but doesn't supervise them; voice is a plugin tool the agent invokes through discrete actions; steering is implemented as abort-and-redeliver. OpenClaw's VISION.md explicitly takes "no agent-hierarchy frameworks (manager-of-managers)" as a non-goal: a deliberate, principled bet in the opposite direction from Unity. If you want a personal-assistant product with broad channel coverage, OpenClaw is excellent. If you want a runtime built around mid-task steering and structured long-lived state, Unity is shaped differently.
Hermes Agent: many surfaces, one monolithic loop
flowchart TB
classDef agent fill:#bbf7d0,stroke:#15803d,stroke-width:2px,color:#1f2937
classDef neutral fill:#f9fafb,stroke:#9ca3af,stroke-width:1px,color:#374151
classDef trigger fill:#fed7aa,stroke:#c2410c,stroke-width:2px,color:#1f2937
User(["User"]):::neutral
Cron["cron + webhooks (automation triggers)"]:::trigger
Surfaces["CLI · TUI · Gateway (Telegram · Discord · Slack · SMS) · ACP (IDE)"]:::neutral
AIAgent["<b>AIAgent</b> · single ~12k-LOC sync tool-calling loop<br/>steer() = inject text into next tool result; interrupt() = thread-scoped abort flag"]:::agent
Tools["<b>Tools</b><br/>native tools · execute_code (ephemeral Python against fixed RPC stubs) · TTS / voice_mode / SMS (no live phone call) · delegate_tool · MCP servers"]:::neutral
State["<b>State</b><br/>SQLite sessions + FTS5 · MEMORY.md / USER.md workspace files · SKILL.md library · memory provider plugin (mem0 · honcho · ...)"]:::neutral
User ==> Surfaces
Cron ==> Surfaces
Surfaces ==> AIAgent
AIAgent ==> Tools
AIAgent <==> State
Hermes pairs a single ~12k-LOC AIAgent loop with four surfaces (CLI, TUI, gateway, ACP), a deep markdown skills library, SQLite+FTS5 transcripts, and best-in-class cron / webhook automation. Steering is implemented as text injection into the next tool result; interrupt is a thread-scoped flag. Live telephony isn't in the repo: SMS is, and voice is local-only. If you want a polished personal-agent product with a wide messaging surface, broad model support, and mature automation triggers, Hermes is excellent. Unity is making a different bet on what the orchestration layer should look like.
A small bit of history. This architecture has been running in Unity since 2025, well ahead of the wider conversation about it. For the record:
- SteerableToolHandle (the universal steering protocol): first commit September 23, 2025. That predates OpenClaw's first commit (Nov 24, 2025), Hermes Agent's interrupt() (Feb 3, 2026), and steer() (Apr 18, 2026).
- ConversationManager + dual-brain LiveKit voice: first commit November 12, 2025. That predates OpenClaw's voice-call plugin (Jan 11, 2026) by two months.
- The two-tier interaction-loop / background-reasoner pattern as a whole: operational since November 2025. The Thinking Machines paper that articulated the same architecture was published May 11, 2026, six months later.
We're not claiming foresight; the convergence is just interesting if you find architectural archaeology fun. Repo dates are verifiable in git log.
Every public manager method returns a SteerableToolHandle. It exposes the same ask, interject, pause, resume, stop surface, regardless of whether you're talking to the top-level orchestrator or a deeply nested knowledge query.
handle = await actor.act("Research flights to Tokyo and draft an itinerary")
# Twenty seconds later, while it's still working:
await handle.interject("Also check train options from Tokyo to Osaka")
# Or if something urgent comes up:
await handle.pause()
# ... deal with the urgent thing ...
await handle.resume()
When the Actor calls primitives.contacts.ask(...), the ContactManager starts its own tool loop and returns its own handle, nested inside the Actor's handle, which is nested inside the ConversationManager's. Steering at any level propagates.
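To make the nesting concrete, here is a minimal sketch of how steering could fan out through nested handles. ToyHandle, spawn, and is_paused are hypothetical names invented for this illustration; this is not Unity's SteerableToolHandle implementation, only a toy with the same pause / resume / interject shape.

```python
import asyncio


class ToyHandle:
    """Hypothetical stand-in for a steerable handle; not Unity's real class."""

    def __init__(self, name):
        self.name = name
        self.children = []        # handles for operations nested inside this one
        self.interjections = []   # steering messages received so far
        self._running = asyncio.Event()
        self._running.set()       # start in the running (unpaused) state

    def spawn(self, name):
        """Create a nested handle, as a manager call inside an action might."""
        child = ToyHandle(name)
        self.children.append(child)
        return child

    async def interject(self, message):
        # Record the message here, then forward it down the call stack.
        self.interjections.append(message)
        for child in self.children:
            await child.interject(message)

    async def pause(self):
        self._running.clear()
        for child in self.children:
            await child.pause()

    async def resume(self):
        self._running.set()
        for child in self.children:
            await child.resume()

    def is_paused(self):
        return not self._running.is_set()


async def demo():
    action = ToyHandle("actor.act")
    query = action.spawn("contacts.ask")
    await action.interject("also include emails")
    await action.pause()
    paused = (action.is_paused(), query.is_paused())
    await action.resume()
    return query.interjections, paused, query.is_paused()


result = asyncio.run(demo())
print(result)
```

In this toy, steering the outer handle is enough: the nested query receives the interjection and follows the pause and resume without the caller holding a reference to it.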
Most agents emit one JSON tool call at a time and let the LLM stitch results together across turns. Unity's Actor writes a single Python program per turn over typed primitives.*:
contacts = await primitives.contacts.ask(
"Who was involved in the Henderson project?"
)
for contact in contacts:
history = await primitives.knowledge.ask(
f"What was {contact} last working on?"
)
await primitives.contacts.update(
f"Send {contact} a catch-up email referencing {history}"
)
This runs in a sandboxed execution session. Variables, loops, real control flow. A contact lookup → knowledge retrieval → outbound communication becomes one coherent plan rather than three separate tool-selection turns, and the LLM can express intermediate computation directly instead of round-tripping through tool messages.
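The plan above can be exercised outside Unity with stand-in primitives. The sketch below is an assumption-laden toy: the stub functions, their canned return values, and the SimpleNamespace wiring are all invented here to show that the plan is ordinary async Python, and say nothing about how the real primitives.* surface is assembled.

```python
import asyncio
from types import SimpleNamespace

sent = []  # records what the stubbed outbound step was asked to do


async def contacts_ask(question):
    # Stub: a real manager would run its own tool loop here.
    return ["Sarah", "Tom"]


async def knowledge_ask(question):
    return f"notes for: {question}"


async def contacts_update(instruction):
    sent.append(instruction)


# Hypothetical stand-in for the typed primitives.* namespace.
primitives = SimpleNamespace(
    contacts=SimpleNamespace(ask=contacts_ask, update=contacts_update),
    knowledge=SimpleNamespace(ask=knowledge_ask),
)


async def plan():
    contacts = await primitives.contacts.ask(
        "Who was involved in the Henderson project?"
    )
    for contact in contacts:
        history = await primitives.knowledge.ask(
            f"What was {contact} last working on?"
        )
        await primitives.contacts.update(
            f"Send {contact} a catch-up email referencing {history}"
        )
    return len(contacts)


count = asyncio.run(plan())
print(count, len(sent))
```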
Live calls run as two coordinated brains:
- Slow brain: the ConversationManager. Sees the full picture: all conversations, in-flight actions, structured memory. Makes deliberate decisions. Runs in the main process.
- Fast brain: a real-time voice agent on LiveKit, running as a separate subprocess. Sub-second latency. Handles turn-taking and direct conversation autonomously.
They communicate over IPC. When the slow brain wants to guide the conversation, it sends one of:
- SPEAK: "say exactly this" (bypasses the fast brain's LLM)
- NOTIFY: "here's some context, decide what to do with it"
- BLOCK: nothing; the fast brain keeps going on its own
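The three guidance signals above can be pictured as a small message type plus a dispatch step on the receiving side. This is a sketch under stated assumptions: Guidance, GuidanceMessage, and fast_brain_step are hypothetical names, and the real IPC format is not described in this document.

```python
from dataclasses import dataclass
from enum import Enum


class Guidance(Enum):
    SPEAK = "speak"    # say exactly this
    NOTIFY = "notify"  # extra context; the fast brain decides what to do
    BLOCK = "block"    # nothing to add; the fast brain carries on


@dataclass
class GuidanceMessage:
    kind: Guidance
    text: str = ""


def fast_brain_step(message, context):
    """Return text to speak verbatim, or None when the fast brain decides itself."""
    if message.kind is Guidance.SPEAK:
        return message.text           # bypasses any further generation
    if message.kind is Guidance.NOTIFY:
        context.append(message.text)  # kept as context for later turns
        return None
    return None                       # BLOCK: no change


context = []
spoken = fast_brain_step(
    GuidanceMessage(Guidance.SPEAK, "Our budget cap is $5k."), context
)
fast_brain_step(GuidanceMessage(Guidance.NOTIFY, "User is in a hurry."), context)
fast_brain_step(GuidanceMessage(Guidance.BLOCK), context)
print(spoken, context)
```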
Screen-share frames and webcam frames stream to both brains simultaneously, so the fast brain can answer "can you see my screen?" without round-tripping, while the slow brain incorporates visual context into longer-running plans.
Unity maintains two persistent libraries that the Actor draws from on every session:
- FunctionManager: executable Python (with metadata and a venv) that the Actor composes into plans.
- GuidanceManager: procedural how-to prose, such as SOPs, software walkthroughs, and multi-step strategies.
After a successful trajectory, a proactive reviewer loop (store_skills) can extract both: code worth keeping, and the procedural narrative for using it. The next session consults both before reaching for raw tools, by design.
Recurring and triggered work isn't configured with cron expressions or webhook YAML; it's described to the agent in natural language and stored as a Task with schedule and repeat (for cadences) or trigger (for event matches). When the time arrives or the trigger fires, a contained Actor run wakes up, reads the task's description, and figures out how to do it.
That same task can graduate over time. After enough successful description-driven runs, the storage-review loop can persist the trajectory as a stored function, at which point the recurring task runs in a hidden, headless lane against that function rather than re-planning from scratch each time. So "summarize my unread emails every Monday at 9" starts out as a paragraph the agent interprets, and gradually becomes an entrypoint it just calls.
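A rough sketch of the graduation idea, assuming a simple success counter. ToyTask, GRADUATION_THRESHOLD, and the stored function name are hypothetical; the text only says "enough" successful runs, so the threshold of 3 is an arbitrary choice for illustration.

```python
from dataclasses import dataclass
from typing import Optional

GRADUATION_THRESHOLD = 3  # assumed value; the source only says "enough" runs


@dataclass
class ToyTask:
    description: str
    schedule: Optional[str] = None   # free-form here, e.g. "Monday 09:00"
    repeat: Optional[str] = None     # e.g. "weekly"
    trigger: Optional[str] = None    # e.g. "email from Alice about invoices"
    successful_runs: int = 0
    stored_function: Optional[str] = None  # saved entrypoint name, if any

    def run(self):
        """Return which lane handles this run, then record a success."""
        lane = "stored function" if self.stored_function else "description-driven"
        self.successful_runs += 1
        if self.stored_function is None and self.successful_runs >= GRADUATION_THRESHOLD:
            # Hypothetical name for the function persisted by the review loop.
            self.stored_function = "summarize_unread_emails"
        return lane


task = ToyTask("Summarize my unread emails", schedule="Monday 09:00", repeat="weekly")
lanes = [task.run() for _ in range(4)]
print(lanes)
```

The first three runs are interpreted from the description; the fourth goes straight to the stored entrypoint.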
Every fifty messages, the MemoryManager runs a background extraction pass over the new transcript window. It distills:
- Contact profiles: who people are, their roles, relationships
- Per-contact summaries: what you've been discussing, sentiment, themes
- Response policies: how each person prefers to be communicated with
- Domain knowledge: project details, preferences, long-term facts
- Tasks: things you committed to, deadlines, follow-ups
These end up in typed, queryable tables, not freeform transcript summaries.
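The every-fifty-messages cadence can be sketched as a counter that hands each full window to an extraction callback. ToyConsolidator and its extract argument are hypothetical names; the real MemoryManager runs in the background and writes to typed tables, which this toy does not attempt.

```python
WINDOW = 50  # from the text: a pass runs every fifty messages


class ToyConsolidator:
    def __init__(self, extract):
        self.extract = extract  # callable that distills one window of messages
        self.pending = []       # messages seen since the last pass
        self.passes = 0

    def on_message(self, message):
        self.pending.append(message)
        if len(self.pending) >= WINDOW:
            # Hand off the full window and start collecting a fresh one.
            window, self.pending = self.pending, []
            self.extract(window)
            self.passes += 1


seen = []
consolidator = ToyConsolidator(lambda window: seen.append(len(window)))
for i in range(120):
    consolidator.on_message(f"message {i}")
print(consolidator.passes, seen, len(consolidator.pending))
```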
┌─ In-Flight Actions ─────────────────────────────────┐
│                                                     │
│  [0] research_flights  ─────────────  In progress   │
│      └ ask, interject, stop, pause                  │
│                                                     │
│  [1] draft_summary     ─────────────  In progress   │
│      └ ask, interject, stop, pause                  │
│                                                     │
│  [2] find_restaurants  ─────────────  Starting      │
│      └ ask, interject, stop, pause                  │
│                                                     │
└─────────────────────────────────────────────────────┘
Each action gets its own dynamically-generated steering tools attached to the slow brain's tool surface. You can inspect, interject into, pause, resume, or stop one action without affecting the others.
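One way to picture per-action steering tools is a flat mapping generated from the list of in-flight actions. The naming scheme below (operation, index, action name) is an assumption made for this sketch and is not taken from Unity's code.

```python
STEERING_OPS = ("ask", "interject", "pause", "resume", "stop")


def steering_tools(in_flight_actions):
    """Build a flat {tool_name: (action_index, operation)} mapping."""
    tools = {}
    for index, name in enumerate(in_flight_actions):
        for op in STEERING_OPS:
            # Hypothetical naming scheme: one tool per (action, operation) pair.
            tools[f"{op}_{index}_{name}"] = (index, op)
    return tools


tools = steering_tools(["research_flights", "draft_summary", "find_restaurants"])
print(len(tools), tools["pause_1_draft_summary"])
```

Because each tool name resolves to exactly one action, pausing draft_summary leaves the other two entries untouched.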
For the full architectural breakdown (async tool loop internals, event bus, primitive registry, hosted deployment SPI), see ARCHITECTURE.md. At a glance:
ConversationManager (interaction loop, event-driven scheduling)
│
├── Slow Brain ◄── IPC ──► Fast Brain (real-time voice + video, LiveKit)
│
▼
CodeActActor (generates Python plans, calls primitives.* APIs)
│
▼
State Managers (each runs its own async LLM tool loop)
│
├── ContactManager: people and relationships
├── KnowledgeManager: domain facts, structured knowledge
├── TaskScheduler: durable tasks, schedules, triggers, execution with live handles
├── TranscriptManager: conversation history and search
├── GuidanceManager: procedures, SOPs, how-to knowledge
├── FileManager: file parsing and registry
├── ImageManager: image storage, vision queries
├── FunctionManager: user-defined functions, primitives registry
├── WebSearcher: web research orchestration
├── SecretManager: encrypted secret storage
├── BlacklistManager: blocked contact details
├── DataManager: low-level data operations
│
├── EventBus: typed pub/sub backbone (Pydantic events)
└── MemoryManager: offline consolidation every 50 messages
- A user message arrives on any medium. The slow brain renders a full state snapshot and makes a single-shot tool decision.
- It starts an action via actor.act(...) and gets back a SteerableToolHandle, registered in in_flight_actions.
- The Actor generates a Python plan calling typed primitives. Each primitive dispatches to a manager running its own LLM tool loop, returning its own steerable handle.
- Meanwhile, the slow brain can start more work, steer existing work, or guide the fast brain during voice/video calls.
- The MemoryManager observes message events and periodically distills conversations into structured knowledge.
- The EventBus carries typed events with hierarchy labels aligned to tool-loop lineage, making everything observable.
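As an illustration of the last step, here is a minimal typed pub/sub sketch where each event carries a hierarchy label describing its tool-loop lineage. ToyEvent and ToyEventBus are hypothetical, and the sketch uses standard-library dataclasses rather than the Pydantic events the real EventBus is described as using.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyEvent:
    kind: str
    hierarchy: tuple  # lineage of tool loops, outermost first
    payload: str


class ToyEventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, kind, callback):
        self._subscribers[kind].append(callback)

    def publish(self, event):
        # Deliver only to subscribers registered for this event kind.
        for callback in self._subscribers[event.kind]:
            callback(event)


bus = ToyEventBus()
log = []
bus.subscribe("tool_call", lambda e: log.append("/".join(e.hierarchy)))
bus.publish(ToyEvent("tool_call", ("ConversationManager", "Actor", "ContactManager"), "ask"))
bus.publish(ToyEvent("message", ("ConversationManager",), "hello"))
print(log)
```

The hierarchy tuple is what lets an observer tell which nested loop an event came from.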
Unity is one of four MIT-licensed repos that make up the runtime. The installer wires them together for the local install; you can also use any of them independently.
| Repo | Role |
|---|---|
| unity (this) | The agent runtime: managers, tool loops, CodeAct, voice, orchestration |
| orchestra | Persistence backend: FastAPI + Postgres + pgvector. Installer spins it up locally in Docker |
| unify | Python SDK: the client Unity uses to talk to Orchestra |
| unillm | LLM access layer: OpenAI, Anthropic, or any compatible endpoint |
Tests exercise the real system (steerable handles, CodeAct, manager composition, nested tool loops) against simulated backends with cached LLM responses:
uv sync --all-groups
source .venv/bin/activate
tests/parallel_run.sh tests/ # everything
tests/parallel_run.sh tests/actor/ # one module
tests/parallel_run.sh tests/contact_manager/ # another
See tests/README.md for the full philosophy: responses are cached, not mocked. Delete the cache and you're re-evaluating against live models.
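The difference between caching and mocking can be sketched as a cache keyed by the request, falling through to a live call on a miss. ResponseCache and the lambda standing in for a model call are hypothetical; the actual cache format and location are described in tests/README.md, not here.

```python
import hashlib
import json


class ResponseCache:
    def __init__(self, call_model):
        self.call_model = call_model  # the "live" call used on a cache miss
        self.store = {}
        self.live_calls = 0

    def complete(self, model, prompt):
        # Key on the full request so any change to it forces a fresh call.
        key = hashlib.sha256(
            json.dumps({"model": model, "prompt": prompt}, sort_keys=True).encode()
        ).hexdigest()
        if key not in self.store:
            self.live_calls += 1
            self.store[key] = self.call_model(model, prompt)
        return self.store[key]


cache = ResponseCache(lambda model, prompt: f"{model} says: {prompt.upper()}")
first = cache.complete("toy-model", "hello")
second = cache.complete("toy-model", "hello")   # served from the cache
cache.store.clear()                             # "delete the cache"
third = cache.complete("toy-model", "hello")    # goes back to the live call
print(cache.live_calls)
```

Unlike a mock, the cached value is whatever the real call returned the first time, so clearing the store re-runs the request rather than replaying a hand-written answer.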
| File | What's there |
|---|---|
| unity/common/async_tool_loop.py | SteerableToolHandle: the protocol everything returns |
| unity/common/_async_tool/loop.py | The async tool loop engine: nesting, steering, context propagation |
| unity/actor/code_act_actor.py | CodeAct: plan generation, sandbox, primitives |
| unity/conversation_manager/conversation_manager.py | Dual-brain orchestration, debouncing, in-flight actions |
| unity/conversation_manager/domains/brain_action_tools.py | How the brain starts, steers, and tracks concurrent work |
| unity/conversation_manager/domains/call_manager.py | LiveKit subprocess + voice/video event wiring |
| unity/function_manager/primitives/registry.py | How primitives are assembled into the typed API surface |
| unity/events/event_bus.py | Typed event backbone |
| unity/memory_manager/memory_manager.py | Offline consolidation pipeline |
unity/
├── unity/
│   ├── actor/                    # CodeActActor
│   ├── conversation_manager/     # Dual-brain orchestration
│   │   └── domains/              # Brain tools, action tracking, rendering
│   ├── common/
│   │   ├── async_tool_loop.py    # SteerableToolHandle
│   │   └── _async_tool/          # Tool loop internals
│   ├── contact_manager/
│   ├── knowledge_manager/
│   ├── task_scheduler/
│   ├── transcript_manager/
│   ├── guidance_manager/
│   ├── memory_manager/
│   ├── function_manager/
│   ├── file_manager/
│   ├── image_manager/
│   ├── web_searcher/
│   ├── secret_manager/
│   ├── events/
│   └── manager_registry.py
├── sandboxes/                    # Interactive playgrounds
│   └── conversation_manager/     # Full ConversationManager sandbox (start here)
├── tests/
├── agent-service/                # Node.js desktop/browser automation
└── deploy/                       # Dockerfile, Cloud Build, virtual desktop
MIT. See LICENSE.
Built by the team at Unify.
