Watch the full tutorial on YouTube🔝
Ever imagined your AI assistant navigating Safari on your iPhone, opening Notes, ordering food or scrolling through an app, all without lifting a finger? Thanks to Claude 4 and a powerful new open-source framework called C/ua, that futuristic scenario is quickly becoming a reality.
In this post, we’ll dive into how you can build your first computer-use and iPhone-use agent using Anthropic’s Claude 4. We’ll walk through what’s already possible, what’s still a bit experimental, and where this is all heading in the fast-evolving landscape of AI agent-based automation.
What is C/ua?
C/ua is an emerging open-source framework designed to help you build agents that can directly interact with your computer and even your phone. Instead of limiting agents to just answering questions or managing text-based workflows, C/ua lets them operate applications, click buttons, type in fields and navigate apps just like a human would.
It’s built to support a wide range of language models, including OpenAI, Claude by Anthropic, and a growing list of open-source LLMs. For this demo, I used Claude 4 because the setup was simpler and more reliable.
Standout Features:
Native support for Claude API
Experimental iPhone-use automation (currently in beta)
MCP support
Suite of tools to build and run AI Agents on Apple Silicon
Why I Decided to Try It
Over the past few months, I’ve experimented with a variety of agent frameworks like Agent Development Kit and Agno, most of which focus on search, summarization or Q&A-style use cases. While those are useful, they don’t really simulate how a real user interacts with software. C/ua is the first tool I’ve found that lets agents interact with a computer (and now mobile apps) just like we do.
When I discovered its experimental support for iPhone interaction, actually launching apps, tapping through screens, and entering text, I had to try it out for myself.
What I Built
Basic Desktop Computer-Use Agent
Using Claude 4 and C/ua, I built a desktop agent capable of:
Navigating websites via a browser
Opening applications and performing clicks
Typing into text fields and responding to basic prompts
Setting it up was pretty straightforward. I did hit some rate limits with Claude’s API, which slowed things down a bit, but once it was up and running, the agent’s behavior felt very natural.
Let’s look at basic setup!
Follow these four simple steps to install the necessary tools and start building with the CUA Python SDK.
Step 1: Install Lume CLI
Lume CLI lets you manage high-performance macOS/Linux VMs with near-native speed on Apple Silicon.
Run the following command to install it:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"
Step 2: Pull the macOS CUA Image
The CUA macOS image comes preloaded with default Mac apps and the Computer Server, making it easy to automate tasks inside the VM. The image takes roughly 30 GB of disk space on your Mac, so be patient while it downloads.
lume pull macos-sequoia-cua:latest
Step 3: Install the Python SDK
Install the full CUA SDK to interact with the VM and build your automation or agents.
pip install "cua-computer[all]" "cua-agent[all]"
Step 4: Kickstart your Agent
Once everything is installed, you're ready to use the CUA SDK in your Python code. Launch the VM, connect to the server, and start building agents!
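The script below reads your Anthropic key from a .env file via python-dotenv. As a minimal sketch, assuming you keep a file named .env in the directory you run the script from, it only needs one line (replace the placeholder with your real key):

ANTHROPIC_API_KEY=your-anthropic-key-here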
import asyncio
import os
from dotenv import load_dotenv  # Make sure python-dotenv is installed
from computer import Computer
from agent import ComputerAgent, LLMProvider, LLM, AgentLoop

# Load API key from .env
load_dotenv()
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")

async def run_multi_task_workflow():
    async with Computer() as macos_computer:
        agent = ComputerAgent(
            computer=macos_computer,
            loop=AgentLoop.ANTHROPIC,
            model=LLM(provider=LLMProvider.ANTHROPIC, name="claude-sonnet-4-20250514")
        )

        tasks = [
            "Open Safari and search for studio1hq.com",
            "Go to DevRel as a Service page",
            "Try to book a call",
            "Close Safari"
        ]

        for i, task in enumerate(tasks):
            print(f"\nTask {i+1}/{len(tasks)}: {task}")
            async for result in agent.run(task):
                # Print just the action description for brevity
                if result.get("text"):
                    print(f"  → {result.get('text')}")
            print(f"✅ Task {i+1} completed")

if __name__ == "__main__":
    asyncio.run(run_multi_task_workflow())
Woohoo 🎉, our first computer-use agent is ready!
I'm using Claude 4 from Anthropic for this agent demo, so make sure to get an API key from Anthropic (or from whichever model provider you prefer) to use with C/ua.
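If you'd rather use a different provider, the agent is configured the same way; below is a minimal sketch of swapping the loop and model. Note that the OpenAI enum members and the "gpt-4o" model name are my assumptions here, so check the cua-agent docs for the exact values in your installed version.

from agent import ComputerAgent, LLM, AgentLoop, LLMProvider

def build_agent(computer, use_openai: bool = False):
    if use_openai:
        # Assumption: requires OPENAI_API_KEY in your environment, and that
        # AgentLoop.OPENAI / LLMProvider.OPENAI exist in your cua-agent version.
        return ComputerAgent(
            computer=computer,
            loop=AgentLoop.OPENAI,
            model=LLM(provider=LLMProvider.OPENAI, name="gpt-4o"),
        )
    # Default: the Anthropic setup used throughout this post.
    return ComputerAgent(
        computer=computer,
        loop=AgentLoop.ANTHROPIC,
        model=LLM(provider=LLMProvider.ANTHROPIC, name="claude-sonnet-4-20250514"),
    )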
Experimental iPhone-Use Agent
C/ua recently introduced early support for controlling iPhones, and I gave it a shot. While the feature is still under development and a bit unstable, it was able to:
Attempt app launches on iPhone
Mimic basic touch interactions
Showcase the potential for voice-less, hands-free mobile automation
Although I encountered some errors (expected in a beta feature), the direction is clear: we’re moving toward agents that can operate across both desktop and mobile environments.
Imagine the possibilities:
Booking an Uber or ordering food through voice-initiated agents
Opening and editing notes, emails or reminders without touching your phone
Automating repetitive mobile tasks while you focus on more important work
What I Learned
Claude 4 is very capable, but rate limits are real; expect to optimize or batch your requests (see the retry sketch after this list).
C/ua’s mobile support (the experimental app-use feature) is promising but not production-ready. Still, it’s impressive how far it’s come.
Start simple: build basic desktop agents before jumping into mobile automation to better understand the framework and avoid early frustration.
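On the rate-limit point above: one pattern that helped was wrapping each task in a simple retry with exponential backoff. This is a generic asyncio sketch, not a C/ua API; it catches a broad Exception purely for illustration, and in practice you would narrow it to the rate-limit error your SDK actually raises.

import asyncio

async def run_task_with_retry(agent, task: str, max_retries: int = 3):
    # Run one agent task, backing off and retrying if the API errors out.
    for attempt in range(max_retries):
        try:
            async for result in agent.run(task):
                if result.get("text"):
                    print(f"  → {result.get('text')}")
            return True
        except Exception as err:  # ideally the SDK's specific rate-limit exception
            wait = 2 ** attempt  # 1s, 2s, 4s, ...
            print(f"  Attempt {attempt + 1} failed ({err}); retrying in {wait}s")
            await asyncio.sleep(wait)
    return False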
What’s Next
I plan to explore:
Integrating DeepSeek or OpenAI’s CUA models once setup hurdles are resolved
Testing MCP
Building long-running, persistent agents that can handle sequences of actions across devices or Virtual environments
C/ua is evolving rapidly. Its roadmap highlights upcoming support for new devices beyond macOS and Linux, including experimental iPhone control, as well as more robust error handling through sandboxed VM environments and richer integration with popular LLMs and orchestration tools. It leverages lightweight Docker containers with Apple’s Virtualization Framework to deliver near-native performance, and future releases promise smoother multi-computer workflows and an enhanced architecture.
Curious how this works in practice? I recorded a full walkthrough video where I:
Compare Claude 4
Set up two computer-use agents from scratch
Test app-use with real app interactions on Mac
Click here to watch the video
Final Thoughts
This was my first real project using Claude’s APIs and it left me optimistic. We’re no longer talking about AI as just a chatbot. With tools like C/ua, we’re looking at a near-future where agents can truly assist with real workflows, right inside your apps and devices.
Sure, there are limitations. Rate limits, software updates, and model constraints all exist. But the ability to simulate user behavior on both computers and phones is a huge leap forward.
If you’ve built something with Claude’s computer-use capabilities or are exploring agent automation on mobile or desktop, I’d love to hear your story. Let’s share ideas.
Thank you for reading! If you found this article useful, share it with your peers and community.
If you ❤️ my content, connect with me on Twitter!
Check out the SaaS tools I use 👉🏼 Access here!
I'm open to collaborating on blog articles and guest posts 🫱🏼🫲🏼 📅 Contact here