turn old phones into
ai agents
give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.
droidclaw
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
perceive, reason, act, adapt
every step is a loop. dump the accessibility tree, filter interactive elements, send to an llm, execute the action, repeat.
1. perceive
captures the screen via uiautomator dump and parses the accessibility xml into tappable elements with coordinates and state.
2. reason
sends screen state + goal to an llm. the model returns think, plan, action - it explains its reasoning before acting.
3. act
executes the chosen action via adb - tap, type, swipe, launch, press back. 22 actions available.
4. adapt
if the screen doesn't change for 3 steps, stuck recovery kicks in. if the accessibility tree comes back empty, it falls back to screenshots.
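put together, the whole agent is one loop. here's a minimal sketch of that loop in typescript - the helper names, signatures, and the stubbed llm call are illustrative, not the actual kernel.ts internals:

// minimal sketch of the perceive → reason → act → adapt loop (illustrative, not the real kernel.ts)
import { execSync } from "node:child_process";

type Decision = { think: string; action: string; done?: boolean };

// perceive: dump the accessibility tree over adb and read the raw xml back
function dumpUiTree(): string {
  execSync("adb shell uiautomator dump /sdcard/ui.xml");
  return execSync("adb shell cat /sdcard/ui.xml").toString();
}

// reason: placeholder for the llm call - send goal + screen state, get think/action back
async function askLlm(goal: string, screenXml: string): Promise<Decision> {
  // wire this to your provider (groq, ollama, openai, ...); stubbed here
  return { think: "stub", action: "done", done: true };
}

// act: execute the chosen action over adb (tap shown; type/swipe/launch work the same way)
function runAction(action: string) {
  if (action.startsWith("tap")) execSync(`adb shell input ${action}`);
}

async function runAgent(goal: string, maxSteps = 30, stuckThreshold = 3) {
  let lastScreen = "";
  let stuck = 0;

  for (let step = 1; step <= maxSteps; step++) {
    const xml = dumpUiTree();                          // 1. perceive
    stuck = xml === lastScreen ? stuck + 1 : 0;        // 4. adapt: detect a frozen screen
    lastScreen = xml;
    const hint = stuck >= stuckThreshold ? `${goal} (stuck - try another route)` : goal;

    const decision = await askLlm(hint, xml);          // 2. reason
    console.log(`step ${step}/${maxSteps}: ${decision.think} -> ${decision.action}`);
    if (decision.done) return;

    runAction(decision.action);                        // 3. act
  }
}

await runAgent("open youtube and search for lofi hip hop");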
interactive, workflows, or flows
type a goal, chain goals across apps with ai, or run deterministic steps with no llm calls.
just type
run it and describe what you want. the agent figures out the rest.
$ bun run src/kernel.ts
enter your goal: send "running late, 10 mins" to Mom on whatsapp
ai-powered · json
chain goals across multiple apps. natural language steps, the llm navigates.
{
  "name": "weather to whatsapp",
  "steps": [
    { "app": "com.google...", "goal": "search chennai weather" },
    { "goal": "share to Sanju" }
  ]
}
instant · yaml
fixed taps and types. no llm, instant execution. for repeatable tasks.
appId: com.whatsapp
name: Send WhatsApp Message
---
- launchApp
- tap: "Contact Name"
- type: "hello from droidclaw"
- tap: "Send"
workflows
- json format, uses ai
- handles ui changes and popups
- slower (llm calls each step)
- best for complex multi-app tasks
flows
- yaml format, no ai needed
- breaks if ui changes
- instant execution
- best for simple repeatable tasks
what you can build with this
delegate to on-device ai apps, control phones remotely, turn old devices into always-on agents.
delegate to ai apps on-device
open google's ai mode, ask a question, grab the answer, forward it to whatsapp. or ask chatgpt something and share the response to slack. the agent uses apps on your phone as tools - no api keys needed for those services.
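as a sketch, that kind of delegation maps onto the same workflow format shown earlier - the package names and step wording here are illustrative:

{
  "name": "ai mode to whatsapp",
  "steps": [
    { "app": "com.google.android.googlequicksearchbox",
      "goal": "open ai mode, ask for 3 book recommendations on habit building, copy the answer" },
    { "app": "com.whatsapp",
      "goal": "send the answer to Sanju" }
  ]
}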
remote control with tailscale
install tailscale on phone + laptop. connect adb over the tailnet. your phone is now a remote agent - control it from anywhere. run workflows from a cron job at 8am every morning.
# from anywhere:
adb connect <phone-tailscale-ip>:5555
bun run src/kernel.ts --workflow morning.json
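a crontab entry for that 8am run could look like this - the repo path is a placeholder, and the workflow name is the one from the example above:

# run the morning workflow at 8:00 every day
0 8 * * * cd ~/droidclaw && adb connect <phone-tailscale-ip>:5555 && bun run src/kernel.ts --workflow morning.json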
old phones, always on
that android in a drawer can now send standups to slack, check flight prices, digest telegram channels, forward weather to whatsapp. it runs apps that don't have apis.
automation with ai intelligence
unlike predefined button flows, the agent actually thinks. if a button moves, a popup appears, or the layout changes - it adapts. it reads the screen, understands context, and makes decisions.
things it can do right now
across any app installed on the device.
- send whatsapp to saved or unsaved numbers
- reply to latest sms
- compose emails via gmail
- telegram messages to groups
- post standups to slack
- broadcast to multiple contacts
- search google, collect results
- ask chatgpt / gemini, grab answer
- check weather, stocks, flights
- compare prices across apps
- translate via google translate
- compile multi-source digests
- post to instagram, twitter/x
- like and comment on posts
- check engagement metrics
- save youtube to watch later
- follow / unfollow accounts
- check linkedin notifications
- morning briefing across apps
- create calendar events
- capture notes in google keep
- check github pull requests
- set alarms and reminders
- triage notifications
- order food from delivery apps
- book an uber ride
- play songs on spotify
- check commute on maps
- log workouts, track expenses
- toggle do not disturb
- toggle wifi, bluetooth, airplane
- adjust brightness, volume
- force stop or clear cache
- grant/revoke permissions
- install/uninstall apps
- run any adb shell command
what works and what doesn't
22 actions + 6 multi-step skills. here's the reality.
works well
- native android apps with standard ui
- multi-app workflows that chain goals
- device settings via shell commands
- text input, navigation, taps
- stuck detection + recovery
- vision fallback for empty trees
unreliable
- flutter, react native, games
- webviews (incomplete tree)
- drag & drop, multi-finger
- notification interaction
- clipboard on android 12+
- captchas and bot detection
can't do
- banking apps (FLAG_SECURE)
- biometrics (fingerprint, face)
- bypass encrypted lock screen
- access other apps' private data
- audio or camera streams
- pinch-to-zoom gestures
getting started
1
install
one command. installs bun and adb if missing, clones the repo, sets up .env.
curl -fsSL https://droidclaw.ai/install.sh | sh
or do it manually:
# install adb
brew install android-platform-tools

# install bun (required - npm/node won't work)
curl -fsSL https://bun.sh/install | bash

# clone and setup
git clone https://github.com/unitedbyai/droidclaw.git
cd droidclaw && bun install
cp .env.example .env
2
configure an llm provider
edit .env - fastest way to start is groq (free tier):
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here

# or run fully local with ollama (no api key)
# ollama pull llama3.2
# LLM_PROVIDER=ollama
| provider | cost | vision | notes |
|---|---|---|---|
| groq | free | no | fastest to start |
| ollama | free (local) | yes* | no api key, runs on your machine |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
3
install the android app
download and install the companion app on your android device.
4
connect your phone
enable usb debugging in developer options, plug in via usb.
adb devices   # should show your device
cd droidclaw && bun run src/kernel.ts
5
tune (optional)
| key | default | what it does |
|---|---|---|
| MAX_STEPS | 30 | steps before giving up |
| STEP_DELAY | 2 | seconds between actions |
| STUCK_THRESHOLD | 3 | steps before stuck recovery |
| VISION_MODE | fallback | off / fallback / always |
| MAX_ELEMENTS | 40 | ui elements sent to llm |
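for example, a more patient run with vision always on could look like this in .env - the keys are the ones above, the values are just illustrative:

MAX_STEPS=50
STEP_DELAY=3
VISION_MODE=always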
10 files in src/
kernel.ts - main loop
actions.ts - 22 actions + adb retry
skills.ts - 6 multi-step skills
workflow.ts - workflow orchestration
flow.ts - yaml flow runner
llm-providers.ts - 5 providers + system prompt
sanitizer.ts - accessibility xml parser
config.ts - env config
constants.ts - keycodes, coordinates
logger.ts - session logging