turn old phones into
ai agents
give it a goal in plain english. it reads the screen, thinks about what to do, taps and types via adb, and repeats until the job is done.
droidclaw
$ bun run src/kernel.ts
enter your goal: open youtube and search for "lofi hip hop"

--- step 1/30 ---
think: i'm on the home screen. launching youtube.
action: launch (842ms)

--- step 2/30 ---
think: youtube is open. tapping search icon.
action: tap (623ms)

--- step 3/30 ---
think: search field focused.
action: type "lofi hip hop" (501ms)

--- step 4/30 ---
action: enter (389ms)

--- step 5/30 ---
think: search results showing. done.
action: done (412ms)
perceive, reason, act, adapt
every step is a loop. dump the accessibility tree, filter interactive elements, send to an llm, execute the action, repeat.
1. perceive
captures the screen via uiautomator dump and parses the accessibility xml into tappable elements with coordinates and state.
2. reason
sends screen state + goal to an llm. the model returns think, plan, action - it explains its reasoning before acting.
3. act
executes the chosen action via adb - tap, type, swipe, launch, press back. 22 actions available.
4. adapt
if the screen doesn't change for 3 steps, stuck recovery kicks in. if the accessibility tree comes back empty, it falls back to screenshots.
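put together, the whole agent is one loop. here's a minimal sketch of that loop in typescript - the helper names, signatures, and the stubbed llm call are illustrative, not the actual kernel.ts internals:

// minimal sketch of the perceive → reason → act → adapt loop (illustrative, not the real kernel.ts)
import { execSync } from "node:child_process";

type Decision = { think: string; action: string; done?: boolean };

// perceive: dump the accessibility tree over adb and read the raw xml back
function dumpUiTree(): string {
  execSync("adb shell uiautomator dump /sdcard/ui.xml");
  return execSync("adb shell cat /sdcard/ui.xml").toString();
}

// reason: placeholder for the llm call - send goal + screen state, get think/action back
async function askLlm(goal: string, screenXml: string): Promise<Decision> {
  // wire this to your provider (groq, ollama, openai, ...); stubbed here
  return { think: "stub", action: "done", done: true };
}

// act: execute the chosen action over adb (tap shown; type/swipe/launch work the same way)
function runAction(action: string) {
  if (action.startsWith("tap")) execSync(`adb shell input ${action}`);
}

async function runAgent(goal: string, maxSteps = 30, stuckThreshold = 3) {
  let lastScreen = "";
  let stuck = 0;

  for (let step = 1; step <= maxSteps; step++) {
    const xml = dumpUiTree();                          // 1. perceive
    stuck = xml === lastScreen ? stuck + 1 : 0;        // 4. adapt: detect a frozen screen
    lastScreen = xml;
    const hint = stuck >= stuckThreshold ? `${goal} (stuck - try another route)` : goal;

    const decision = await askLlm(hint, xml);          // 2. reason
    console.log(`step ${step}/${maxSteps}: ${decision.think} -> ${decision.action}`);
    if (decision.done) return;

    runAction(decision.action);                        // 3. act
  }
}

await runAgent("open youtube and search for lofi hip hop");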
interactive, workflows, or flows
type a goal, chain goals across apps with ai, or run deterministic steps with no llm calls.
just type
run it and describe what you want. the agent figures out the rest.
$ bun run src/kernel.ts
enter your goal: send "running late, 10 mins" to Mom on whatsapp
ai-powered · json
chain goals across multiple apps. natural language steps, the llm navigates.
{
  "name": "weather to whatsapp",
  "steps": [
    { "app": "com.google...", "goal": "search chennai weather" },
    { "goal": "share to Sanju" }
  ]
}
instant · yaml
fixed taps and types. no llm, instant execution. for repeatable tasks.
appId: com.whatsapp
name: Send WhatsApp Message
---
- launchApp
- tap: "Contact Name"
- type: "hello from droidclaw"
- tap: "Send"
workflows
- json format, uses ai
- handles ui changes and popups
- slower (llm calls each step)
- best for complex multi-app tasks
flows
- yaml format, no ai needed
- breaks if ui changes
- instant execution
- best for simple repeatable tasks
what you can build with this
delegate to on-device ai apps, control phones remotely, turn old devices into always-on agents.
delegate to ai apps on-device
open google's ai mode, ask a question, grab the answer, forward it to whatsapp. or ask chatgpt something and share the response to slack. the agent uses apps on your phone as tools - no api keys needed for those services.
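as a sketch, that kind of delegation maps onto the same workflow format shown earlier - the package names and step wording here are illustrative:

{
  "name": "ai mode to whatsapp",
  "steps": [
    { "app": "com.google.android.googlequicksearchbox",
      "goal": "open ai mode, ask for 3 book recommendations on habit building, copy the answer" },
    { "app": "com.whatsapp",
      "goal": "send the answer to Sanju" }
  ]
}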
remote control with tailscale
install tailscale on phone + laptop. connect adb over the tailnet. your phone is now a remote agent - control it from anywhere. run workflows from a cron job at 8am every morning.
# from anywhere:
adb connect <phone-tailscale-ip>:5555
bun run src/kernel.ts --workflow morning.json
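a crontab entry for that 8am run could look like this - the repo path is a placeholder, and the workflow name is the one from the example above:

# run the morning workflow at 8:00 every day
0 8 * * * cd ~/droidclaw && adb connect <phone-tailscale-ip>:5555 && bun run src/kernel.ts --workflow morning.json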
old phones, always on
that android in a drawer can now send standups to slack, check flight prices, digest telegram channels, forward weather to whatsapp. it runs apps that don't have apis.
automation with ai intelligence
unlike predefined button flows, the agent actually thinks. if a button moves, a popup appears, or the layout changes - it adapts. it reads the screen, understands context, and makes decisions.
things it can do right now
across any app installed on the device.
- send whatsapp to saved or unsaved numbers
- reply to latest sms
- compose emails via gmail
- telegram messages to groups
- post standups to slack
- broadcast to multiple contacts
- search google, collect results
- ask chatgpt / gemini, grab answer
- check weather, stocks, flights
- compare prices across apps
- translate via google translate
- compile multi-source digests
- post to instagram, twitter/x
- like and comment on posts
- check engagement metrics
- save youtube to watch later
- follow / unfollow accounts
- check linkedin notifications
- morning briefing across apps
- create calendar events
- capture notes in google keep
- check github pull requests
- set alarms and reminders
- triage notifications
- order food from delivery apps
- book an uber ride
- play songs on spotify
- check commute on maps
- log workouts, track expenses
- toggle do not disturb
- toggle wifi, bluetooth, airplane
- adjust brightness, volume
- force stop or clear cache
- grant/revoke permissions
- install/uninstall apps
- run any adb shell command
what works and what doesn't
22 actions + 6 multi-step skills. here's the reality.
works well
- native android apps with standard ui
- multi-app workflows that chain goals
- device settings via shell commands
- text input, navigation, taps
- stuck detection + recovery
- vision fallback for empty trees
unreliable
- flutter, react native, games
- webviews (incomplete tree)
- drag & drop, multi-finger
- notification interaction
- clipboard on android 12+
- captchas and bot detection
can't do
- banking apps (FLAG_SECURE)
- biometrics (fingerprint, face)
- bypass encrypted lock screen
- access other apps' private data
- audio or camera streams
- pinch-to-zoom gestures
getting started
1
install
one command. installs bun and adb if missing, clones the repo, sets up .env.
curl -fsSL https://droidclaw.ai/install.sh | sh
or do it manually:
# install adb
brew install android-platform-tools

# install bun (required - npm/node won't work)
curl -fsSL https://bun.sh/install | bash

# clone and setup
git clone https://github.com/unitedbyai/droidclaw.git
cd droidclaw && bun install
cp .env.example .env
2
configure an llm provider
edit .env - fastest way to start is groq (free tier):
LLM_PROVIDER=groq
GROQ_API_KEY=gsk_your_key_here

# or run fully local with ollama (no api key)
# ollama pull llama3.2
# LLM_PROVIDER=ollama
| provider | cost | vision | notes |
|---|---|---|---|
| groq | free | no | fastest to start |
| ollama | free (local) | yes* | no api key, runs on your machine |
| openrouter | per token | yes | 200+ models |
| openai | per token | yes | gpt-4o |
| bedrock | per token | yes | claude on aws |
3
install the android app
download and install the companion app on your android device.
4
connect your phone
enable usb debugging in developer options, plug in via usb.
adb devices   # should show your device
cd droidclaw && bun run src/kernel.ts
5
tune (optional)
| key | default | what it does |
|---|---|---|
| MAX_STEPS | 30 | steps before giving up |
| STEP_DELAY | 2 | seconds between actions |
| STUCK_THRESHOLD | 3 | steps before stuck recovery |
| VISION_MODE | fallback | off / fallback / always |
| MAX_ELEMENTS | 40 | ui elements sent to llm |
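for example, a more patient run with vision always on could look like this in .env - the keys are the ones above, the values are just illustrative:

MAX_STEPS=50
STEP_DELAY=3
VISION_MODE=always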
10 files in src/
kernel.ts - main loop
actions.ts - 22 actions + adb retry
skills.ts - 6 multi-step skills
workflow.ts - workflow orchestration
flow.ts - yaml flow runner
llm-providers.ts - 5 providers + system prompt
sanitizer.ts - accessibility xml parser
config.ts - env config
constants.ts - keycodes, coordinates
logger.ts - session logging