A beginner-friendly walkthrough for securely accessing your self-hosted LLM from anywhere — over a private Tailscale network fronted by the Aperture AI gateway, never exposed to the public internet.
In Part 1 we worked through Stages 1 and 2 to get a Qwen3.6-35B-A3B-FP8 Mixture of Experts (MoE) model serving an OpenAI-compatible API on a “SparkStation”, a GB10 NVIDIA DGX Spark-class machine. However, the model is only accessible on the machine itself via localhost:8000. Here in Part 2, we run through Stages 3 to 5 to make the model securely reachable from any other device you choose, without ever exposing it to the public internet. These instructions should be helpful even if you have a different local AI model being served by something other than a SparkStation.
The problem, and the plan
I am constantly looking for a better way to operate and access sovereign AI solutions. I want self-hosted models running on private infrastructure that are as flexibly accessible as the solutions from OpenAI or Anthropic. But a model answering at localhost:8000 is only usable by the machine it runs on. The obvious way to make it accessible from anywhere is to forward a port through my router. However, this is also dangerous: it puts an unauthenticated AI endpoint on the open internet for anyone to find and abuse.
Instead, my current approach has two layers:
- Tailscale — a private mesh network (“tailnet”) that encrypts direct connections between approved (“allowlisted”) devices, as if they were on the same LAN, no matter where they are physically. Nothing is exposed publicly; only devices that I’ve explicitly added can reach each other. Tailscale offers a very generous free tier, which I’ve been using for several months and have not yet exceeded. Your mileage may vary.
- Aperture (by Tailscale) — an “AI gateway” that sits in front of one or more models on the tailnet. It authenticates every request by the caller’s Tailscale identity, so there are no API keys to distribute, and it logs all usage centrally.
If you follow this guide, your locally hosted models will be reachable only over your private network, and every request through the gateway will be identified and recorded. That’s genuinely private, secure, managed AI.
Concepts in one breath. A tailnet is your private device network. MagicDNS is Tailscale’s feature that lets you address devices by name (e.g. gateway) instead of IP. A provider in Aperture is an upstream model. A grant is a rule saying who may use which models. You’ll meet each below.
Stage 3 — Put the server on your private network
First, get the GB10 machine (or whatever machine you are using as the “AI server”) onto your tailnet.
3.1 Install and join Tailscale on the server
On the server, install Tailscale and bring it up:
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
tailscale up prints a URL — open it in any browser and sign in (Google, GitHub, Microsoft, or email all work). That authenticates this machine and adds it to your tailnet. The account you sign in with defines your tailnet, so remember which one you use — every device must join the same account.
3.2 Note the server’s Tailscale IP
tailscale ip -4
You’ll get an address in the 100.x.x.x range — Tailscale’s private space. I’ll use 100.92.0.10 as a stand-in below; replace it with your own. This is the address Aperture will use to reach your model.
Because vLLM was launched with --network host back in Part 1, it’s already listening on this interface, so no change is needed. If you run a firewall like ufw, allow the port on the Tailscale interface:
sudo ufw allow in on tailscale0 to any port 8000
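Before moving on to the gateway, it can help to confirm that the model’s port really answers on the Tailscale address. The snippet below is a minimal sketch of such a check, not part of the original setup: the helper name port_open is hypothetical, and 100.92.0.10 is only the stand-in address used in this article. Run it from any machine that is already on your tailnet.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection performs the TCP handshake; we close it immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call (replace the stand-in IP with your own `tailscale ip -4` output):
# print(port_open("100.92.0.10", 8000))
```

A result of False usually points at a firewall rule or at vLLM not listening on that interface, rather than at Aperture.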
Stage 4 — Put Aperture in front
Now we add the gateway. Aperture runs as its own node on your tailnet with its own web dashboard.
4.1 Provision Aperture
Go to aperture.tailscale.com and request access / sign up. During the beta it’s free with any Tailscale account. Once provisioned, Aperture appears as a machine on your tailnet with a hostname, and serves a dashboard at:
http://<your-aperture-hostname>/ui
I’ll use gateway as the stand-in hostname, so my dashboard is http://gateway/ui. Yours will have its own name — you’ll find it in the Aperture sign-up flow and in your Tailscale admin console’s list of Machines.
Two different dashboards: don’t confuse them. login.tailscale.com/admin is the Tailscale admin console (manages your network: devices, users, access rules). http://<aperture-host>/ui is the Aperture dashboard (manages models, providers, and usage). The model configuration below lives in the Aperture dashboard.
4.2 Add your model as a provider
In the Aperture dashboard, open Configuration, and edit the raw HuJSON configuration (Tailscale’s JSON-with-comments format) to define your self-hosted model as a provider. Look for the "providers": {...} block. You may see it already has a default list of third-party providers (e.g., Anthropic, Codex). I just added the following lines inside the "providers": {...} block, right before the first of those lines:
"sovereign": {
"baseurl": "http://100.92.0.10:8000",
"apikey": "local-no-auth",
"models": ["Qwen/Qwen3.6-35B-A3B-FP8"]
},
Three details that matter, each of which can cost you an afternoon:
- baseurl has no /v1. Aperture appends the incoming request path (which already includes /v1/chat/completions) to your baseurl. If you add /v1 here too, you get a broken /v1/v1/... path. Use just the host and port, remembering to substitute your server’s Tailscale IP from Stage 3 for 100.92.0.10.
- apikey is a throwaway. Your vLLM server doesn’t require a key, but Aperture’s dashboard test button refuses to run without one. Any non-empty string works; vLLM ignores it.
- models must match the exact model ID vLLM serves (check http://localhost:8000/v1/models on the server if unsure).
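The first and third points can be illustrated with a short sketch. It only models the behaviour described above (incoming path appended to baseurl) and is not Aperture’s actual code; the function names upstream_url and served_model_ids are hypothetical. The second helper assumes the OpenAI-style response shape for /v1/models, where model IDs sit under data[].id.

```python
def upstream_url(baseurl: str, request_path: str) -> str:
    """Model the described behaviour: the incoming request path is appended to baseurl."""
    return baseurl.rstrip("/") + request_path


def served_model_ids(models_response: dict) -> list[str]:
    """Extract model IDs from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


# Correct provider config: host and port only.
good = upstream_url("http://100.92.0.10:8000", "/v1/chat/completions")
# Mistaken provider config: /v1 included in baseurl.
bad = upstream_url("http://100.92.0.10:8000/v1", "/v1/chat/completions")

print(good)  # http://100.92.0.10:8000/v1/chat/completions
print(bad)   # http://100.92.0.10:8000/v1/v1/chat/completions
```

To use served_model_ids, pass it the parsed JSON from the server’s /v1/models endpoint and compare the result with the strings in your provider’s models list.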
4.3 Grant yourself access
Aperture is deny-by-default: even as an admin, you can’t call a model until a grant says so. So, in the same JSON config, find the "grants": [...] block (it follows the "providers": {...} block), and make sure it includes a { "models": "**" } capability to access your model via Aperture:
"grants": [
{
"src": ["*"],
"app": {
"tailscale.com/cap/aperture": [
{ "role": "admin" },
{ "models": "**" }
]
}
}
]
This grants everyone on your tailnet ("*") the admin role and access to all models ("**") — fine for a personal setup. To restrict it to just yourself, replace "*" with your Tailscale login name. Save the config (Aperture treats warnings as errors on save, so it’ll tell you if anything’s off).
4.4 Test the route from the dashboard
Open the Models tab. You should see Qwen/Qwen3.6-35B-A3B-FP8 with a Play icon beside it. Click it — a green check means Aperture successfully reached your vLLM server through the tailnet. A red X means it couldn’t (usually a network access rule; see Troubleshooting).
Stage 5 — Connect a client computer
The final piece: reach the model from another device — a laptop, in my case a MacBook. Any OpenAI-compatible tool can use it, but first the client has to be on the tailnet too.
5.1 Install and join Tailscale on the client
- macOS / Windows: install the Tailscale app from tailscale.com/download (or the Mac App Store), launch it, and sign in with the same account you used on the server.
- Linux: run curl -fsSL https://tailscale.com/install.sh | sh, then sudo tailscale up, signing in with the same account.
The “same account” part is essential — it’s what puts the client on the same tailnet as the server and the gateway.
5.2 The end-to-end test
From a terminal on the client, first confirm it can see the gateway and resolve its name:
curl http://gateway/v1/models
If that returns JSON listing your model, MagicDNS is resolving and the tailnet path works. Then the moment of truth — a real request, from your laptop, through the gateway, to the model running on the GB10 box:
curl http://gateway/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"Qwen/Qwen3.6-35B-A3B-FP8","messages":[{"role":"user","content":"hello from my laptop"}]}'
No API key in the request — your Tailscale identity is the authentication. When the reply comes back, it has traveled: laptop → tailnet → Aperture → vLLM on the server → back. Check the Aperture dashboard’s usage log and you’ll see that request recorded with your identity and a token count.
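The same request can be made from a script. Here is a minimal sketch using only the Python standard library; the function names are hypothetical, gateway is the stand-in hostname from this article, and the reply parsing assumes the usual OpenAI-style response shape (choices[0].message.content). As with the curl call, no API key is sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_text(response_body: dict) -> str:
    """Pull the assistant's text out of an OpenAI-style chat completion response."""
    return response_body["choices"][0]["message"]["content"]


def ask(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt through the gateway and return the model's reply."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return reply_text(json.load(response))


# Example call from a device on the tailnet (replace `gateway` with your hostname):
# print(ask("http://gateway/v1", "Qwen/Qwen3.6-35B-A3B-FP8", "hello from my laptop"))
```

Keeping the request builder separate from the network call makes it easy to inspect exactly what would be sent before anything leaves the laptop.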
This is the whole setup working as required: a very capable private model, reachable from anywhere you and your devices are, authenticated by identity, logged centrally, and never exposed to the public internet.
What’s next
Congratulations! You now have a secure, private gateway to your own model. A natural next step is to point real harnesses and applications — chat front-ends, notebook tools, coding assistants, agents — at it, which will all reduce to providing these same settings:
| Setting | Value | Notes |
|---|---|---|
| Base URL | http://gateway/v1 | Remember to replace gateway with your Aperture hostname |
| API key | any non-empty placeholder | Not relevant for your locally hosted model |
| Model | Qwen/Qwen3.6-35B-A3B-FP8 | Or the name of whatever model you have chosen |
| Context Window | 64000 | 128000 will also fit within a gpu-memory-utilization of 0.5 |
| Max Tokens | 4096 | For general chat, Q&A, data analysis; set to 8192 for deep reasoning/coding |
RAM Allocation Requirements at Different Context Window Sizes:
| Context Window | Model+vLLM | KV Cache | Total RAM |
|---|---|---|---|
| 64K | ~40GB | ~9GB | ~49GB |
| 128K | ~40GB | ~17GB | ~57GB |
| 262K (model max) | ~40GB | ~36GB | ~76GB |
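For context sizes between the rows above, a rough estimate can be had by interpolating linearly between the listed figures. This is only a back-of-the-envelope sketch: the straight-line interpolation is an assumption of this example, not a measured result, and the function names are hypothetical. Context sizes are given in thousands of tokens, as in the figures above.

```python
# (context window in K tokens, approximate KV cache in GB), copied from the figures above.
KV_CACHE_POINTS = [(64, 9.0), (128, 17.0), (262, 36.0)]
# Approximate fixed footprint of the model weights plus vLLM, from the same figures.
MODEL_PLUS_VLLM_GB = 40.0


def estimate_kv_cache_gb(context_k: float) -> float:
    """Linearly interpolate the KV cache size between the listed data points."""
    lowest, highest = KV_CACHE_POINTS[0][0], KV_CACHE_POINTS[-1][0]
    if not lowest <= context_k <= highest:
        raise ValueError(f"context must be between {lowest}K and {highest}K tokens")
    for (x0, y0), (x1, y1) in zip(KV_CACHE_POINTS, KV_CACHE_POINTS[1:]):
        if x0 <= context_k <= x1:
            return y0 + (context_k - x0) / (x1 - x0) * (y1 - y0)
    raise AssertionError("unreachable: range was checked above")


def estimate_total_gb(context_k: float) -> float:
    """Estimated total RAM: fixed model footprint plus interpolated KV cache."""
    return MODEL_PLUS_VLLM_GB + estimate_kv_cache_gb(context_k)


print(estimate_total_gb(96))  # 53.0
```

Treat the result as a planning number only; actual usage depends on your vLLM settings and should be checked on the machine itself.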
I’ve been exploring different open apps and models, and I plan to write up the capabilities and quirks I encounter in future posts. In the meantime, nothing is stopping you from using and customizing your sovereign AI setup. Drop me a line to share how you are enjoying it, what you are using it for, and suggestions for other sovereign AI projects!
Troubleshooting recap (Part 2)
| Symptom | Cause | Fix |
|---|---|---|
| Play-icon test shows a red X | A tailnet access rule (ACL) blocks the Aperture node from reaching the server’s port | Allow the Aperture node → server:8000 in your Tailscale admin console policy |
| /v1/v1/... errors or 404s through the gateway | /v1 mistakenly included in the provider baseurl | Use host:port only, no /v1 |
| “No API key configured” / Play icon greyed out | Self-hosted provider missing an apikey | Add any non-empty placeholder; vLLM ignores it |
| gateway won’t resolve on the client | MagicDNS off, or client on a different tailnet | Enable MagicDNS; confirm the client signed in with the same account. Fallback: use the Aperture node’s 100.x.x.x IP |
| 403 / access denied through the gateway | No grant covers your identity | Add/confirm a grants entry covering your user and the model |
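For the name-resolution row in particular, a quick script can tell you whether the client resolves the gateway name at all, and whether the answer looks like a Tailscale address. This is a minimal sketch with hypothetical helper names; it relies on Tailscale assigning IPv4 addresses from the 100.64.0.0/10 carrier-grade NAT range, which is where the 100.x.x.x addresses in this article come from.

```python
import ipaddress
import socket
from typing import Optional

# Tailscale hands out IPv4 addresses from the shared CGNAT block.
TAILSCALE_IPV4_RANGE = ipaddress.ip_network("100.64.0.0/10")


def resolve_ipv4(hostname: str) -> Optional[str]:
    """Return the first IPv4 address the hostname resolves to, or None if it fails."""
    try:
        infos = socket.getaddrinfo(
            hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return None
    return infos[0][4][0] if infos else None


def is_tailscale_address(ip: str) -> bool:
    """Return True if the IPv4 address falls inside the Tailscale address range."""
    return ipaddress.ip_address(ip) in TAILSCALE_IPV4_RANGE


# Example call on the client (replace `gateway` with your Aperture hostname):
# ip = resolve_ipv4("gateway")
# print(ip, ip is not None and is_tailscale_address(ip))
```

If the name does not resolve, or resolves to an address outside that range, MagicDNS or the tailnet membership is the first thing to check.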
© 2026 Ram C. Singh. All Rights Reserved.