10io — Actual Intelligence | Fractional CAIO / Chief AI Officer

A beginner-friendly walkthrough for securely accessing your self-hosted LLM from anywhere — over a private Tailscale network fronted by the Aperture AI gateway, never exposed to the public internet.

In Part 1 we worked through Stages 1 and 2 to get a Qwen3.6-35B-A3B-FP8 Mixture of Experts (MoE) model serving an OpenAI-compatible API on a “SparkStation”, a GB10 NVIDIA DGX Spark-class machine. However, the model is only accessible on the machine itself via localhost:8000. Here in Part 2, we run through Stages 3 to 5 to make the model securely reachable from any other device you choose, without ever exposing it to the public internet. These instructions should still help if you serve a different local model, or serve it from something other than a SparkStation.

The problem, and the plan

I am constantly looking for a better way to operate and access sovereign AI solutions. I want self-hosted models running on private infrastructure that are as flexibly accessible as the solutions from OpenAI or Anthropic. But a model answering at localhost:8000 is only usable by the machine it runs on. The obvious way to make it accessible from anywhere is to forward a port through my router. However, this is also dangerous: it puts an unauthenticated AI endpoint on the open internet for anyone to find and abuse.

Instead, my current approach has two layers:

Tailscale — a private mesh network (“tailnet”) that encrypts direct connections between approved (“allowlisted”) devices, as if they were on the same LAN, no matter where they are physically. Nothing is exposed publicly; only devices that I’ve explicitly added can reach each other. Tailscale offers a very generous free tier, which I’ve been using for several months and have not yet exceeded. Your mileage may vary.
Aperture (by Tailscale) — an “AI gateway” that sits in front of one or more models on the tailnet. It authenticates every request by the caller’s Tailscale identity, so there are no API keys to distribute, and it logs all usage centrally.

If you follow this guide, your locally hosted models will be reachable only over your private network, and every request through the gateway will be identified and recorded. That’s genuinely private, secure, managed AI.

Concepts in one breath. A tailnet is your private device network. MagicDNS is Tailscale’s feature that lets you address devices by name (e.g. gateway) instead of IP. A provider in Aperture is an upstream model. A grant is a rule saying who may use which models. You’ll meet each below.

Stage 3 — Put the server on your private network

First, get the GB10 machine (or whatever machine you are using as the “AI server”) onto your tailnet.

3.1 Install and join Tailscale on the server

On the server, install Tailscale and bring it up:

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

tailscale up prints a URL — open it in any browser and sign in (Google, GitHub, Microsoft, or email all work). That authenticates this machine and adds it to your tailnet. The account you sign in with defines your tailnet, so remember which one you use — every device must join the same account.

3.2 Note the server’s Tailscale IP

tailscale ip -4

You’ll get an address in the 100.x.x.x range; Tailscale assigns addresses from the 100.64.0.0/10 carrier-grade NAT block, which is not routed on the public internet. I’ll use 100.92.0.10 as a stand-in below; replace it with your own. This is the address Aperture will use to reach your model.

Because vLLM was launched with --network host back in Part 1, it’s already listening on this interface — no change needed. If you run a firewall like ufw, allow the port on the Tailscale interface: sudo ufw allow in on tailscale0 to any port 8000.
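Before moving on, it can help to confirm that the model answers on the Tailscale address and not only on localhost. The following is a minimal sketch, not part of the original setup: the helper names `is_tailscale_ip` and `vllm_models_url` are hypothetical, and the range check assumes Tailscale's usual 100.64.0.0/10 address block.

```shell
# Rough check that $1 looks like a Tailscale IPv4 address (100.64.0.0/10).
# Only the first two octets are inspected; this is not a full IP validator.
is_tailscale_ip() {
  case "$1" in
    100.*.*.*) ;;
    *) return 1 ;;
  esac
  second_octet=${1#100.}            # strip the leading "100."
  second_octet=${second_octet%%.*}  # keep only the second octet
  case "$second_octet" in
    ''|*[!0-9]*) return 1 ;;        # reject empty or non-numeric octets
  esac
  [ "$second_octet" -ge 64 ] && [ "$second_octet" -le 127 ]
}

# Print the /v1/models URL for a server IP ($1) and port ($2, default 8000).
vllm_models_url() {
  is_tailscale_ip "$1" || { echo "not a Tailscale IPv4 address: $1" >&2; return 1; }
  printf 'http://%s:%s/v1/models\n' "$1" "${2:-8000}"
}
```

On the server, `curl -fsS --max-time 5 "$(vllm_models_url "$(tailscale ip -4)")"` should then return the same model list you get from localhost:8000.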

Stage 4 — Put Aperture in front

Now we add the gateway. Aperture runs as its own node on your tailnet with its own web dashboard.

4.1 Provision Aperture

Go to aperture.tailscale.com and request access / sign up. During the beta it’s free with any Tailscale account. Once provisioned, Aperture appears as a machine on your tailnet with a hostname, and serves a dashboard at:

http://<your-aperture-hostname>/ui

I’ll use gateway as the stand-in hostname, so my dashboard is http://gateway/ui. Yours will have its own name — you’ll find it in the Aperture sign-up flow and in your Tailscale admin console’s list of Machines.

Two different dashboards — don’t confuse them. login.tailscale.com/admin is the Tailscale admin console (manages your network: devices, users, access rules). http://<aperture-host>/ui is the Aperture dashboard (manages models, providers, and usage). The model configuration below lives in the Aperture dashboard.

4.2 Add your model as a provider

In the Aperture dashboard, open Configuration, and edit the raw HuJSON configuration (Tailscale’s JSON-with-comments format) to define your self-hosted model as a provider. Look for the "providers": {...} block. You may see it already has a default list of third-party providers (e.g., Anthropic, Codex). I just added the following lines inside the "providers": {...} block, right before the first of those lines:

    "sovereign": {
      "baseurl": "http://100.92.0.10:8000",
      "apikey": "local-no-auth",
      "models": ["Qwen/Qwen3.6-35B-A3B-FP8"]
    },

Three details that matter, each of which can cost you an afternoon:

baseurl has no /v1. Aperture appends the incoming request path (which already includes /v1/chat/completions) to your baseurl. If you add /v1 here too, you get a broken /v1/v1/... path. Use just the host and port, remembering to replace 100.92.0.10 with your server’s Tailscale IP from Stage 3.
apikey is a throwaway. Your vLLM server doesn’t require a key, but Aperture’s dashboard test button refuses to run without one. Any non-empty string works; vLLM ignores it.
models must match the exact model ID vLLM serves (check http://localhost:8000/v1/models on the server if unsure).
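To see why the first detail bites, here is a small sketch that mimics the joining behavior described above. It is only a model of that description, not Aperture's actual code; `upstream_url` and `check_baseurl` are hypothetical helper names.

```shell
# Model of the joining described above: upstream URL = baseurl + request path.
# A single trailing slash on the baseurl is dropped so it does not double up.
upstream_url() {
  printf '%s%s\n' "${1%/}" "$2"
}

# Fail (with a message on stderr) if a provider baseurl ends in /v1,
# with or without a trailing slash.
check_baseurl() {
  case "${1%/}" in
    */v1) echo "baseurl should not end in /v1: $1" >&2; return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, `upstream_url http://100.92.0.10:8000/v1 /v1/chat/completions` prints the doubled `/v1/v1/chat/completions` path, while the same call without `/v1` in the first argument prints the path vLLM expects.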

4.3 Grant yourself access

Aperture is deny-by-default: even as an admin, you can’t call a model until a grant says so. So, in the same JSON config, find the "grants": [...] block (it follows the "providers": {...} block), and make sure it includes a { "models": "**" } capability to access your model via Aperture:

  "grants": [
    {
      "src": ["*"],
      "app": {
        "tailscale.com/cap/aperture": [
          { "role": "admin" },
          { "models": "**" }
        ]
      }
    }
  ]

This grants everyone on your tailnet ("*") the admin role and access to all models ("**") — fine for a personal setup. To restrict it to just yourself, replace "*" with your Tailscale login name. Save the config (Aperture treats warnings as errors on save, so it’ll tell you if anything’s off).

4.4 Test the route from the dashboard

Open the Models tab. You should see Qwen/Qwen3.6-35B-A3B-FP8 with a Play icon beside it. Click it — a green check means Aperture successfully reached your vLLM server through the tailnet. A red X means it couldn’t (usually a network access rule; see Troubleshooting).

Stage 5 — Connect a client computer

The final piece: reach the model from another device — a laptop, in my case a MacBook. Any OpenAI-compatible tool can use it, but first the client has to be on the tailnet too.

5.1 Install and join Tailscale on the client

macOS / Windows: install the Tailscale app from tailscale.com/download (or the Mac App Store), launch it, and sign in with the same account you used on the server.
Linux: curl -fsSL https://tailscale.com/install.sh | sh then sudo tailscale up, signing in with the same account.

The “same account” part is essential — it’s what puts the client on the same tailnet as the server and the gateway.

5.2 The end-to-end test

From a terminal on the client, first confirm it can see the gateway and resolve its name:

curl http://gateway/v1/models

If that returns JSON listing your model, MagicDNS is resolving and the tailnet path works. Then the moment of truth — a real request, from your laptop, through the gateway, to the model running on the GB10 box:

curl http://gateway/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen/Qwen3.6-35B-A3B-FP8","messages":[{"role":"user","content":"hello from my laptop"}]}'

No API key in the request — your Tailscale identity is the authentication. When the reply comes back, it has traveled: laptop → tailnet → Aperture → vLLM on the server → back. Check the Aperture dashboard’s usage log and you’ll see that request recorded with your identity and a token count.
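If you want to send different prompts without hand-editing JSON each time, a small helper can build the request body. This is a minimal sketch under stated assumptions: `json_escape` and `chat_payload` are hypothetical names, and the escaping only handles backslashes and double quotes, so prompts containing newlines or control characters need a real JSON tool instead.

```shell
# Escape backslashes and double quotes so a string can sit inside JSON quotes.
# Deliberately minimal: newlines and control characters are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build a single-message chat completion body for model $1 and prompt $2.
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

A request then looks like `curl http://gateway/v1/chat/completions -H "Content-Type: application/json" -d "$(chat_payload "Qwen/Qwen3.6-35B-A3B-FP8" "hello from my laptop")"`, which sends the same body as the command above.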

This is the whole setup working as required: a very capable private model, reachable from anywhere you and your devices are, authenticated by identity, logged centrally, and never exposed to the public internet.

What’s next

Congratulations! You now have a secure, private gateway to your own model. A natural next step is to point real harnesses and applications — chat front-ends, notebook tools, coding assistants, agents — at it, which will all reduce to providing these same settings:

Setting	Value	Notes
Base URL	`http://gateway/v1`	Remember to replace `gateway` with your Aperture hostname
API key	any non-empty placeholder	Not relevant for your locally hosted model
Model	`Qwen/Qwen3.6-35B-A3B-FP8`	Or, the name of whatever model you have chosen
Context Window	`64000`	`128000` will also fit within `gpu-memory-utilization` of `0.5`
Max Tokens	`4096`	For general chat, Q&A, data analysis; set to `8192` for deep reasoning/coding
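Many tools built on the official OpenAI SDKs pick up the base URL and key from environment variables, so one way to apply the first two settings is the sketch below. Whether a given tool reads `OPENAI_BASE_URL` and `OPENAI_API_KEY` is an assumption to check in its documentation; others expect these values in their own settings screen.

```shell
# Point OpenAI-SDK-based tools at the gateway instead of api.openai.com.
# Replace "gateway" with your Aperture hostname.
export OPENAI_BASE_URL="http://gateway/v1"

# Any non-empty placeholder; the gateway authenticates by Tailscale identity.
export OPENAI_API_KEY="placeholder"
```

The model name, context window, and max tokens usually still have to be set inside the tool itself.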

RAM Allocation Requirements at Different Context Window Sizes:

Context Window	Model+vLLM	KV Cache	Total RAM
64K	~40GB	~9GB	~49GB
128K	~40GB	~17GB	~57GB
262K (model max)	~40GB	~36GB	~76GB
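The note that a 128K context fits within a `gpu-memory-utilization` of `0.5` can be checked with simple arithmetic. The sketch below assumes 128 GB of unified memory (my assumption for a DGX Spark-class machine, not stated above) and treats the table's approximate totals as exact; `fits_budget` is a hypothetical helper name.

```shell
# Succeed if a total of $1 GB fits within $3 percent of $2 GB of memory.
# Integer math only: compares total*100 against memory*percent.
fits_budget() {
  [ $(( $1 * 100 )) -le $(( $2 * $3 )) ]
}

# Assumed 128 GB machine at 50% utilization, i.e. a 64 GB budget.
for total in 49 57 76; do
  if fits_budget "$total" 128 50; then
    echo "${total}GB fits"
  else
    echo "${total}GB does not fit"
  fi
done
```

Under those assumptions the 64K and 128K rows fit in the 64 GB budget, while the 262K row would need a higher utilization setting.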

I’ve been exploring different open apps and models, and I plan to write up the capabilities and quirks I encounter in future posts. In the meantime, nothing is stopping you from using and customizing your own sovereign AI setup. Drop me a line to share how you are enjoying it, what you are using it for, and suggestions for other sovereign AI projects!

Troubleshooting recap (Part 2)

Symptom	Cause	Fix
Play-icon test shows a red X	A tailnet access rule (ACL) blocks the Aperture node from reaching the server’s port	Allow the Aperture node → server:8000 in your Tailscale admin console policy
`/v1/v1/...` errors or 404s through the gateway	`/v1` mistakenly included in the provider `baseurl`	Use host:port only, no `/v1`
“No API key configured” / Play icon greyed out	Self-hosted provider missing an `apikey`	Add any non-empty placeholder; vLLM ignores it
`gateway` won’t resolve on the client	MagicDNS off, or client on a different tailnet	Enable MagicDNS; confirm the client signed in with the same account. Fallback: use the Aperture node’s `100.x.x.x` IP
403 / access denied through the gateway	No grant covers your identity	Add/confirm a `grants` entry covering your user and the model
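When a request through the gateway fails, the HTTP status code usually points at the right row of the table. The helper below is a rough sketch that maps a status code to the matching hint; `hint_for_status` is a hypothetical name, and the mapping simply restates the table rather than anything Aperture documents.

```shell
# Map an HTTP status code (as printed by curl's %{http_code}) to a hint
# drawn from the troubleshooting table. 000 means curl got no HTTP response.
hint_for_status() {
  case "$1" in
    200) echo "OK: the gateway answered" ;;
    403) echo "Check grants: no grant covers your identity" ;;
    404) echo "Check the provider baseurl: it should be host:port with no /v1" ;;
    000) echo "No HTTP response: check MagicDNS and that the client is on the same tailnet" ;;
    *)   echo "Unexpected status $1: check the Aperture dashboard" ;;
  esac
}
```

From the client, `hint_for_status "$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://gateway/v1/models)"` prints the hint for whatever the gateway returns.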