A chatbot's worst enemy is page refresh
zknill.io> But we’ve hit the ceiling for SSE. That terrible Claude UI refresh gif is state of the art for SSE. And it sucks.
This is nothing to do with SSE. It's trivial to persist state over disconnects and refresh with SSE. You can do all the same pub sub tricks.
None of these companies are even using brotli on their SSE connections for 40-400x compression.
It's just bad engineering, and it's going to be much worse with websockets: you have to rebuild http from scratch, compression is nowhere near as good, bidirectional traffic nukes your mobile battery because of the duplex antenna, etc., etc.
Just to add: the main value of websockets was faster upstream events pre-http2. But now, with multiplexing in http2, that's no longer the case.
So the only thing you get from websockets is bidirectional events (at the cost of all the production challenges websockets bring). In practice most problems don't need that feature.
Thanks for that. I know very little about frontend and this definitely will help me make something better.
No mention of ChatGPT? Anyone else have this problem:
Go to ChatGPT.com while logged in, start typing right away, 8 words into typing it clears the text in the form. Why?
Claude also has odd UI/UX bugs in what is almost literally a single page web application.
It's probably due to server-side rendering and rehydration. Rehydration uses server-side component state to override the DOM state.
I switched away from ChatGPT mainly due to that. Gemini is much faster to type into.
It's a rerendering bug. Insanely annoying.
useEffect if I had to guess
Yep. We had to do a surprising amount of work to solve this in our product: https://www.kitewing.ai/blog/stateless-agents-stateful-produ...
Very weird that the foundational LLM companies' own chat pages don't do this.
>surprising amount of work
Dunno; in my Go+HTMX project it was pretty trivial to add SSE streaming. When you open a new chat tab, we load existing data from the DB and then HTMX initiates SSE streaming with a single tag. When the server receives an SSE request from HTMX, it registers a goroutine and a new Go channel for that tab. The goroutine blocks, waiting for new events on the channel.

When something triggers a new message, a dispatcher saves the event to the DB and then iterates over the registered Go channels, sending the event to each one. On a new event in a tab's channel, that tab's goroutine unblocks and passes the event from the channel to the SSE stream. HTMX handles inserting the new data into the DOM.

When a tab closes, the goroutine is notified via the request's context, deregisters the channel, and exits. If the server restarts, HTMX automatically reopens the SSE stream. It took probably one evening to implement.
We resolved this by creating a separate context for the lifecycle of a chat/turn, so if the user leaves the page, the process continues on the server. The UI calls an RPC to fetch the in-progress turn, which allows it to resume, or, if it's done, simply renders the full turn.
Wasn't that complex!
Assuming the traditional stateless routing of requests, say round-robin from load balancers: how do you make sure the returning UI client ends up on the same backend server replica that's hosting the conversation?
Or is it that all your tokens go through a DB anyway?
It's fairly easy to keep an agent alive when a client goes away. It's a lot harder to attach the client back to that agent's output when the client returns, without stuffing every token through the database.
You normally need to do that anyway. The specific backend host may have been destroyed in the meantime so you have to recover the context. And it's not like they're huge after compression.
It is honestly shocking how sloppy (pun intended) a lot of the online chatbot UIs are.
It's further fascinating how they're trying to sell coding tools and a future where these things are integral.
The SSE thing is a symptom of something bigger imo. These models are stateless but we often act like context windows are memory. Nothing around them actually remembers anything, and vector search doesn't fix it. I went down this rabbit hole recently: https://philippdubach.com/posts/beyond-vector-search-why-llm...
This is a feature of the web. Browser refreshes SHOULD dump state. Otherwise it can be difficult to recover from system errors. Of course if you can build a system that is guaranteed to never have bugs then go ahead and disable this feature. But users may still be confused as to why refreshing hasn’t restarted their window
It's interesting because this is a solved problem with collaborative docs.
CRDT or OT would work great here, arguably even overkill. But so many of the edge cases you'd usually need to think about just disappear.
(I've built an agent / chat that used CRDT to represent the chat. You can have an arbitrary number of tabs, closing/opening at any time. All real time, in sync.)
I fixed that in my own front-end: https://github.com/rcarmo/vibes.
t3.chat solves this pretty well. I believe they use Convex DB. I think it's something like: a backend server process holds the true connection and state of the chat, and the front end syncs and receives updates from it.
> What are folks doing to get around it?
Some are using Google Gemini.
It saves your chats, which are presented in a pane you can expand on the left and search. You can jump back into any chat and continue it, or delete individual chats.
This history is attached to your Google account, not to the chat window. You can pick up an existing chat in another browser on another device where you are authenticated with the same Google identity.
Now about the specific use scenario in this article (hitting refresh immediately after submitting a prompt, while the response is coming). Not sure why that would be important?
I just tried it a couple of times. Both times, it initially appeared as if the Gemini interface had lost the chats, since they didn't show up in the chat history section of the left pane. But after another refresh, they appeared. So there is just some delay.
Anyway, it's good in this regard beyond giving a damn.
Lmao sorry but you completely missed the point of the article.
Yes of course all chat providers store your chats, and they will be available eventually when the response has finished streaming and has been dumped to a db.
This is about live streaming getting lost and not being reconnected (and restreamed) when you refresh the page.
And since chatting with AI and seeing the responses streamed is a major use case, the author was correct to question why e.g. Anthropic wouldn't invest some of the 30B in fixing this glaring problem.
Especially since it looks like your initial message was not received by the backend server at all!
It may not be super critical, but it's like saying "my Ferrari sometimes shows the wrong speed. It's still driving, but the speedometer is stuck. It does get back to the correct speed eventually though, so no biggie."
So how do I repro the problem with Google Gemini?
1. Use thinking/pri
2. Ask it something that gets a streamed response
3. Hit refresh
A surprising amount of the time the chat will bork itself just from alt-tabbing, but refreshing triggers it reliably.