Show HN: Open-Source Infrastructure for Vector Data Streams
github.com

Purpose-built for low-latency applications, Retake syncs vector stores with their sources of truth. Think semantic search for e-commerce listings, merchant or receipt matching in fintech, etc.

I’ve been looking for something like this: eventually consistent syncing of DB content -> embeddings in a vector DB. So far, I’ve been dealing with a tradeoff between latency and error handling in my API endpoints. I’ll either 1.) embed content and upsert it into the vector DB inside a transaction block for my main DB in the handler, which kills latency, or 2.) kick off the embedding work separately from the main handler work, which risks the data desynchronizing. I’d much prefer a set-it-and-forget-it approach like Retake.

A few questions:

* If the “real-time server” goes offline temporarily, will it catch up on any newly added rows in the interim?
* Do you intend to emit any OpenTelemetry metrics? I’d like to monitor lag in production.
* Will I be able to deploy this as a single container on ECS/Kubernetes?

* If the “real-time server” goes offline temporarily, will it catch up on any newly added rows in the interim?

Yes, we're built on top of Kafka.

* Do you intend to emit any OpenTelemetry metrics? I’d like to monitor lag in production.

We don't have that yet, but we're open to it.

* Will I be able to deploy this as a single container on ECS/Kubernetes?

Yes.
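
For anyone skimming the thread, the two handler approaches weighed in the comment above can be sketched roughly as follows. This is only a toy illustration: `main_db`, `vector_db`, `embed`, and the handler names are hypothetical in-memory stand-ins, not Retake's API or any real database client, and the "transaction" in option 1 is reduced to doing both writes before returning.

```python
# Hypothetical in-memory stand-ins for the main DB and the vector DB.
main_db: dict = {}
vector_db: dict = {}
pending: list = []  # embedding jobs queued by the deferred handler


def embed(text: str) -> list:
    """Toy embedding; stands in for a slow model or API call."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def handler_inline(row_id: int, text: str) -> None:
    """Option 1: embed and upsert before the handler returns.

    Both stores stay in sync, but the request pays for the embedding call.
    """
    vector = embed(text)  # the slow step sits on the request path
    main_db[row_id] = text
    vector_db[row_id] = vector


def handler_deferred(row_id: int, text: str) -> None:
    """Option 2: write the row now and queue the embedding for later."""
    main_db[row_id] = text
    pending.append((row_id, text))


def drain_pending(fail_on: frozenset = frozenset()) -> None:
    """Background worker. A job listed in fail_on is dropped (simulated failure)."""
    while pending:
        row_id, text = pending.pop(0)
        if row_id in fail_on:
            continue  # the row exists in main_db but never reaches vector_db
        vector_db[row_id] = embed(text)


def out_of_sync() -> set:
    """Row ids present in the main DB but missing from the vector DB."""
    return set(main_db) - set(vector_db)
```

Calling `handler_deferred` and then `drain_pending(fail_on=frozenset({row_id}))` leaves that row reported by `out_of_sync()`, which is the desynchronization risk described for option 2. Option 1 never shows up there, at the cost of running `embed` inside the request.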