Show HN: RethinkDB change feeds for indexing Algolia
github.com
Thanks for sharing, it's nice to see an example of using RethinkDB with Algolia.
RethinkDB originally added changefeeds for exactly this use case, long before hitting on the idea of leveraging the feature for realtime application development. It's pretty cool to see users intuitively picking up on the suitability of changefeeds for this sort of integration.
We've been using it for years to keep our database in sync with Elasticsearch. It works great compared to the oplog hack that was required with MongoDB.
At my work we use Rethink changefeeds to accomplish literally the same task. We use changefeeds to filter and pipe data to Algolia and numerous other services. Works like a breeze and takes surprisingly little code to get up and running.
I wonder how you deal with restarting changefeeds? The last time I checked, you'd have to go through every document again after losing the connection to RethinkDB or restarting the server.
We use changefeeds more or less as a queue/pipeline and don't care too much about the initial state. When the changefeeds are created we specifically don't pass the includeInitial argument [0], so we only get a stream of newly modified/created documents.
[0]: https://rethinkdb.com/docs/changefeeds/javascript/#including...
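For illustration, here is a minimal sketch of how one event from such a stream might be turned into a search-index operation. It assumes the usual changefeed event shape with `old_val` and `new_val` fields and a primary key named `id`. The function name `to_index_action` and the `"upsert"`/`"delete"` labels are hypothetical, and the actual call to Algolia (or any other service) is left out.

```python
from typing import Any, Dict, Optional, Tuple

# One changefeed event: both keys may hold a document or None.
Change = Dict[str, Optional[Dict[str, Any]]]


def to_index_action(change: Change) -> Tuple[str, Any]:
    """Map a single change event to a (hypothetical) index action.

    Assumption: documents use "id" as their primary key.
    """
    old_val = change.get("old_val")
    new_val = change.get("new_val")

    if new_val is None:
        if old_val is None:
            raise ValueError("change event has neither old_val nor new_val")
        # The document was deleted: remove it from the index by id.
        return ("delete", old_val["id"])

    # The document was created or modified: send the full new version.
    return ("upsert", new_val)
```

Treating inserts and updates as the same upsert keeps the consumer idempotent, so an event that is delivered twice should leave the index unchanged.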
In a slightly different use case from what OP is describing, we keep track of createdAt and updatedAt in Rethink, order by those, and pick up from the max(createdAt) in the destination in order to fake restarting the feeds.
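A rough sketch of that watermark idea, using plain in-memory lists in place of the real source table and destination index. The function name `rows_to_replay` and the `field` parameter are made up for illustration, and this is only one way to read the description above, not the commenter's actual code.

```python
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def rows_to_replay(source: Iterable[Row],
                   destination: Iterable[Row],
                   field: str = "createdAt") -> List[Row]:
    """Return the source rows that should be re-sent, oldest first.

    Assumption: every row carries a comparable timestamp under `field`.
    """
    dest_rows = list(destination)
    ordered = sorted(source, key=lambda row: row[field])

    if not dest_rows:
        # Nothing has been synced yet, so replay everything.
        return ordered

    # The newest timestamp already present in the destination.
    watermark = max(row[field] for row in dest_rows)

    # ">=" re-sends the boundary row on purpose: rows sharing the
    # watermark timestamp may not all have arrived before the outage.
    return [row for row in ordered if row[field] >= watermark]
```

Running the same function with `field="updatedAt"` would cover modified documents as well. Because the boundary row is sent again, the destination needs to upsert by id rather than blindly append.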
Is the changefeed reliable enough to make your search engine depend on it?
On the RethinkDB website, they have a warning saying that changefeeds cannot guarantee delivery.