Inserting 1.8M Rows/s from Pandas into QuestDB with Arrow, Rust and Cython
Hi, I'm the original author of the QuestDB Python client library and benchmark.
It all started when one of our users needed to insert quite a bit of data into our database quickly from Pandas. Serializing their dataframe row by row took 25 minutes. The culprit was .iterrows(). Now it takes a handful of seconds.
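To illustrate why .iterrows() hurts, here is a minimal sketch comparing it with column-wise access. The table name, the column names and the simplified text line format are made up for the example, and this is not the client library's actual code path (which is Cython and Rust). It only shows the general pattern: .iterrows() builds a new Series object for every row, while the column-wise version pulls each column out once.

```python
import pandas as pd


def lines_iterrows(df: pd.DataFrame, table: str) -> list[str]:
    """Row-by-row serialization: pandas builds a Series for every row."""
    out = []
    for _, row in df.iterrows():
        out.append(f"{table},sensor={row['sensor']} temp={row['temp']}")
    return out


def lines_columnar(df: pd.DataFrame, table: str) -> list[str]:
    """Column-wise serialization: each column is extracted once, then zipped."""
    sensors = df["sensor"].tolist()  # plain Python str objects
    temps = df["temp"].tolist()      # plain Python float objects
    return [f"{table},sensor={s} temp={t}" for s, t in zip(sensors, temps)]


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"sensor": ["a", "b", "c"], "temp": [21.5, 19.0, 23.25]})

rows_slow = lines_iterrows(df, "readings")
rows_fast = lines_columnar(df, "readings")
print(rows_fast[0])
```

Both functions produce the same lines; timing them on a large dataframe (for example with timeit) should show the column-wise version well ahead, though the exact ratio depends on the data.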
This took a few iterations. At first I thought this could all be handled by the Python buffer protocol, but that turned out to create a whole bunch of copies, so for a number of dtypes the code now uses Arrow where that is zero-copy.
The main code is in Cython (and the fact that one can inspect the generated C is pretty neat) with supporting code in Rust. The main serialization logic is in Rust and it's in a separate repo: https://github.com/questdb/c-questdb-client/tree/main/questd....