Parquet and ORC's many shortfalls for machine learning, and what to do about them
starburst.io
This article summarizes research from my lab, done in collaboration with ByteDance, on a new columnar format designed for ML workloads. The work is published in CIDR (a computer science conference to be held in Amsterdam two weeks from now).