Show HN: Refinery – Data anonymization and governance for the masses
https://www.openquery.io/refinery
We are Christos, Damien and Nodar, founders of OpenQuery. We are building Refinery, an open-source project written in Rust that automates anonymizing sensitive data and governing how different consumers access that data, using our 'policy-as-code' framework (it's based on HashiCorp's HCL).
We noticed that companies have been solving the same problem over and over in different ways. The problem boils down to giving a data-consumer (this could be an analyst, a BI tool, an ML workflow, a different company, etc.) access to a sensitive dataset. Sensitive data comes in different shapes and sizes: maybe it's spatial data, maybe it's financial information, maybe it's patient health data. There is also the context in which this data is being shared - is the data crossing national borders? Is it leaving the organisation?
Refinery allows you to easily define how data-consumers access sensitive data. You add users, groups and datasets, and then define entities within datasets. Next, you combine anonymization algorithms (from basic hashing and bucketing to differential privacy) with policies (built for spatial data, financial data, etc.) to define how consumers interact with the data. Our query engine then enforces those policies at runtime, so for a data-consumer it feels like they are interfacing with a plain old SQL database. Furthermore, having the configuration as code allows you to use your favourite editor and version control system to manage your policies.
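To make the anonymization step a bit more concrete, here is a minimal sketch of the three kinds of transformations mentioned above (hashing, bucketing, and the Laplace noise commonly used for differential privacy). This is not Refinery's actual API: the `Transform` enum and the functions `hash_value`, `bucket_value`, `laplace_noise` and `apply` are hypothetical names made up for illustration. It uses only the Rust standard library, so the hash is `DefaultHasher` (not cryptographic; a real deployment would want a keyed cryptographic hash), and the noise function takes its uniform random sample as an argument rather than drawing one itself.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical per-column transformations (illustration only, not Refinery's API).
#[derive(Debug, Clone, Copy)]
enum Transform {
    /// Replace the value with a salted hash.
    Hash { salt: u64 },
    /// Round a numeric value down to the start of its bucket.
    Bucket { width: i64 },
    /// Pass the value through unchanged.
    Clear,
}

/// Salted hash of a string, rendered as 16 hex digits.
/// Assumption: DefaultHasher stands in for a proper keyed cryptographic hash.
fn hash_value(value: &str, salt: u64) -> String {
    let mut hasher = DefaultHasher::new();
    salt.hash(&mut hasher);
    value.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Lower bound of the bucket containing `value`; `width` must be positive.
fn bucket_value(value: i64, width: i64) -> i64 {
    value.div_euclid(width) * width
}

/// Laplace noise via the inverse CDF, given a uniform sample `u` in (0, 1).
/// For a differentially private count, `scale` would be sensitivity / epsilon.
fn laplace_noise(u: f64, scale: f64) -> f64 {
    let centered = u - 0.5;
    -scale * centered.signum() * (1.0 - 2.0 * centered.abs()).ln()
}

/// Apply a transform to one raw column value.
/// Returns None when the value cannot be transformed (e.g. non-numeric bucket input).
fn apply(transform: Transform, value: &str) -> Option<String> {
    match transform {
        Transform::Hash { salt } => Some(hash_value(value, salt)),
        Transform::Bucket { width } => {
            if width <= 0 {
                return None;
            }
            value
                .parse::<i64>()
                .ok()
                .map(|v| bucket_value(v, width).to_string())
        }
        Transform::Clear => Some(value.to_string()),
    }
}
```

In this sketch, `apply(Transform::Bucket { width: 10 }, "37")` yields `Some("30".to_string())`, so a consumer sees an age range rather than an exact age. A policy layer would then decide which transform each consumer gets for each column; how Refinery actually expresses that in HCL is not shown here.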
We haven't launched the product yet - we are deep in development, launching soon and very excited! We would love to hear your ideas, questions and feedback. If this sounds like something that could be helpful to you, please get in touch!

Maybe if BA had been using that, my credit card info wouldn't have been stolen by Magecart.