What Is a Feature Store?
Quick question: What's the reason for making "Transform" part of the Feature Store definition? I've been evaluating a couple of feature stores (incl. Tecton and Feast - great job by the way willempienaar) and I'm wondering if that doesn't complicate things, especially if you already have your own data processing pipelines.
Kevin here, Tecton's CTO. Great question:
With Tecton, transformations are an optional component of the system. Similar to Feast, you can bypass the Transform component to ingest data directly from external pipelines. You typically do this when you have existing data pipelines and you want to make the values available in a Feature Store. However, if you don't have an existing stream / batch data pipeline infrastructure that your data scientists / data engineers can easily contribute to, a Feature Store's Transform component is an easy way for them to be fully self-sufficient. Tecton makes it easy to express feature transformations using Spark's native DataFrame API, Python, SQL or Tecton's DSL.
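To make that concrete, here's a rough sketch of the kind of batch feature transformation I mean, written with Spark's DataFrame API. The table and column names are made up for the example, and I'm leaving out the product-specific step of registering the transformation with the feature store:

```python
# Rough sketch of a batch feature transformation using Spark's DataFrame API.
# Table and column names are hypothetical; how you register this with a
# feature store depends on the product you use.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

def user_transaction_features(transactions):
    """Aggregate raw transactions into per-user feature values."""
    return (
        transactions
        .groupBy("user_id")
        .agg(
            F.count("*").alias("txn_count_30d"),
            F.avg("amount").alias("avg_txn_amount_30d"),
            F.max("amount").alias("max_txn_amount_30d"),
        )
    )

# The feature store would run this on a schedule and write the results to its
# offline (training) and online (serving) stores.
raw = spark.table("transactions_last_30_days")  # hypothetical source table
features = user_transaction_features(raw)
```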
Besides self-sufficiency, there are a few other advantages you get from having a feature store manage your feature transformations:
- Feature Versioning: If you change a feature transformation, the Feature Store will know to increment the version of that feature and ensure that you don't accidentally mix features that were computed using two different implementations.
- End-to-end lineage tracking and reproducibility: If a feature store manages your transformations, it can tie exact feature definitions all the way through to a training data set and to a model that's used in production. So if, years later, you want to reproduce a model from a certain point in the past, a Feature Store that supports transformations would be able to recreate that model as long as the raw data still exists.
- Trust: It's more likely that a data scientist will trust and then reuse another user's feature if they can peek under the hood and see how the feature is actually calculated.
- On-Demand Features: Some transformations cannot be executed by existing data processing pipelines because they have to be computed in real time, when the prediction is made, in the operational environment (see the sketch after this list).
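To illustrate that last point, an on-demand feature is essentially a function evaluated in the request path: it combines data that only exists at prediction time (e.g. the incoming transaction amount) with precomputed values fetched from the online store. A rough sketch, with made-up names and a hypothetical lookup call:

```python
# Hypothetical sketch of an on-demand feature, computed at prediction time from
# the incoming request plus precomputed values served from the online store.
# Names and the get_features() lookup are made up for illustration; real
# feature-store SDKs expose their own APIs for this.
def amount_vs_user_average(request_amount: float, avg_txn_amount_30d: float) -> float:
    """How unusual is this transaction relative to the user's 30-day average?"""
    if avg_txn_amount_30d <= 0:
        return 0.0
    return request_amount / avg_txn_amount_30d

def build_feature_vector(online_store, user_id: str, request_amount: float) -> dict:
    # Precomputed features are fetched from the online store...
    precomputed = online_store.get_features(user_id)  # hypothetical lookup
    # ...while on-demand features are evaluated right now, in the request path.
    return {
        **precomputed,
        "amount_vs_user_average": amount_vs_user_average(
            request_amount, precomputed["avg_txn_amount_30d"]
        ),
    }
```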
In reality, you will frequently see multi-stage data processing workflows: a lot of data cleaning and preprocessing happens in an organization's standard, ML-independent data processing infrastructure, and a Feature Store then picks up that preprocessed data and turns it into feature values.
Great job on the blog. I had a couple of questions:
- I know Feast doesn't include a Transform component today; are there plans to include it in future releases?
- Curious to know what Tecton looks like in a multi-tenant environment. Is it a deployment-per-tenant model?
Co-author here. Happy to answer any questions!
Are there any docs for Tecton?
Hey, Tsotne from Tecton here. Tecton docs are currently only available to early access users, but if you hit the [Request Free Trial] button on the site you can sign up for early access and we'll reach out to you with more info!