Show HN: AstroBee – AI-Generated Semantic Layers
app.astrobee.ai
Hi,
I’m a cofounder of AstroBee and I wanted to share our work with the community.
AstroBee is an automatic semantic layer generator for your business. It brings data together from different locations, storing it either in your data warehouse or in one we host. Then, AstroBee scans your data and models it to create an integrated source of truth (we call it an ontology because it’s structured like Palantir’s ontology). Once you have your source of truth, you can either build applications on top of it, or chat with it directly to answer analytics questions.
If you don’t like AstroBee’s original suggestions, you can provide your own context and define your semantic layer as you wish. Then, AstroBee will hydrate your semantic layer definition with your data.
We imagine AstroBee can become the data layer for AI-generated internal applications. While it’s easier than ever to spin up light internal apps with “vibe-coding” tools, getting an integrated and reliable data layer to service those applications is more complicated.
You can try it out on demo data or upload your own if you want. Would love your thoughts and feedback if you’re willing to try it out!
Congrats on the launch! The quickstart with the demo data was easy to follow. I've wanted a tool like this where I [1] can self-serve and test hypotheses about our data models without being an expert in data engineering (complex joins, multiple sources, entity resolution, normalization, etc.) and [2] don't have to wait for days to hear back from our data team. Looks like it does both, nice. Now time to connect to some sample BigQuery and GA data and see if it's as easy as the demo.
thank you! that's exactly where we're going with this. we're still in the early days, so let us know how you get along!
This really isn’t an ontology. Palantir’s marketing around “ontology” is misleading: what they (and you here) are describing is closer to a semantic layer or curated data model. An ontology in knowledge engineering is an open-ended graph of concepts and relationships, typically expressed as triplets (subject–predicate–object). It’s flexible and domain-agnostic.
that's a fair point. we use "ontology" throughout the product because of the influence from Palantir, and the fact that the structure we use in the product (from palantir) is in fact a bit different from what most people would create in semantic layers like dbt or cube. but i agree it's confusing. we are considering changing the naming around to account for this. i appreciate the feedback!
Calling it an ontology might be confusing indeed. Nevertheless, it’s definitely valuable to have an automatic way to generate and hydrate a semantic model from raw data.
Interesting proposal. I'm having a bit of trouble understanding the difference between 'warehouse' and 'connect source'. Could you guys explain the differences between them for me? Or point me to where I can read the documentation about these?
yep, if you want to just go straight to the docs, they're here: https://docs.astrobee.ai/
warehouse: you connect your warehouse AND astrobee will store your semantic layer (ontology) in your warehouse. it will also use your warehouse to do computation.
connect source: you'll use our warehouse under the hood, and when you connect your source systems, data will end up being stored in our warehouse. that's where computation and ontology storage will happen.
What model(s) / providers are you using? Are you training on the data that the agent gets access to? Seems like there are some data governance and privacy red flags for anything involving remotely sensitive data...
we're using OpenAI's API for business. they don't train on data sent to the business api, unlike the consumer tier. this is still an early beta, so at the moment everything is only available with OpenAI's API. however, for people who want to use it in a higher security environment, we'll support switching OpenAI with any hosted model API, including on-premise or models held in private VPCs. that way people can manage their data with no exfiltration to a third party.
been an early user of the product for a while now! primarily using it to get high level product analytics trends in my product, nothing crazy, but I'm a solo open source dev and it saves me time from worrying about setting up data pipelines or dashboards for my project. look forward to using the new ontology work to get deeper insights. also would be great to see some improvements in the agent's memory capabilities, e.g. so i don't have to explain what my definition of a "power user" is every convo, but loving it so far : )