The 2023 Metabase Community Data Stack Report


Earlier this year, we released an anonymous data stack survey, through social channels and email, to find out more about data tooling and its impact on different company sizes and roles.

The survey was available to anyone but, out of the 189 responses we received, 89% were Metabase customers.

While we can’t say our insights are a statistically significant representation of data folks everywhere because of the small sample size, the results have a few things you may want to know, like how one specific database could be worse for your team’s morale... Keep reading to find out more.


Larger companies are more likely to choose a data tool if it's open source


75% of survey respondents said they’re using an open-source production database, so it’s no surprise that Postgres and MySQL were mentioned most often throughout the survey. But one surprise: larger companies choose an open-source production database more often than not.

Large companies said open source, rather than performance, scalability, or security, was the deciding factor in choosing their production database.

On trend, 75% of respondents at larger companies said dbt is currently a part of their data stack. Open source was also in their top three reasons for choosing a data modeling tool.

Open source was in every corner of the survey results, which is not surprising given that our community rallies around all open-source tools, not just BI.


Even with all of the options on the market, most companies still keep data ingestion in-house

Airbyte and Fivetran rounded out the top three, but in-house data ingestion was still more popular than the two combined.


Maybe legacy architecture forces people to build in-house ingestion tools. Or the cost of third-party tooling outweighs the benefits.

It could also just be that third-party ingestion tools are still growing, so maybe we’ll see a shift to them in the coming year.

But a good number of companies are still choosing to build their own ingestion pipelines. We’ve seen a similar trend in data cataloging (more on that below).

You can keep those Python scripts handy for now. In-house data ingestion seems poised to stay as a complement to commercial offerings rather than being replaced entirely by third-party ingestion tooling.


Postgres is the most satisfying database... even more if you're on a distributed team

Although it’s one of the most widely used databases in the industry, MySQL had the lowest role satisfaction score out of the three most commonly used analytics databases.

You may want to rethink your database... and your return-to-office policy, too. Those happiest in their role said they use PostgreSQL in a distributed team setting.

If you’re using MySQL and have opposing opinions to share, we’re all ears. As for our theory on MySQL's lower score: it's a battle-hardened database, but maybe MySQL is keeping older (less fun) codebases afloat.

Postgres users also said their companies are more self-serve than users of other analytics databases, so it may be a wise option if you’re a global, fully remote team.



Self-service score was higher for distributed teams, but there was one role that scored differently than the rest

People working on distributed teams said their companies are more self-serve than localized teams. Distributed companies need self-service tools and processes to work asynchronously and let workers query on their own time. This is pretty straightforward.


But from the results around employee satisfaction, there is one large caveat. Perceptions of self-serve differ by role.

People in data roles perceived their companies as less self-serve than their C-level and Engineering counterparts.

It’s not surprising that C-Levels and Engineers see their companies as more self-serve. They’re the ones using self-serve tooling.

These results could mean that self-serve is doing what it was intended to do. It could also mean that Data Analytics folks think their companies aren’t as self-serve as they had hoped. There isn’t a huge variation here, but it’s worth keeping an eye on.

The good news is we can let you know if that changes! Fill out the survey below to help us figure it out.

The future of the data stack survey

The data stack survey is still open. You can submit your answers now via the form. We’ll create follow-up posts on new, interesting findings as they roll in.

The dashboard and this report are static data for you to use. If you do use the data for something cool, make sure to share it with us!
