How to track your dbt project’s BigQuery data warehouse costs (a PoC)


Dave Flynn

Do you want a way to track your BigQuery usage costs? Check out this interesting proof of concept for dbt.

Data warehouse usage isn’t getting any cheaper, and as dbt continues to grow in popularity, cloud warehouse usage is on the up, with more and more transformations done in the cloud. dbt lets you select which resources to build, which can reduce costs, but the actual cost incurred by each dbt build still isn’t immediately obvious. As data engineers, we want to adopt new technologies such as dbt without the finance department calling to ask why the data bills are through the roof.

One of my colleagues, Alan, at InfuseAI/PipeRider, made this interesting proof of concept for a simple cost calculator for dbt projects running in BigQuery.


It works by recording the bytes billed by BigQuery to a file after dbt build finishes. The cost of the build can then be calculated based on BigQuery’s current on-demand rate ($5 per TB).
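At that rate, the cost of one build is simply the bytes billed divided by a terabyte, times $5. A minimal sketch of the arithmetic in Python (the 10485760-byte figure below is an assumption drawn from the example output later in the article, which matches BigQuery’s 10 MiB minimum billing unit per query):

```python
# Cost of one dbt build at BigQuery's on-demand rate of $5 per TB.
PRICE_PER_TB = 5.0
BYTES_PER_TB = 1_000_000_000_000  # BigQuery prices in decimal terabytes

def build_cost(bytes_billed: int) -> float:
    """Return the cost in USD for the given number of billed bytes."""
    return bytes_billed / BYTES_PER_TB * PRICE_PER_TB

# Even a tiny query is billed a minimum amount (10485760 bytes here),
# so small builds still register a non-zero cost.
print(build_cost(10485760))  # roughly $0.0000524
```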

Here’s an example of an entry in usage.json, which contains information about the project, query, bytes billed, and so on:

{
  "dbt_project_name": "jaffle_shop",
  "bytes_processed": 792,
  "bytes_billed": 10485760,
  "location": "US",
  "project_id": "your-project",
  "job_id": "abc-123",
  "slot_ms": 52,
  "message": "SELECT (1.0 rows, 792.0 Bytes processed)",
  "service_account_email": "your-account-dev.iam.gserviceaccount.com",
  "current_user": "username"
},

After this file is generated, it’s possible to run a quick calculation to determine the cost of the build. Alan used an alias called show_me_the_money that runs the following command-line trickery:

alias show_me_the_money='cat usage.json | jq '\''[.[].bytes_billed] | add / 1000000000000 * 5'\'''
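If you don’t have jq handy, the same calculation is easy to sketch in Python. This is a hypothetical equivalent, assuming usage.json holds an array of entries shaped like the example above (the jq filter’s `.[]` suggests exactly that):

```python
import json

PRICE_PER_TB = 5.0  # BigQuery on-demand rate used in the article
BYTES_PER_TB = 1_000_000_000_000

def total_cost(path: str = "usage.json") -> float:
    """Sum bytes_billed across all recorded builds and convert to USD."""
    with open(path) as f:
        entries = json.load(f)
    return sum(e["bytes_billed"] for e in entries) / BYTES_PER_TB * PRICE_PER_TB

if __name__ == "__main__":
    print(f"${total_cost():.6f}")
```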

It’s a simple but useful PoC, especially if you have large, or multiple, projects and want to keep track of how much your builds are costing, and avoid any surprises when your BigQuery invoice comes in.

Currently the proof of concept exists as a patch for the dbt BigQuery connector, available here:

This will output the usage.json file, which you can then use with the calculation above.

If you’re interested in using this, let us know by leaving a comment and we can work on making it a more formal adapter.

PipeRider is your code review tool for dbt projects

PipeRider is built specifically for assessing the impact of data model changes during dbt data project development.

Use PipeRider during development to generate and compare data profiles and ensure the changes you are making to data models are desired.

Then, when you open a pull request on your team’s data project with changes, attach the data change summary to help the reviewer see how your changes will affect the project.

Install or update PipeRider with the following command:

pip install -U 'piperider[<datasource_connector>]'

All the main data warehouses are supported. Check the docs for more info.