ELT Schedules Can Improve Root Cause Analysis for Data Engineers

montecarlodata.com

7 points by swordsmith8 3 years ago · 1 comment

swordsmith8 (OP) 3 years ago

At Monte Carlo, we did some work on root cause analysis (RCA) for data failures such as ETL job failures, timeouts, and data delays. I think there's a lot that can be done from a data science perspective to automate RCA or to provide better insight into data pipeline problems.

We put together this blog post showing how an orchestration DAG (like a dbt schedule DAG) can be converted into a Bayesian network (BN). You can then ask causal attribution questions in the form of conditional probability queries against the BN. The idea is still pretty basic and preliminary, but I think it could be extended in all sorts of interesting ways, e.g. attributing bad row-level data to upstream transformations.
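To make the idea concrete, here is a rough sketch of what such a query could look like. It is not the method from the blog post, just one simple way to do it. The job names, the failure probabilities, and the "a job almost always fails if any upstream job failed" rule are all made-up assumptions for illustration, and the inference is brute-force enumeration, which only works for tiny DAGs.

```python
from itertools import product

# Hypothetical orchestration DAG: each job maps to the jobs it depends on.
PARENTS = {
    "extract": [],
    "transform_a": ["extract"],
    "transform_b": ["extract"],
    "report": ["transform_a", "transform_b"],
}

# Assumed parameters (illustrative only, not estimated from real data).
# BASE_FAILURE: chance a job fails on its own when all upstream jobs succeeded.
BASE_FAILURE = {"extract": 0.05, "transform_a": 0.02, "transform_b": 0.10, "report": 0.01}
# PROPAGATION: chance a job fails when at least one upstream job failed.
PROPAGATION = 0.95


def p_fail(job, state):
    """Conditional failure probability of `job` given its parents' states."""
    if any(state[parent] for parent in PARENTS[job]):
        return PROPAGATION
    return BASE_FAILURE[job]


def joint(state):
    """Joint probability of a full assignment (True means the job failed)."""
    prob = 1.0
    for job in PARENTS:
        pf = p_fail(job, state)
        prob *= pf if state[job] else 1.0 - pf
    return prob


def query(target, evidence):
    """P(target failed | evidence), computed by enumerating all assignments."""
    jobs = list(PARENTS)
    numerator = denominator = 0.0
    for values in product([False, True], repeat=len(jobs)):
        state = dict(zip(jobs, values))
        if any(state[job] != value for job, value in evidence.items()):
            continue  # assignment contradicts the observed evidence
        p = joint(state)
        denominator += p
        if state[target]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    # Given that the report job failed, how likely is each upstream job to have failed?
    for job in ["extract", "transform_a", "transform_b"]:
        print(job, round(query(job, {"report": True}), 3))
```

With real pipelines you would presumably estimate the conditional probabilities from run history and use a proper inference library rather than enumeration, since the number of assignments doubles with every job.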

Would be interested to hear what people think.
