Tabular data is the frontier – graphs can help

5 points by madman2890 2 days ago · 2 comments · 2 min read


Even in 2026, most of the work in tabular predictive AI still has very little to do with the model itself. Whether you're using CatBoost, XGBoost, or newer tabular foundation models like TabPFN, the .fit() step is usually the smallest part of the workflow.
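To make the claim concrete: once a flat training table exists, the modeling step really is only a few lines. A minimal sketch, using scikit-learn's `GradientBoostingClassifier` as a stand-in for CatBoost/XGBoost; the table and feature names are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# A (tiny, hypothetical) single flat training table -- the hard-won
# output of all the upstream discovery and aggregation work.
train = pd.DataFrame({
    "n_orders_90d": [3, 0, 7, 1],
    "total_spend_90d": [120.0, 0.0, 560.5, 25.0],
    "churned": [0, 1, 0, 1],
})
X, y = train.drop(columns=["churned"]), train["churned"]

# The .fit() step itself: the smallest part of the workflow.
model = GradientBoostingClassifier().fit(X, y)
preds = model.predict(X)
```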

The real time sink is everything before that. Most real-world predictive problems live across many relational tables. So the majority of the work ends up being:

• Discovering which tables are actually relevant

• Understanding foreign keys and entity relationships

• Figuring out cardinality (1:1, 1:N, N:M)

• Aggregating child tables into meaningful features

• Handling time windows and leakage

• Integrating everything into a single training table

Only after all of that can you actually train the model. In many projects, 80–90% of the effort is spent on data discovery and multi-table aggregation, while the modeling step itself takes minutes.
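The prep steps above can be sketched in a few lines of pandas. This is an illustrative example, not GraphReduce's API: the `customers` and `orders` tables (1:N via `customer_id`), the cutoff date, and the feature names are all hypothetical.

```python
import pandas as pd

# Parent table (one row per entity)
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "signup": pd.to_datetime(["2025-01-01", "2025-02-01"]),
})

# Child table (1:N via the customer_id foreign key)
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 2],
    "order_ts": pd.to_datetime(
        ["2025-03-01", "2025-06-01", "2025-03-15", "2025-07-01"]),
    "amount": [50.0, 70.0, 20.0, 5.0],
})

# Time window: only use rows before the prediction cutoff (leakage guard)
cutoff = pd.Timestamp("2025-05-01")
past = orders[orders["order_ts"] < cutoff]

# Collapse the 1:N child table into per-customer features
feats = past.groupby("customer_id").agg(
    n_orders=("order_ts", "count"),
    total_spend=("amount", "sum"),
)

# Integrate everything into a single training table
train = customers.merge(feats, on="customer_id", how="left")
train[["n_orders", "total_spend"]] = (
    train[["n_orders", "total_spend"]].fillna(0))
```

Every step here mirrors one of the bullets: cardinality shows up in the groupby, leakage in the cutoff filter, integration in the merge.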

Tabular foundation models reduce the amount of tuning required, but they don’t remove the fundamental need to collapse relational data into a single learning table. The bottleneck in tabular AI has always been the data graph, not the model.

GraphReduce is a project I've been incrementally building for a few years that addresses this real bottleneck in tabular predictive AI: data prep.

https://wesmadrigal.github.io/GraphReduce/

amazonbezos 2 days ago

This is definitely where most of the time is spent - very cool!
