So DataFrames are basically spreadsheets on steroids, or in-program database tables, but their potential as an Entity Component System (ECS) for game development has never been adequately explored.
It’s fair to say that Data Science/Machine Learning people pour orders of magnitude more resources into improving DataFrame libraries like Pandas than go into all ECS implementations combined. That will probably continue to be the case.
(We’ll show code using Polars, cuDF, and Pandas, which range from “easier to get good performance out of but harder to code game logic with” to “easier to code game logic with but taking more effort to get good performance out of”. Their ideas are more similar than different.)
Using DataFrames is straightforward. A row can be thought of as an entity. A column represents a component (or part of a component, depending on how you’d like to encode it). Columns (i.e. components) can be added dynamically. Each column can be a struct containing multiple fields.
import pandas as pd
import pyarrow as pa

size = 8
df = pd.DataFrame(
    {
        "id": range(0, size),
    }
)
xyz_dtype = pd.ArrowDtype(
    pa.struct([("x", pa.float32()), ("y", pa.float32()), ("z", pa.float32())])
)
# not the most efficient way to do this, but it's just for demonstration
df = df.assign(
    position=pd.Series(
        ({"x": id * 2.0, "y": id * 3.0, "z": id * 5.0} for id in df["id"]),
        dtype=xyz_dtype,
    ),
    velocity=pd.Series(
        ({"x": id * 7.0, "y": id * 11.0, "z": id * 13.0} for id in df["id"]),
        dtype=xyz_dtype,
    ),
)
print(df)
   id                           position                           velocity
0   0     {'x': 0.0, 'y': 0.0, 'z': 0.0}     {'x': 0.0, 'y': 0.0, 'z': 0.0}
1   1     {'x': 2.0, 'y': 3.0, 'z': 5.0}   {'x': 7.0, 'y': 11.0, 'z': 13.0}
2   2    {'x': 4.0, 'y': 6.0, 'z': 10.0}  {'x': 14.0, 'y': 22.0, 'z': 26.0}
3   3    {'x': 6.0, 'y': 9.0, 'z': 15.0}  {'x': 21.0, 'y': 33.0, 'z': 39.0}
4   4   {'x': 8.0, 'y': 12.0, 'z': 20.0}  {'x': 28.0, 'y': 44.0, 'z': 52.0}
5   5  {'x': 10.0, 'y': 15.0, 'z': 25.0}  {'x': 35.0, 'y': 55.0, 'z': 65.0}
6   6  {'x': 12.0, 'y': 18.0, 'z': 30.0}  {'x': 42.0, 'y': 66.0, 'z': 78.0}
7   7  {'x': 14.0, 'y': 21.0, 'z': 35.0}  {'x': 49.0, 'y': 77.0, 'z': 91.0}
As you can see, everything is very dynamic and flexible. Arbitrary components can be added at runtime by adding a column, or removed from an entity by setting its cell to null, which is arguably a lot more flexible than most ECSs. This dynamic approach should also make DataFrames easier to manipulate from scripting languages. Save/load can also be trivial.
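To make the add/remove/save claims concrete, here is a minimal pandas sketch. The `world` table, the `health` component, and the CSV round trip are all assumptions chosen for illustration, not part of the example above; a real game would more likely pick a typed format such as Parquet for saves.

```python
import io

import pandas as pd

# Four entities, one per row ("world" and "health" are hypothetical names).
world = pd.DataFrame({"id": range(4)})

# Add a component at runtime by adding a column.
# The nullable Int64 dtype lets a cell be missing without turning the column into floats.
world["health"] = pd.array([100, 80, 60, 40], dtype="Int64")

# Remove the component from entity 2 by setting its cell to null.
world.loc[world["id"] == 2, "health"] = pd.NA

# Query: entities that still have the component.
with_health = world[world["health"].notna()]
print(with_health["id"].tolist())  # [0, 1, 3]

# Save and load through an in-memory CSV buffer.
# CSV does not preserve dtypes, so "health" comes back as floats with NaN for the null.
buffer = io.StringIO()
world.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded["health"].notna().tolist())  # [True, True, False, True]
```

The null cell survives the round trip as a missing value, so "has this component" queries give the same answer before and after loading.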
The biggest strength of DataFrames over traditional ECSs is probably query ergonomics. Someone challenged me to express the following query using DataFrames. It involves quite a few relationships between entities:
// find all spaceships
SpaceShip($spaceship),
// that are of a faction
Faction($spaceship, $spaceship_faction),
// that is docked to an entity
DockedTo($spaceship, $planet),
// which is a planet
Planet($planet),
// that is ruled by a faction
RuledBy($planet, $planet_faction),
// which is allied by the spaceship's faction
AlliedWith($spaceship_faction, $planet_faction)
DataFrames handle such queries with ease. Suppose we represent the world in DataFrames like this:
use polars::df;
use polars::prelude::*;

#[derive(Debug, Clone, Copy)]
enum EntityType {
    Spaceship = 0,
    Faction,
    SpaceStation,
    Planet,
}

pub(crate) fn query() {
    // Create a DataFrame for entities
    let entities = df![
        "entity_id" => &[1, 2, 3, 4, 5, 6],
        "entity_type" => &[
            EntityType::Spaceship as i32,
            EntityType::Faction as i32,
            EntityType::SpaceStation as i32,
            EntityType::Planet as i32,
            EntityType::Spaceship as i32,
            EntityType::Faction as i32,
        ],
        "owning_faction" => &[Some(2), None, Some(2), Some(2), Some(6), None]
    ].unwrap();
    println!("Entities DataFrame:");
    println!("{:?}", entities);

    // Create a DataFrame for docking access
    let docking_access = df![
        "from_faction_id" => &[2, 6],
        "to_faction_id" => &[6, 2]
    ].unwrap();
    println!("Docking Access DataFrame:");
    println!("{:?}", docking_access);

    // Create a DataFrame for docking status
    let docking_status = df![
        "spaceship_id" => &[1, 5],
        "target_id" => &[3, 4]
    ].unwrap();
    println!("Docking Status DataFrame:");
    println!("{:?}", docking_status);

    // Create a DataFrame for friendly stances
    let friendly_stance = df![
        "from_faction_id" => &[2, 6],
        "to_faction_id" => &[6, 2],
        "is_friendly" => &[true, true]
    ].unwrap();
    println!("Friendly Stance DataFrame:");
    println!("{:?}", friendly_stance);
The actual query becomes just a few lines of code, with a query optimizer at runtime helping you get adequate performance without worrying too much:
    // LazyFrame query to find all spaceships docked to a planet owned by a different friendly faction
    let entities_lazy = entities.lazy();
    let docking_status_lazy = docking_status.lazy();
    let friendly_stance_lazy = friendly_stance.lazy();

    let result = docking_status_lazy
        // Attach the docking target's entity data, then keep only planets
        .join(entities_lazy.clone(), [col("target_id")], [col("entity_id")], JoinType::Inner.into())
        .filter(col("entity_type").eq(lit(EntityType::Planet as i32)))
        // Attach the spaceship's entity data; clashing columns get a "_right" suffix
        .join(entities_lazy.clone(), [col("spaceship_id")], [col("entity_id")], JoinType::Inner.into())
        // Match the spaceship's faction against stances it holds towards other factions
        .join(friendly_stance_lazy, [col("owning_faction_right")], [col("from_faction_id")], JoinType::Inner.into())
        // Keep only friendly stances directed at the planet's faction
        .filter(col("is_friendly"))
        .filter(col("owning_faction").eq(col("to_faction_id")))
        .select([col("spaceship_id")])
        .collect()
        .unwrap();
    println!("Spaceships docked to a planet owned by a different friendly faction:");
    println!("{:?}", result);
}
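For comparison, the same query can be sketched in pandas with `merge`. This is only a rough equivalent under assumptions: the table contents mirror the example data, while the helper frames `planets`, `ships`, and `allies` and their renamed columns are names invented here for readability. Renaming before joining avoids relying on suffixed column names.

```python
import pandas as pd

# Entity type tags, mirroring the EntityType enum.
SPACESHIP, FACTION, SPACE_STATION, PLANET = range(4)

entities = pd.DataFrame({
    "entity_id": [1, 2, 3, 4, 5, 6],
    "entity_type": [SPACESHIP, FACTION, SPACE_STATION, PLANET, SPACESHIP, FACTION],
    "owning_faction": [2, None, 2, 2, 6, None],  # None becomes NaN (float column)
})
docking_status = pd.DataFrame({"spaceship_id": [1, 5], "target_id": [3, 4]})
friendly_stance = pd.DataFrame({
    "from_faction_id": [2, 6],
    "to_faction_id": [6, 2],
    "is_friendly": [True, True],
})

# Planets and spaceships with their owning factions, renamed to act as join keys.
# After filtering, no faction is missing here, so casting back to int is safe.
planets = (
    entities.loc[entities["entity_type"] == PLANET, ["entity_id", "owning_faction"]]
    .rename(columns={"entity_id": "target_id", "owning_faction": "planet_faction"})
    .astype({"planet_faction": "int64"})
)
ships = (
    entities.loc[entities["entity_type"] == SPACESHIP, ["entity_id", "owning_faction"]]
    .rename(columns={"entity_id": "spaceship_id", "owning_faction": "ship_faction"})
    .astype({"ship_faction": "int64"})
)
# Friendly stances, expressed as (ship_faction, planet_faction) pairs.
allies = (
    friendly_stance.loc[friendly_stance["is_friendly"], ["from_faction_id", "to_faction_id"]]
    .rename(columns={"from_faction_id": "ship_faction", "to_faction_id": "planet_faction"})
)

result = (
    docking_status
    .merge(planets, on="target_id")                    # docked to a planet
    .merge(ships, on="spaceship_id")                   # the docked entity is a spaceship
    .merge(allies, on=["ship_faction", "planet_faction"])  # factions are allied
)
print(result["spaceship_id"].tolist())  # [5]
```

Spaceship 1 is docked to a space station, so it drops out at the first merge; spaceship 5 (faction 6) is docked to planet 4 (faction 2), and faction 6 is friendly towards faction 2. Unlike the lazy Polars version, pandas runs each merge eagerly, so there is no query optimizer reordering the steps.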
The more I look at DataFrames and compare them to ECS solutions, the more they seem to overlap. There is a lot of progress on the DataFrame side that ECS folks are missing entirely, and I suspect they are fighting a losing battle against it.
In a future article, we will discuss how DataFrames choose between writing in place and copy-on-write, and what you can do to improve performance when working with DataFrames.