Ruby Polars
🔥 Blazingly fast DataFrames for Ruby, powered by Polars
Installation
Add this line to your application’s Gemfile:
gem "polars-df"
Getting Started
This library follows the Polars Python API.
Polars.scan_csv("iris.csv")
  .filter(Polars.col("sepal_length") > 5)
  .group_by("species")
  .agg(Polars.all.sum)
  .collect
You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
Reference
Examples
Creating DataFrames
From a CSV
Polars.read_csv("file.csv") # or lazily with Polars.scan_csv("file.csv")
From Parquet
Polars.read_parquet("file.parquet") # or lazily with Polars.scan_parquet("file.parquet")
From Active Record
Polars.read_database(User.all) # or Polars.read_database("SELECT * FROM users")
From JSON
Polars.read_json("file.json")
# or
Polars.read_ndjson("file.ndjson") # or lazily with Polars.scan_ndjson("file.ndjson")
From Feather / Arrow IPC
Polars.read_ipc("file.arrow") # or lazily with Polars.scan_ipc("file.arrow")
From Avro
Polars.read_avro("file.avro")
From Iceberg (experimental, requires iceberg)
Polars.scan_iceberg(table)
From Delta Lake (experimental, requires deltalake-rb)
Polars.read_delta("./table") # or lazily with Polars.scan_delta("./table")
From a hash
Polars::DataFrame.new({ a: [1, 2, 3], b: ["one", "two", "three"] })
From an array of hashes
Polars::DataFrame.new([ {a: 1, b: "one"}, {a: 2, b: "two"}, {a: 3, b: "three"} ])
From an array of series
Polars::DataFrame.new([ Polars::Series.new("a", [1, 2, 3]), Polars::Series.new("b", ["one", "two", "three"]) ])
Attributes
Get number of rows
Get column names
Check if a column exists
Selecting Data
Select a column
Select multiple columns
Select first rows
Select last rows
Filtering
Filter on a condition
df.filter(Polars.col("a") == 2)
df.filter(Polars.col("a") != 2)
df.filter(Polars.col("a") > 2)
df.filter(Polars.col("a") >= 2)
df.filter(Polars.col("a") < 2)
df.filter(Polars.col("a") <= 2)
And, or, and exclusive or
df.filter((Polars.col("a") > 1) & (Polars.col("b") == "two")) # and
df.filter((Polars.col("a") > 1) | (Polars.col("b") == "two")) # or
df.filter((Polars.col("a") > 1) ^ (Polars.col("b") == "two")) # xor
Operations
Basic operations
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs
Rounding
df["a"].round(2)
df["a"].ceil
df["a"].floor
Logarithm
df["a"].log # natural log
df["a"].log(10)
Exponentiation
Trigonometric functions
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].arcsin
df["a"].arccos
df["a"].arctan
Hyperbolic functions
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].arcsinh
df["a"].arccosh
df["a"].arctanh
Summary statistics
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var
Grouping
Group
Works with all summary statistics
Multiple groups
df.group_by(["a", "b"]).count
Combining Data Frames
Add rows
Add columns
Inner join
df.join(other_df, on: "a")
Left join
df.join(other_df, on: "a", how: "left")
Encoding
One-hot encoding
Conversion
Array of hashes
Hash of series
CSV
df.to_csv # or df.write_csv("file.csv")
Parquet
df.write_parquet("file.parquet")
JSON
df.write_json("file.json") # or df.write_ndjson("file.ndjson")
Feather / Arrow IPC
df.write_ipc("file.arrow")
Avro
df.write_avro("file.avro")
Iceberg (experimental)
df.write_iceberg(table, mode: "append")
Delta Lake (experimental)
df.write_delta("./table")
Numo array
Types
You can specify column types when creating a data frame:
Polars::DataFrame.new(data, schema: {"a" => Polars::Int32, "b" => Polars::Float32})
Supported types are:
- boolean - Boolean
- decimal - Decimal
- float - Float32, Float64
- integer - Int8, Int16, Int32, Int64, Int128
- unsigned integer - UInt8, UInt16, UInt32, UInt64, UInt128
- string - String, Categorical, Enum
- temporal - Date, Datetime, Duration, Time
- nested - Array, List, Struct
- other - Binary, Object, Null, Unknown
Get column types
For a specific column
Cast a column
df["a"].cast(Polars::Int32)
Visualization
Add Vega to your application’s Gemfile:
gem "vega"
And use:
df.plot("a", "b", type: "line")
Supports line, pie, column, bar, area, and scatter plots
Group data
df.plot("a", "b", group: "c", type: "line")
Stacked columns or bars
df.plot("a", "b", group: "c", type: "column", stacked: true)
Plot a series [unreleased]
Supports hist, kde, and line plots
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/ruby-polars.git
cd ruby-polars
bundle install
bundle exec rake compile
bundle exec rake test
bundle exec rake test:docs