🔥 Blazingly fast DataFrames for Ruby, powered by Polars
Installation
Add this line to your application’s Gemfile:
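The line itself appears to be missing from this copy. Assuming the gem is installed under its RubyGems name, `polars-df`, the Gemfile entry would look like this:

```ruby
# Gemfile entry (gem name assumed to be polars-df)
gem "polars-df"
```

Run `bundle install` afterwards to fetch it.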
Getting Started
This library follows the Polars Python API.
Polars.scan_csv("iris.csv")
  .filter(Polars.col("sepal_length") > 5)
  .group_by("species")
  .agg(Polars.all.sum)
  .collect
You can follow Polars tutorials and convert the code to Ruby in many cases. Feel free to open an issue if you run into problems.
Reference
Creating DataFrames
From a CSV
Polars.read_csv("file.csv")
# or lazily with
Polars.scan_csv("file.csv")
From Parquet
Polars.read_parquet("file.parquet")
# or lazily with
Polars.scan_parquet("file.parquet")
From Active Record
Polars.read_database(User.all)
# or
Polars.read_database("SELECT * FROM users")
From JSON
Polars.read_json("file.json")
# or
Polars.read_ndjson("file.ndjson")
# or lazily with
Polars.scan_ndjson("file.ndjson")
From Feather / Arrow IPC
Polars.read_ipc("file.arrow")
# or lazily with
Polars.scan_ipc("file.arrow")
From Avro
Polars.read_avro("file.avro")
From Iceberg (experimental, requires iceberg)
Polars.scan_iceberg(table)
From Delta Lake (experimental, requires deltalake-rb)
Polars.read_delta("./table")
# or lazily with
Polars.scan_delta("./table")
From a hash
Polars::DataFrame.new({ a: [1, 2, 3], b: ["one", "two", "three"] })
From an array of hashes
Polars::DataFrame.new([ {a: 1, b: "one"}, {a: 2, b: "two"}, {a: 3, b: "three"} ])
From an array of series
Polars::DataFrame.new([ Polars::Series.new("a", [1, 2, 3]), Polars::Series.new("b", ["one", "two", "three"]) ])
Attributes
Get number of rows
Get column names
Check if a column exists
Selecting Data
Select a column
Select multiple columns
Select first rows
Select last rows
Filtering
Filter on a condition
df.filter(Polars.col("a") == 2)
df.filter(Polars.col("a") != 2)
df.filter(Polars.col("a") > 2)
df.filter(Polars.col("a") >= 2)
df.filter(Polars.col("a") < 2)
df.filter(Polars.col("a") <= 2)
And, or, and exclusive or
df.filter((Polars.col("a") > 1) & (Polars.col("b") == "two")) # and
df.filter((Polars.col("a") > 1) | (Polars.col("b") == "two")) # or
df.filter((Polars.col("a") > 1) ^ (Polars.col("b") == "two")) # xor
Operations
Basic operations
df["a"] + 5
df["a"] - 5
df["a"] * 5
df["a"] / 5
df["a"] % 5
df["a"] ** 2
df["a"].sqrt
df["a"].abs
Rounding
df["a"].round(2)
df["a"].ceil
df["a"].floor
Logarithm
df["a"].log # natural log
df["a"].log(10)
Exponentiation
Trigonometric functions
df["a"].sin
df["a"].cos
df["a"].tan
df["a"].arcsin
df["a"].arccos
df["a"].arctan
Hyperbolic functions
df["a"].sinh
df["a"].cosh
df["a"].tanh
df["a"].arcsinh
df["a"].arccosh
df["a"].arctanh
Summary statistics
df["a"].sum
df["a"].mean
df["a"].median
df["a"].quantile(0.90)
df["a"].min
df["a"].max
df["a"].std
df["a"].var
Grouping
Group
Works with all summary statistics
Multiple groups
df.group_by(["a", "b"]).count
Combining Data Frames
Add rows
Add columns
Inner join
df.join(other_df, on: "a")
Left join
df.join(other_df, on: "a", how: "left")
Encoding
One-hot encoding
Conversion
Array of hashes
Hash of series
CSV
df.to_csv
# or
df.write_csv("file.csv")
Parquet
df.write_parquet("file.parquet")
JSON
df.write_json("file.json")
# or
df.write_ndjson("file.ndjson")
Feather / Arrow IPC
df.write_ipc("file.arrow")
Avro
df.write_avro("file.avro")
Iceberg (experimental)
df.write_iceberg(table, mode: "append")
Delta Lake (experimental)
df.write_delta("./table")
Numo array
Types
You can specify column types when creating a data frame
Polars::DataFrame.new(data, schema: {"a" => Polars::Int32, "b" => Polars::Float32})
Supported types are:
- boolean - Boolean
- decimal - Decimal
- float - Float16, Float32, Float64
- integer - Int8, Int16, Int32, Int64, Int128
- unsigned integer - UInt8, UInt16, UInt32, UInt64, UInt128
- string - String, Categorical, Enum
- temporal - Date, Datetime, Duration, Time
- nested - Array, List, Struct
- other - Binary, Object, Null, Unknown
Get column types
For a specific column
Cast a column
df["a"].cast(Polars::Int32)
Visualization
Add Vega to your application’s Gemfile:
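The Gemfile line is missing from this copy. Assuming the gem is published as `vega`, the entry would be:

```ruby
# Gemfile entry (gem name assumed to be vega)
gem "vega"
```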
And use:
Supports line, pie, column, bar, area, and scatter plots
Group data
df.plot.line("a", "b", color: "c")
Stacked columns or bars
df.plot.column("a", "b", color: "c", stacked: true)
Plot a series
Supports hist, kde, and line plots
History
View the changelog
Contributing
Everyone is encouraged to help improve this project. Here are a few ways you can help:
- Report bugs
- Fix bugs and submit pull requests
- Write, clarify, or fix documentation
- Suggest or add new features
To get started with development:
git clone https://github.com/ankane/ruby-polars.git
cd ruby-polars
bundle install
bundle exec rake compile
bundle exec rake test
bundle exec rake test:docs