Fast DataFrames for Ruby
github.com
Recent and related:
Modern Polars: A comparison of the Polars and Pandas dataframe libraries - https://news.ycombinator.com/item?id=34275818 - Jan 2023 (62 comments)
Also:
Polars: Fast DataFrame library for Rust and Python - https://news.ycombinator.com/item?id=29584698 - Dec 2021 (124 comments)
Polars: Rust DataFrames Based on Apache Arrow - https://news.ycombinator.com/item?id=23768227 - July 2020 (1 comment)
I think it's really interesting to see such gems offering a Ruby layer on top of Rust libs. One issue I have with that, and maybe it's my ignorance: is it necessary to bundle the original lib as in https://github.com/ankane/polars-ruby/tree/master/ext/polars ? I can imagine that it makes it easier to avoid breakage, or, as with C extensions, that it's because you don't have a dependency manager around, but couldn't we sort it out with Cargo? For instance, the version could be locked and the dependency downloaded (and cached) when necessary?
ankane’s gems are truly something.
Blazer is a particular favourite of mine. A former colleague taught herself SQL from zero knowledge by looking at and piecing together bits of other reports, experimenting with familiar (interesting) data and going on to build dashboards her team loved.
If we'd just given the team a locked-down Data Studio/PowerBI report, none of that would have happened. Encouraging people to peek under the hood can be a huge benefit.
came here to say this.
Polars is a really great library. Cool to see it expanding into so many languages too.
I'm trying to imagine why somebody would start a data analytics project in Ruby.
Having done financial modelling and data analytics in Ruby: Because I like Ruby, all the other backend code in those projects was in Ruby, and most projects don't rely on data volumes where the lack of something like Polars is an issue to begin with.
Most people don't have large datasets (even many people who think they have large datasets). Some do, or require more complex supporting libraries, and I get that Ruby then often isn't practical for them, and that's fine.
But it's nice to know I now have one more option reducing my need to consider another language.
- Because some like it better than Python/Julia/<INSERT NEXT language>.
- Because they want data analytics in a Ruby application.
- Because Ruby is awesome.
Because you already have a Ruby project and you want to add analytics to it?
Other than the lack of libraries/tools comparable to Python's (hence projects like this one), why not?
Honestly, I used Ruby about 6 years ago, but have been a Python guy ever since. That said, I believe Ruby's main advantage is its metaprogramming capabilities. You can build powerful DSLs in Ruby pretty quickly. Adding analytics to that could be useful in certain cases.
I'd like to see Spark bindings first, though, before I would seriously consider it.
http://ondra-m.github.io/ruby-spark/
https://github.com/ondra-m/ruby-spark
Hasn't been updated in a long time - no idea if that's because it's complete enough or if it's been abandoned, as I don't use Spark so I haven't tested this gem myself.
and why not?
Why not why ?
Presumably to use polars.
I'm trying to imagine why somebody would start writing such a hateful comment on HN. How empty can the soul of that human be?
I didn't read it as hateful, but as legitimate curiosity. Ruby is at a disadvantage because the analytics ecosystem is primarily Python based.
Imagine if people never tried anything new simply because they were starting from a disadvantage! In machine learning this would be called a "local maximum".
I can see where they are coming from.
The use of the words "I'm trying to imagine" is really very provocative, because it suggests that the commenter has thought hard about it and comprehensively eliminated any possible reason one would use Ruby. Which in turn implies there is literally nothing good about Ruby at all.
So someone has poured heart and soul into building a free library in their own time, giving it away to everyone and the response is to casually dismiss it with a remark suggesting there is literally no reason for it to exist.
Is it a big deal? No. But it would be good if people tried to be kind when commenting.
There is a school of thought that you should use the right tool for the job, and some languages are better tools for certain tasks. Python already has all the libraries, and Julia has built-in language support. But nobody is stopping you from using Ruby or JS or PHP for whatever.
> There is a school of thought that you should use the right tool for the job
But there are often real world constraints that influence the choice of the tool. For example if you have a Rails app already, going with more Ruby code might fit better than branching out to Python or Julia.
> and some languages are better tools for certain tasks.
Yes, although in the case of Python, it's not the language that is better but the ecosystem support.
The right tool for the right job needs to take into consideration what languages/tools people know and have been exposed to.
We process billions of events/records weekly with just Ruby and Sidekiq/Redis at my current job. It's way easier to extend what we have with Ruby; switching to Spark/Python/Scala/Kafka or whatever would be complete overkill.