Generalizing Support for Functional OOP in R

109 points by samch93 2 years ago · 52 comments

To be honest, OOP never really seemed like a good fit for R. Functional programming is a much more natural fit, given that both it and R come from a mathematical point of view. R is a great language for the mathematical/statistical stuff it was invented to do, but I don't think it will ever be a general-purpose language, and it probably would become worse at its core purpose if it tried to.

More work on being easily used by/incorporated into applications written in other languages, would perhaps be a more impactful thing to work on.

orhmeh09 2 years ago

OOP has been a critical part of real-life R for a long time, especially in complex implementations of classes of kernels, algorithms, and so on with S4, and more general-purpose with R6. Without these frameworks it would be difficult to implement them.
Personally I find it more expressive for general-purpose computation than Python. The "fs" library is much better at working with files and paths than Python "os" and the multiple other modules that can be needed for typical filesystem operations, especially if you are working with more than one file at a time.
I would even say that each of the R object systems is more expressive and more flexible than the Python one. I suspect lazy evaluation is a part of this.
- mr_toad 2 years ago
  
  Maybe too expressive and too flexible? Different R programs can have wildly different dialects, making it difficult for two R programmers to even understand each other.
  I’ve seen comics depicting the learning curve for R as having local minima beyond which there are further peaks and troughs of knowledge. A beginner might learn enough to get by, but find the code of someone on the other side of one of those peaks to be a foreign language.
  Having your coders not understand each other is problematic in a production environment.
  - orhmeh09 2 years ago
    
    This is fair. For what it's worth, Python is tending toward this and I think is introducing newer syntax at a faster rate with things such as structural pattern matching and typing, which I have had difficulty explaining to people who don't keep up with each new release.
    
    stevenae 2 years ago
    
    Switching from R to Python, this resonates. I write base (non-tidy) R, and it's definitely another language from tidy. That said, having written a fair amount of base Python, jumping into torch/TensorFlow feels like an even further separation than base R/tidy.
- barfbagginus 2 years ago
  
  Pathlib gives a decent OOP interface to most file operations. The pathlib docs have a Rosetta Stone table for replacing `os` and `shutil` invocations, which is very handy.
  It could still be better, and sadly being in the stdlib means it probably won't be improved.
  For example, we still need shutil.rmtree to recursively nuke a directory. The pathlib way of doing it is laborious and error prone: https://stackoverflow.com/questions/13118029/deleting-folder...
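For anyone who wants to compare the two approaches mentioned above, here is a minimal sketch. The helper names `make_tree` and `rmtree_pathlib` are made up for illustration; only `shutil.rmtree` and the `pathlib.Path` methods are real standard-library calls, and the hand-rolled version is one possible way to do it, not a recommended recipe.

```python
import shutil
import tempfile
from pathlib import Path


def make_tree(root: Path) -> None:
    """Build a small nested directory tree for the demo."""
    (root / "a" / "b").mkdir(parents=True)
    (root / "a" / "one.txt").write_text("1")
    (root / "a" / "b" / "two.txt").write_text("2")


def rmtree_pathlib(root: Path) -> None:
    """Hypothetical helper: recursive delete using only pathlib calls.

    Children must be removed before their parents, and symlinks must be
    unlinked rather than followed, which is where hand-rolled versions
    tend to go wrong.
    """
    for child in root.iterdir():
        if child.is_dir() and not child.is_symlink():
            rmtree_pathlib(child)
        else:
            child.unlink()
    root.rmdir()


base = Path(tempfile.mkdtemp())

first = base / "first"
make_tree(first)
rmtree_pathlib(first)   # the laborious pathlib-only way

second = base / "second"
make_tree(second)
shutil.rmtree(second)   # the shutil one-liner

leftovers = sorted(p.name for p in base.iterdir())
print(leftovers)
base.rmdir()
```

Both calls leave `base` empty; the difference is how much traversal logic you have to get right yourself.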
  - orhmeh09 2 years ago
    
    Pathlib is a great addition and it's nice that it's in the standard library. I like fsspec also, which has some functionality that overlaps with Python's existing libraries but makes it a little cleaner IMO.
- fire_lake 2 years ago
  
  I too much prefer R to Python (it’s far more expressive, for one) however it’s clear now that Python has “won” in this space and R is a tough sell to a wider team.
  - bachmeier 2 years ago
    
    > it’s clear now that Python has “won” in this space and R is a tough sell to a wider team
    R has always been a language for academics, and it continues to be popular in that domain, with no compelling reason to switch. It has seen usage in the private sector, but that has never been the driving force behind R's development or ecosystem, and I doubt it ever will be. For academics, even if a particular function is only available for Python, it's easy enough to call it from R and do everything else in R.
  - rossdavidh 2 years ago
    
    My experience is that you can "sell" R when the statistical or modelling technique is not (or not well) implemented in python. Which still includes a lot of potentially useful statistical techniques! R should/could lean into being the tool that python programmers reach for when they need something a little less mainstream, if they made it easier for non-R programmers to do so.
  - orhmeh09 2 years ago
    
    Oh, I agree on that, and I know that most of the time I'll be asked why something isn't in Python so I mainly reserve it for when I am sure nobody will ask "why R?" :-)
    There is a benefit nowadays that I can rely on Python>=3.6 to be available by default anywhere I am deploying whereas R has to be installed in some way, so like Bash it's part of a toolbox I can rely on being available with at least a constrained set of features.
- agumonkey 2 years ago
  
  > I suspect lazy evaluation is a part of this.
  I had no idea R was lazy. Makes me wanna learn it now.
  - int_19h 2 years ago
    
    R is such a weird little language. It's basically lazy Lisp dressed up in C syntax.
    For example, the only operator it really has is a function call. Everything else is syntactic sugar for a function call, and I mean literally everything: assignments, conditionals, loops, even function definitions and curly braces are all function calls. For example:
    a <- 1;
    # is the same as
    `<-`(a, 1);
    if (a == 1) print("ok") else { print("wtf"); print(a); }
    # is the same as
    `if`(a == 1, print("ok"), `{`(print("wtf"), print(a)));
    function(x, y=42) x + y;
    # is the same as
    `function`(pairlist(x=, y=42), x + y); # not quite but close enough
    and so on. You can actually see what things look like under the hood for any R expression or statement by printing as.list(quote(...)), and recursively doing that for every element in the resulting list.
    The reason why this is possible is because all arguments in R are not evaluated when passed to a function. Instead, it receives the expression object corresponding to the expression that the caller used for that argument, combined with the environment in which it was created - R calls this a promise. It's kind of like instead of (foo (+ x y)), you'd write:
    (foo (`(+ x y) (lambda () (+ x y))))
    i.e. for the argument, instead of its evaluated value, you passed both the quoted expression and the lambda that computes it in the original environment. When the actual value of the argument is needed, the expression is evaluated and the result is cached in the promise (so implicit eval is lazy and one-off). But the function can instead just query for the argument expression directly and then use it in some other way - so e.g. the `<-` function does not eval its first argument, but instead uses it to identify the variable being set.
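A rough Python model of the promise idea described above, for readers who don't speak Lisp. This is only a sketch: the `Promise` class, its `force` method and the `expensive` helper are invented for illustration, and the expression text has to be passed by hand because Python has no built-in way to capture it.

```python
from typing import Any, Callable


class Promise:
    """Toy model of an R promise: expression text, a thunk, a cached value."""

    _UNSET = object()

    def __init__(self, expr: str, thunk: Callable[[], Any]) -> None:
        self.expr = expr              # the quoted expression, kept for inspection
        self._thunk = thunk           # closure over the caller's environment
        self._value = Promise._UNSET

    def force(self) -> Any:
        # Evaluate at most once; later calls reuse the cached result.
        if self._value is Promise._UNSET:
            self._value = self._thunk()
        return self._value


calls = []


def expensive() -> int:
    calls.append("evaluated")
    return 3


x, y = 1, 2
p = Promise("x + y + expensive()", lambda: x + y + expensive())


def foo(arg: Promise) -> tuple:
    # The callee can look at the expression without evaluating it...
    text = arg.expr
    # ...or force the value, here twice to show the caching.
    return text, arg.force(), arg.force()


result = foo(p)
print(result)      # ('x + y + expensive()', 6, 6)
print(len(calls))  # 1
```

The second `force()` returns the cached value, which is the "lazy and one-off" part.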
    
    klibertp 2 years ago
    
    Thanks for a great explanation. It looks like the thing Scala does with its by-name parameters, but for every parameter by default. Even closer analogy, I think Io works in a very similar way - bodies of methods can access their arguments as Message (ie. unevaluated calls) objects and then decide to evaluate them as needed (which differs from your example in that the body can choose the context in which the message send is to be executed, it doesn't have to be lexical scope of a caller). It enables a great deal of expressivity - esp. coupled with some syntactic sugar for "operators" - and I always wondered why more languages don't have that feature.
    
    int_19h 2 years ago
    
    In R, you can also choose the context in which the argument expression is to be evaluated. If you just use the promise as if it were a value and rely on implicit evaluation, then it happens in the context of the caller, yes. But environments (i.e. sets of name-value bindings) in R are first-class objects, so they can be captured at any given point, and later used to explicitly evaluate promises after retrieving the latter's associated expression.
    foo <- function(x, env) {
      print(x);                    # implicit eval
      x_expr <- substitute(x);     # gets the associated expression
      print(eval(x_expr, env));    # explicit eval in different environment
    }
    bar <- function(y) {
      environment()                # capture and return local environment of function
    }
    y <- 1
    env <- bar(2);
    foo(y * y, env);               # prints 1 then 4.
    Side note: substitute() seems like a weird name for a function that returns the underlying expression of the promise. It's named that way because it's actually similar in intended use to quasiquotation - it lets you explicitly substitute variable names for something else in the expression before evaluating it. So e.g. substitute(x <- x + 1, list(x=2)) returns the expression object for (2 <- 2 + 1). Not passing a list of substitutions is just a special case where no substitutions are made and the original expression is returned instead, although in practice that is probably the most common way to use it.
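The closest Python analogue I can think of, as a loose sketch: Python has nothing like substitute(), so the expression is passed explicitly as a string, and the environments are plain dicts handed to the built-in `eval`. The names `foo`, `bar` and `env` just mirror the R example.

```python
def foo(expr: str, caller_env: dict, other_env: dict) -> tuple:
    # "Implicit" evaluation: use the bindings of the caller.
    implicit = eval(expr, {}, caller_env)
    # Explicit evaluation of the same expression against other bindings.
    explicit = eval(expr, {}, other_env)
    return implicit, explicit


def bar(y):
    # Capture and return a snapshot of this function's local bindings.
    return dict(locals())


env = bar(2)
result = foo("y * y", {"y": 1}, env)
print(result)  # (1, 4)
```

The difference from R is that the caller has to cooperate by quoting the expression; in R the callee can recover it from any ordinary argument.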
    
    orhmeh09 2 years ago
    
    Oh, I loved using Io in the late 2000s (and did all my algorithms assignments in it, probably to the chagrin of the instructor who allowed us to use any language we wanted). Maybe that explains some of my affinity toward R. Is there anything else useful that is similar to it these days?
  - civilized 2 years ago
    
    R is not lazy. It has non-standard evaluation mechanisms (formulas, promises, quosures...) that enable you to write domain-specific languages that "do what the user meant".
    If your code (or the code of the libraries you're using) doesn't use any non-standard evaluation tools, evaluation will be eager and work like any other ALGOL language.
    It is possible to make some objects behave in a lazy way, but this is also true of many other languages.
    
    int_19h 2 years ago
    
    R is lazy, because it does not evaluate arguments upon function call unless and until used. It is also unusual in that the function can avoid evaluating the argument at all, and instead ask for the quoted expression that produced it (which can then be evaluated manually at the desired point, or multiple times, or in a different environment etc), but that is orthogonal to laziness.
    To be even more precise, R itself is lazy "all the way through". Because literally every expression in R is syntactic sugar for a function call (including assignments and control structures such as "if"), the only thing that a function can really do with an argument is pass it on to another function, so, strictly speaking, there's no distinction between use and non-use even. It's just that any R function, in order to do something useful, will ultimately call some non-R leaf function implemented in native code, and some of those leaf functions will actually do the eval if they're defined in terms of argument values (e.g. obviously addition needs to do so to actually compute the value etc).
    
    aquasync 2 years ago
    
    R's evaluation of arguments is lazy, so while not at the level of Haskell it feels like a lazy language to me. Try eg:
    f = function(x) { print('hello'); x }
    f(print('world'))
    x is not evaluated in f until referenced. Indeed, if you remove x from the body of f, world is not printed.
    
    civilized 2 years ago
    
    Apologies, my bad, but I'm a bit too late to edit. The experts say that R qualifies as a lazy language [1, 2].
    My impression was that R was mostly an eager language that somehow allowed for laziness. I will research this further and hopefully suss out why I got confused.
    [1] https://dl.acm.org/doi/10.1145/3360579
    [2] https://www.r-bloggers.com/2018/07/about-lazy-evaluation/
    
    orhmeh09 2 years ago
    
    Thank you for the correction. Is it possible to use NSE in say Python or JavaScript?
    
    CornCobs 2 years ago
    
    Yes, though the languages do not support it explicitly you can simulate lazy evaluation by wrapping all your arguments in closures. This way they won't be evaluated until called within the function body.
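A minimal sketch of the closure trick in Python. The function names (`noisy`, `eager_first`, `lazy_first`) are made up for the example; the only real mechanism is wrapping each argument in a `lambda`.

```python
def noisy(log, label, value):
    """Record that this argument expression was actually evaluated."""
    log.append(label)
    return value


def eager_first(a, b):
    # Ordinary call: both arguments were evaluated before we got here.
    return a


def lazy_first(a, b):
    # Both parameters are zero-argument callables; only `a` is ever called.
    return a()


eager_log = []
eager_result = eager_first(noisy(eager_log, "a", 1), noisy(eager_log, "b", 2))

lazy_log = []
lazy_result = lazy_first(
    lambda: noisy(lazy_log, "a", 1),
    lambda: noisy(lazy_log, "b", 2),
)

print(eager_result, eager_log)  # 1 ['a', 'b']
print(lazy_result, lazy_log)    # 1 ['a']
```

Two caveats compared with R: calling `a()` twice re-evaluates the expression unless you add your own caching, and the callee cannot get at the source expression, so this covers laziness but not NSE.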
    
    agumonkey 2 years ago
    
    Sidenote, the evaluation model of python can be surprising. List comprehension will create implicit function scopes that can trip you up.
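Two small illustrations of the comprehension scoping mentioned above, as a sketch in plain Python 3 with no libraries. The variable names are arbitrary.

```python
# 1. The loop variable lives in the comprehension's own scope,
#    so it does not overwrite a variable of the same name outside.
x = "outer"
squares = [x * x for x in range(3)]
print(squares, x)  # [0, 1, 4] outer

# 2. Closures created inside a comprehension all share that one scope,
#    so they see the final value of the loop variable.
callbacks = [lambda: i for i in range(3)]
late = [f() for f in callbacks]
print(late)  # [2, 2, 2]

# Binding the current value as a default argument avoids the surprise.
fixed = [lambda i=i: i for i in range(3)]
bound = [f() for f in fixed]
print(bound)  # [0, 1, 2]
```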
- int_19h 2 years ago
  
  In modern idiomatic Python, you should really be using `pathlib` and `io`, not `os`.
  - orhmeh09 2 years ago
    
    Yeah, you're absolutely right. The `os` methods are closest to the base R file operations and those in `pathlib` are closest to `fs` (which is a third-party library for R that requires installation).
andrewla 2 years ago

OOP as used in R is very much a function of API design and not a function of routine R usage for data analysis. To many users of R they are not even aware that they are using OOP at all, especially for the S3 style of objects.
When you have an object, like `model <- lm(x~y)` or `my_hist <- hist(df$foo)`, you expect to be able to `plot` it or get a `summary`; you don't call `my_hist.summary`, you call `summary(my_hist)` and `plot(model)`. Many users never look further under the hood than this. And this fits nicely into piped workflows -- `lm(x~y) |> summary()` ends up being very natural, and when you fit in the tidyverse operators many very complex workflows end up being very easy to digest.
But when you do pull back the kimono it gets ugly fast. The teams involved in this are the right people who have been working to make R an amazing language mostly through enhancements to libraries, and now they're trying to push some of that functionality back into core R, which I think is fantastic.
- th0ma5 2 years ago
  
  Some detail about that phrase https://www.catalyst.org/2021/03/22/racism-misogyny-asian-am...
  - chuckadams 2 years ago
    
    Pretty sure the phrase GP was looking for was "pull back the curtain", which likely originated with The Wizard of Oz.
  - Onawa 2 years ago
    
    I can honestly say that I had never heard that phrase used before now, but I do know I felt icky when I read it in the comment before I even clicked on your link. Definitely glad to see it is being called out, terms like this absolutely need to be removed from modern discourse.
- rossdavidh 2 years ago
  
  Interesting! To prove your point, I was unaware that I was using OOP in R when I did those kinds of things. But, it feels more like functional programming, and I wonder if "OOP" is even a good way of describing that, if that's what they mean?
  - andrewla 2 years ago
    
    There are really only two situations where it matters. One is in developing tooling; you have to understand how all this works together so that the user can do things like call `summary` or `plot` on your objects.
    The other is when you are trying to debug why something isn't working -- a lot of the time you can dump an R function just by typing the name, so you can run `table` with no parameters to get the `table()` function, if you're trying to figure out why it's not working right for your data. But if you execute `plot` you'll get something saying "UseMethod("plot")" which feels a bit recursive -- "in order to plot, plot", and then you end up going down that rabbit hole, which leads you to contributing to R packages, developing some of your own, and eventually posting on HN about how R's class system works.
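For readers wondering what that UseMethod line does, here is a toy model in Python. It is a simplification under stated assumptions: objects are dicts with a "class" list, methods live in a dict keyed by "generic.class", and `use_method`, `registry` and `summary` are names invented for the sketch. Real S3 also handles implicit classes, NextMethod and more.

```python
def use_method(generic: str, obj: dict, registry: dict):
    """Try generic.<class> for each class in order, then generic.default."""
    for cls in obj.get("class", []):
        method = registry.get(f"{generic}.{cls}")
        if method is not None:
            return method(obj)
    return registry[f"{generic}.default"](obj)


registry = {
    "summary.lm": lambda obj: "linear model summary",
    "summary.default": lambda obj: "generic summary",
}


def summary(obj):
    # The body of the generic is just the dispatch call, like UseMethod("summary").
    return use_method("summary", obj, registry)


model = {"class": ["glm", "lm"], "coef": [1.0, 2.0]}
plain = {"values": [1, 2, 3]}

model_result = summary(model)
plain_result = summary(plain)
print(model_result)  # linear model summary
print(plain_result)  # generic summary
```

There is no "summary.glm" entry, so the lookup falls through to "summary.lm", and an object with no class falls back to the default.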
- clatan 2 years ago
  
  In other words OOP can be great for tooling, but doesn't make much sense for what R is meant to be used for -interactive analysis- in everyday work.
  R's mess of OOP systems works great, S3 is "fine" for just dispatching 'methods' based on attributes, one doesn't even know it's happening in base R ALL the time.
  R flexibility also makes it possible to build your own class system. i.e. modern ggplot2 has its own ggproto object system.
PheonixPharts 2 years ago
Functional programming is not opposed to object oriented programming; they are paradigms that can be used together, and the popular object system developed in Java is far from the only way to do OOP.
R, like Common Lisp, uses an OOP system based on generic functions (well, one of many OOP systems in R, but that's a different topic), where the function handles dispatching to match the object.
Effectively instead of:
```
    object.method()
```
you have:
```
   method(object)
```
So it works perfectly with the functional paradigm while still being capable of everything an object system is.
The most obvious example of this in R is the `plot` function. You as the programmer don't have to know exactly how to plot an instance of a class, you just pass it into `plot` and it will be handled correctly. If you create a new class you just have to extend the definition of plot in a standard way and it will also be handled for you.
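The same pattern exists in Python's standard library as `functools.singledispatch`, which makes for a compact sketch of "extend the generic for your new class". The `describe` generic and the `Histogram` class are made up for the example; only `singledispatch` and its `register` decorator are real.

```python
from dataclasses import dataclass
from functools import singledispatch


@singledispatch
def describe(obj) -> str:
    # Default method, used when nothing more specific is registered.
    return f"object of type {type(obj).__name__}"


@describe.register
def _(obj: list) -> str:
    return f"list of {len(obj)} items"


@dataclass
class Histogram:
    counts: list


# A new class only needs to register a method; callers keep using describe().
@describe.register
def _(obj: Histogram) -> str:
    return f"histogram with {len(obj.counts)} bins"


default_result = describe(3.5)
list_result = describe([1, 2])
hist_result = describe(Histogram([4, 5, 6]))
print(default_result)  # object of type float
print(list_result)     # list of 2 items
print(hist_result)     # histogram with 3 bins
```

The method belongs to the generic function rather than to the class, which is the point being made about `plot`.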
It's a shame that the many flavors of OOP remain relatively unknown to most programmers, and in many of the cases where they're tried (JavaScript's prototype system for example) people have essentially replaced them with more familiar systems.
- epgui 2 years ago
  
  > Effectively instead of object.method() you have method(object)
  This reflects a very deep misunderstanding of the distinctive characteristics of each paradigm. It's so far off that it's "not even wrong".
  Functional programming is much more about things like purity and referential transparency, about composing functions and/or combinators, about a particular way of managing or modelling effects, about a way of thinking, about using certain kinds of data structures and algorithms.
  It's not a syntactical difference.
  - fn-mote 2 years ago
    
    >> Effectively instead of object.method() you have method(object)
    > This reflects a very deep misunderstanding of the distinctive characteristics of each paradigm.
    I think the parent made a hasty reading of the GP comment. The GP shows an awareness of multiple OO systems in R.
    I believe the GP is attempting to explain to a Java programmer how R could be considered object-oriented even though `plot(item)` does not "look like" what you would see in an object oriented system.
    Which is to say: there is generic function dispatch based on the type of the first argument to the function. This can be _used_ to write in an OO style.
  - int_19h 2 years ago
    
    What GP is saying is that while Java is object-centric, R (and CLOS etc) is method-centric: you don't have classes with multiple methods, you have generic functions with multiple methods (each of which implements that function for particular argument types): http://adv-r.had.co.nz/OO-essentials.html#s3
    This is not about functional programming at all; the distinction here is completely orthogonal to that.
bachmeier 2 years ago

> More work on being easily used by/incorporated into applications written in other languages, would perhaps be a more impactful thing to work on.
That's basically a solved problem. For instance, RInside opens a C interface that can be called by any language that can call C functions, which is basically every language. It's efficient, too, because you're only passing pointers around. Here's an example in Ruby (disclaimer that I wrote it): https://github.com/eddelbuettel/rinside/blob/master/inst/exa...
usgroup 2 years ago

Yet you’re using OOP every time you call plot, predict, summary and so on: possibly without realising.
- CornCobs 2 years ago
  
  I actually think this "multi-tiered" system of OOP is quite cool, when compared to languages that stick OOP in your face upfront.
  1. Basic users don't even know it's there, they're just calling regular functions.
  2. S3 in base is super simple to understand and easy to extend the first time you need to implement your own summary.
  3. Full blown OOP with slots and methods is available when you really need it (rare for a user and not library author imo, lists and S3 are sufficient for most things).
  The big issue I see is the incompatibilities in the various systems making this "ramp up" not so smooth. But it looks like that's what S7 is trying to address so that's cool.

CornCobs 2 years ago

Something interesting I realized about their choice of name - S7.

1. It's a combination of S3 and S4 obviously.

2. It's also linked to another OOP system called R6. Interesting how it's a step forward one way (6->7) and a step 'backwards' in another way (S->R).

To me it shows the philosophy of not creating something entirely new but improving the existing systems quite nicely!

clatan 2 years ago

I trust the authors immensely but i don't see what yet another class system in R solves. That's on me, but I'd like to understand more of what motivates this effort.

usgroup 2 years ago

From the article: “S7 is a new OOP system being developed as a collaboration between representatives from R-Core, Bioconductor, tidyverse/Posit, ROpenSci, and the wider R community, with the goal of unifying S3 and S4 and promoting interoperability.”
It then goes on to describe what that means in depth.
- clatan 2 years ago
  
  I can read, thank you, and no it doesn't.
  It describes 3 new generics in base R that help their new S7 system.
  It all seems motivated by better interop with python which is 'neat' but really doesn't seem like a critical necessity of the language. I guess it's more of a tactical thing where they're trying to make it easier for python users to eventually try R. Or for R users that work alongside python users to not abandon R.
  - t-kalinowski 2 years ago
    
    The Python interop is in the blog post because it makes for convenient and compact examples, not because it motivated any of the features.
    If you're interested in what motivated S7, you may enjoy this talk Hadley gave: https://www.youtube.com/watch?v=P3FxCvSueag (R7 was the working name for the package at the time)

dmead 2 years ago

This seems like a huge mess. Why is it so hard for R people to settle on a standard?

jcheng 2 years ago

There's a few reasons I can think of.
First, R definitely needs at least two, one for functional OOP (Common Lisp style) and one for class-based OOP (Java style). The latter is _much_ less important for everyday R users but as a package author it's extremely helpful for modeling certain types of resources. (Interestingly, Python also ships with two: @singledispatch and classes; and multimethod/multidispatch also exist.)
Second, because R's basic language building blocks are so flexible, it's relatively easy to build new OOP systems, resulting in more diversity.
Third, I believe it's actually been close to thirty years since S4 was introduced, which was the last functional OOP system until S7. I don't think that's a terrible track record, compared to how much variety you see in equally fundamental systems in other language communities (just off the top of my head, Python: packaging standards, environment management, data frames; JavaScript: module systems, runtimes, package managers).
mr_toad 2 years ago

https://xkcd.com/927/

kgwgk 2 years ago

Nice. This is a bit disappointing though: “Multiple dispatch is heavily used in S4; we don’t expect it to be heavily used in S7, but it is occasionally useful.”
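For anyone unfamiliar with the term, a bare-bones sketch of multiple dispatch in Python, where the method is chosen from the types of both arguments. Everything here (`register`, `combine`, the `_methods` table) is invented for illustration and has nothing to do with how S4 or S7 implement it; the lookup order is also a simplification that favors specificity in the first argument.

```python
from typing import Callable, Dict, Tuple

_methods: Dict[Tuple[type, type], Callable] = {}


def register(t1: type, t2: type):
    """Register a method of `combine` for the type pair (t1, t2)."""
    def decorator(fn: Callable) -> Callable:
        _methods[(t1, t2)] = fn
        return fn
    return decorator


def combine(a, b):
    # Walk both class hierarchies, most specific first, and use the
    # first registered pair that matches.
    for ta in type(a).__mro__:
        for tb in type(b).__mro__:
            fn = _methods.get((ta, tb))
            if fn is not None:
                return fn(a, b)
    raise TypeError("no applicable method")


@register(int, int)
def _(a, b):
    return "int, int"


@register(int, str)
def _(a, b):
    return "int, str"


@register(object, object)
def _(a, b):
    return "fallback"


results = [combine(1, 2), combine(1, "a"), combine("a", 1)]
print(results)  # ['int, int', 'int, str', 'fallback']
```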

hadley 2 years ago

Why is that disappointing?
- kgwgk 2 years ago
  
  Maybe disappointing is not a good way to describe it but I couldn’t find a better word. I meant that it seems that multiple dispatch won’t be highlighted.
  I would have liked to see a better S4 - fixing some of its issues and adding things like before/after/around method - and I’m not sure this goes in that direction. It can still be an improvement in practice over the rarely-used S4 though.
