An Experiment in Purely Functional IO for Clojure
The purpose of defining a function as pure is to allow the computer to make transformations. If you define println as a pure function and use println to debug or trace your program, you could discover that the compiler has eliminated some dead code and that some code has been executed out of order, so your debugging doesn't have any useful meaning. That's why it's better to tell you that print isn't pure: the compiler could eliminate it if you define it as pure.
Disclaimer: I have only read the first comments.
In Haskell this goes even deeper because the lazy evaluation strategy allows the compiler to optimize the order of evaluation of the expressions within a function and create thunks without you explicitly telling it to. Haskell needs to be pure in order to be lazy, because if it were impure it would be very difficult to reason about if, when, and in what order the side-effecting operations will take place. So like you said, if a side-effectful expression evaluates to nothing, its return value is not "needed" in a further computation, and therefore it may not be evaluated at all.
This doesn't really apply to Clojure because Clojure is strictly evaluated--except when you're explicitly working with a lazy stream data structure. If you were to put side-effectful expressions inside of a lazy stream, you would have a hard time controlling when they are evaluated, especially since Clojure "chunks" lazy streams by default in groups of 32 as an optimization.
Here's a SO question demonstrating what happens when you mix laziness and side-effects and don't understand the implications--you find yourself trying to restrict the optimizations you allow the compiler to do (which will hurt performance): http://stackoverflow.com/questions/3407876/how-do-i-avoid-cl...
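The timing problem described above can be sketched in Python (used here as a neutral illustration; Python generators are lazy but, unlike Clojure's lazy seqs, not chunked). The side effect inside the lazy sequence fires only when the sequence is consumed, not when it is defined:

```python
# Hypothetical sketch: a side effect buried inside a lazy sequence runs
# only on consumption, which makes its timing hard to reason about.
log = []

def noisy(x):
    log.append(f"evaluated {x}")  # stand-in for a println side effect
    return x * 2

lazy = (noisy(x) for x in range(3))  # defining the stream runs nothing
assert log == []                     # no effects have happened yet

first = next(lazy)                   # consuming one element forces it
assert first == 0
assert log == ["evaluated 0"]        # only that element's effect ran
```

With Clojure's 32-element chunking, consuming one element could instead run up to 32 of those effects at once, which is exactly the surprise the linked question runs into.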
That first little example is the best explanation of a functionally pure language I've read.
I think I'm missing something: How is the function println not pure? It always returns nil (for inputs that it doesn't fail on).
Doesn't that mean it's pure?
Edit:
I'd define pure in this sense as the same thing as referentially transparent, meaning f(x) will always return the same thing for a given x.
I find it most useful to define referential transparency in terms of substitution (which is the definition Wikipedia starts with). Namely: if I substitute a function call with its result, is there an observable difference in program behaviour? If there is, the program is not referentially transparent.
This is clearly the case with `println`.

    (println "Hi there")

is clearly not the same program as

    nil

as the former causes output to the console, while the latter does not -- an observable difference.

Using Scheme (because I don't know Clojure that well)

    (define (foo x) (let ([y 0]) (set! y 10) (+ x 1)))

is a referentially transparent program. Although it contains a side-effect (the assignment to y) this effect is not observable outside `foo`, and thus any call to `foo` can be substituted with its result.

Ah, but

    println "Hi there"

doesn't return nil, it returns a value of the type

    IO nil

which can be transparently substituted for the call.

Yes, but you're writing in Haskell whereas my example is in Clojure. Hence the difference :)
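The substitution test described above can be made concrete. A minimal Python sketch (all names here are my own, not from the thread) that compares a program against the same program with a call replaced by its return value:

```python
import io
from contextlib import redirect_stdout

def run(program):
    """Run a zero-argument program, returning (result, captured stdout)."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        result = program()
    return result, buf.getvalue()

# A println-like call vs. its return value (None): observably different,
# so the call is not referentially transparent.
assert run(lambda: print("Hi there")) != run(lambda: None)

# A pure call vs. its result: indistinguishable, so substitution is safe.
assert run(lambda: 1 + 1) == run(lambda: 2)
```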
Dammit! Sorry, carry on!
Pure usually has a stricter definition. First we need the ability to state whether two functions are equal. Then we need a function which is constantly unit. Finally we need composition such that (f >> g) is "f then g". Now, a function f is pure if and only if

    constantly_unit = f >> constantly_unit

If you unpack that a bit it might translate as "if we throw away the return value of a function, it is exactly the same as if nothing is happening at all". If your notion of equality differentiates "println" and "constantly_unit" then we cannot call "println" pure.
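A rough Python rendering of this law, treating "equality" as "same return value and same observable output" (all names below are mine):

```python
import io
from contextlib import redirect_stdout

def then(f, g):
    """Sequential composition (f >> g): run f, discard its result, run g."""
    def composed(x):
        f(x)
        return g(x)
    return composed

def constantly_unit(_x):
    return None  # the function that is constantly unit

def observe(f, x):
    """Pair f's return value with whatever it printed."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        result = f(x)
    return result, buf.getvalue()

# A pure f satisfies the law: f >> constantly_unit is indistinguishable
# from constantly_unit itself.
square = lambda x: x * x
assert observe(then(square, constantly_unit), 3) == observe(constantly_unit, 3)

# println (Python's print) fails the law: the composite leaves output behind.
assert observe(then(print, constantly_unit), 3) != observe(constantly_unit, 3)
```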
Note that this is a very powerful notion of purity. It's so powerful as to render Haskell impure, since if we have the function

    loop x = loop x

then `loop >> constantly_unit` never returns and is therefore easy to distinguish from `constantly_unit` itself. This just drives home that non-termination is an effect itself!

Hmm, I wonder if that loop example can be rehabilitated. If we interpret it not as a description of how to evaluate loop, but rather as a constraint on loop, then we see that the statement is simply a non-condition on loop.
This would mean that non-termination is not a property of the function, but of the compiler/runtime, in that they failed to notice that loop was a partial function called with an input value for which it was not defined!
That's probably possible in this particular case but it sounds a lot like you're heading toward Halting Problem territory here :)
Ah, but Haskell is lazy so loop will never be evaluated and a

    loop >> constantly_unit

will return unit no matter what.

Ah, yeah, I was being fast and loose with laziness. You have to `seq` the argument to `constantly_unit`, too.
The definition of purity in this post isn't quite correct. The common definition (which this post seems to be using) is that a pure function is one that has no side effects. This doesn't have much of an implication for reasoning about almost-pure functions like println, which have no effect on the program's execution, but do cause some distinction. For example, with a pure function, the following are equivalent:

    var a=f(0); a=f(1)

    a=f(1)

however, these are not the same when f=println.

How so? println returns `Unit` so certainly, what you say above doesn't hold.
Even if `println` returned a value, it would still be considered pure as long as that value is strictly calculated from its input parameter and that parameter alone (say, the size of the string).
Another often used criterion to determine purity of a function is whether inlining it everywhere in your program produces a similar program. There again, `println` passes this test.
The only way `println` could not be pure is if you call it with the same string twice and it returns a different value.
Consider the following program.

    var a = println("Hello ")
    a = println(" World")

This should output "Hello \n World\n". However, assuming println is pure, we can optimize this to just

    var a = println(" World")

which produces a different output. We could also convert:

    var a=println("Hello")
    var b=println("Hello")

into

    var a=println("Hello")
    var b=a

My (quite possibly wrong) understanding is that purity has nothing to say about side effects. It's simply concerned with the inputs and outputs of functions, specifically whether the function will always return the same value given the same input.
What zak_mc_kracken is saying is that under this definition println is obviously and trivially pure because it returns the same value for every input.
You're correct that it won't have the proper side effects when optimized but under this definition that doesn't have anything to do with purity.
Edit: If you want the "side effects" of the println statement to be considered part of the "output" then you want something like Haskell's semantics where the println statements are IO actions.
Edit: zak_mc_kracken beat me to it.
You are correct under this particular interpretation of purity, there are just competing ways to assess said purity. Wikipedia only lists two (the one you just showed and the one I described) but is missing the third one (the inlining approach).
In Haskell, all functions in the IO monad are by definition pure, and that includes all the println functions.
That would be discarding the effect of print, since its inputs aren't just ignored, but used to modify the system somehow, somewhere. Replace print* by set! or `:=` if that helps.
Yes the relationship between input and output is clearly defined, though. Your point of view raises a good point about precision when talking about purity and functions.
Well, if somebody breaks the kernel so that printing doesn't work, it will most likely return something strange -- or, in the case of Clojure, throw an exception.
A pure function always returns the same result because it only depends on its input parameters. An impure IO function does not.
I see that if there's a kernel panic, or power outage that f(x) might return something insane or nothing at all, but those aren't things I'm usually worried about. What's the practical upside to this?
Is breaking the kernel something I should actually consider likely?
Compiler optimizations are allowed to change how often a pure function is called, and you can freely change it in your code. If it had a printf inside it, you'd need a way to say you don't care about the side effect of printing a specific line one time, and not zero or two times.
Of course there is lots of code that has side effects in implementation but not in their interface. Like malloc, or a read only data structure that has an internal cache it updates.
> It always returns nil (for inputs that it doesn't fail on).
> Doesn't that mean it's pure?
Unfortunately not, because it also modifies global state (the state of the console). A pure function implies that it is referentially transparent, meaning that:

    a = foo()
    b = a + a

Is the exact same as:

    b = foo() + foo()

But if foo() modifies global state, then this statement is not true. In this case the difference is printing something twice vs. once. If a function is pure, a compiler can optimize the second example into the first example.

For a function to be pure, it must always return the same value given the same input and not modify any state that is observable outside of its definition.
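A quick Python sketch of the twice-vs.-once difference (foo here is a hypothetical side-effecting function, with a list standing in for console state):

```python
calls = []  # global state foo modifies, standing in for console output

def foo():
    calls.append("printed")  # the side effect
    return 1

# b = foo() + foo(): the side effect happens twice.
calls.clear()
b = foo() + foo()
assert b == 2 and calls == ["printed", "printed"]

# a = foo(); b = a + a: same value of b, but the effect happens only once.
# A compiler may only rewrite one form into the other if foo is pure.
calls.clear()
a = foo()
b = a + a
assert b == 2 and calls == ["printed"]
```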
Since println is just an example, perhaps it isn't really a good one to use when trying to show how impure functions can have side effects. Sending some ink to a printer or altering pixels on a screen isn't exactly a compelling argument for side effects.

Maybe a better example function is one that deletes data from your hard drive, or sends control signals to a robot that locks the doors, or changes the A/C, or starts some industrial process, releasing toxic chemicals.
The point is println does have a side effect, it's just not a very interesting one.
Or maybe the definition of side effect is what is causing the problem; after all, executing any function does have physical side effects - electrons move, electromagnetic state changes, ambient temperature changes. Maybe the problem is where we draw the line for observable side effects.
Surprised no one's brought up Haskell's Debug.Trace, which effectively does this, but returning whichever value you tell it to.
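For those unfamiliar with it: Haskell's `Debug.Trace.trace` takes a message and a value, returns the value unchanged, and emits the message as a side effect when the value is forced. A rough Python analogue (eager, unlike Haskell's, and the name is mine) might look like:

```python
import sys

def trace(message, value):
    """Print a debug message and return value unchanged,
    loosely mimicking Haskell's Debug.Trace.trace."""
    print(message, file=sys.stderr)
    return value

# Usable in the middle of an expression without changing its result:
result = trace("computing...", 2) + 3
assert result == 5
```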
And why would you want to redo the idiomatically complex part of Haskell (there are probably thousands of monad tutorials just to explain why I/O in Haskell needs to be wrapped in a monad) in a similarly complex way in Clojure, which could actually handle I/O and side-effects in a controlled way just fine without making an (academic) mess out of it?

You need to use monads to get around the limitations imposed by Haskell's decision to target complete purity just to do simple things like I/O. But no matter what, I/O is still not pure (something like readline can never be pure even if it wanted to), and thus you're, one way or another, forced to separate the static, pure, functional parts of your program from its dynamic parts with side-effects.

Haskell does it with monads, which, as a concept in themselves, are a generic way to reason about state, but IMHO the very best part of Clojure is that it offers several ways to manage dynamic state in a controlled way without forcing you to go 100% pure or 100% impure. Why break that, except as a mental exercise?
This is so misinformed it isn't even wrong.
* IO doesn't have to be wrapped in a monad, there are other models
* Clojure cannot handle IO/side effects very well (compared to any language with effect typing)
* Input and output are not pure, but the `IO` type itself is pure. Rather, constructions of the IO type are pure and then the runtime can interpret IO values impurely.
* The entire point is to separate the impure from the pure. It's not that you're forced to, it's that you desire to.
* Monads are not, in themselves, generic ways to reason about state. They are far more general.
* Anything that is not 100% pure is 100% impure. Without purity guarantees you cannot trust code you call upon to not do side effects. This breaks local reasoning.
I can't help but disagree with your final bullet. You won't have "contractual and checked by the compiler" trust in code you call upon, but it is quite common to trust the code that you call in any language to do just what it claims it will do.
Consider: do you really think that code was lacking local reasoning before the likes of Haskell? It is arguable that things were more difficult then, but the jury is still out on whether things are easier now.
Local reasoning is absolutely impossible unless you have some kind of contract which ensures that your (local) code is pure. This is almost definitional.
What I haven't claimed is that no other fragment of code in another language can be pure. In nearly any language (1+1) is pure. My point was that literally any impurity inside of a fragment of code makes the whole thing impure (in most cases) and therefore destroys local reasoning.
The "in most cases" bit above is important because there are ways to "purify" a code fragment so that code which uses it cannot witness the impurity inside and therefore it restores local reasoning "above" that level. The ST monad is such an example.
I don't think "absolutely impossible" means quite what you think it means. Again, I will make no claims that it is easy. And in some cases you may wind up with some global reasoning entering into the coding process.
Honestly, with how many solutions I've seen with tons of "locally pure" parts that were a bloody mess to deal with, maybe some "global" reasoning is called for.
I'm willing to consider that I might be wrong, but here's the argument.
If I am looking at some code which includes computation (by which I include function calling but also reference access which is sometimes trickily ignored as a computation) then I cannot assess the behavior of this code without knowing either (a) the computation is side-effect free and therefore has a mere value semantics or (b) it is not and can potentially be affected by or affect non-local parts of the code.
To hit case (a) I don't need a language which enforces purity, but I do need to know that everything "beneath" where I'm standing is pure. In this case, uncertainty, even tiny amounts of it, whittles away (a) entirely and leaves me in concern (b).
I'm not saying that global reasoning is bad or infeasible, but I am saying that lacking purity you cannot trust local reasoning until you isolate the pure fragment. For instance, you might state that (!x + !y) involves the global reasoning of what the values of (x) and (y) are but is local reasoning otherwise. I'd argue that actually local reasoning is destroyed until you refactor this code as

    let x_value = !x in
    let y_value = !y in
    (x_value + y_value)

where the parenthetical fragment is now pure and local as the side effects were sidelined into the let clauses.

I previously wasn't saying that "locally pure" code is preferable to globally reasoned code. I'm not completely certain that I would say that in all cases. I feel very confident though that it's (a) the right default and (b) something that should be used to a far greater degree than most code I see written, which more or less demands global reasoning to do anything non-trivial at all.
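The same refactoring can be sketched in Python, with mutable cells standing in for ML-style refs (all names below are mine, not from the thread):

```python
# Mutable cells standing in for ML-style refs; deref plays the role of !x.
x = {"value": 1}
y = {"value": 2}

def deref(ref):
    return ref["value"]  # the effectful read of mutable state

# Before: deref(x) + deref(y) mixes stateful reads with arithmetic.
# After: the reads are sidelined into let-style bindings, leaving the
# remaining expression (x_value + y_value) pure and locally reasoned.
x_value = deref(x)
y_value = deref(y)
assert x_value + y_value == 3
```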
First, I want to say that your last paragraph is something I do agree with. Sounds like we are ultimately on the same page and do actually agree with each other.
My point was simply that local reasoning is strengthened by trust in everything that you do locally. This is actually no different than living. I trust that what I hand off for recycling is actually getting recycled correctly. I have no real verification of this, however.
Now, you can work in a language that demands this for you. However, there are times where this demand actually makes things more difficult than they need to be. Conversely, there are plenty of times where not honoring this idea leads to annoyance.
Again, I do agree with your final point. I'm just not clear on where empirical results lie on this. Too much of it is just a very compelling argument.
I suppose I'm being a bit of a pedant, but in my mind if you have to trust that some other actor (the recycling company) will do something, then you're not actually talking about local reasoning but instead, exactly, global reasoning.
The local reasoning in this situation is you putting the refuse in the bin and placing it outside. All of that is "pure", completely in your control, and relies on exactly no side effects or outside state. It's also trivially testable, nearly failure proof, and completely observable. The "locality" of this implies that you need only consider exactly the things which are "in scope" at this moment and their behavior is entirely circumscribed by your "local" scope.
The moment you rely on an outside party whose capabilities rely on outside state then you lose all of those guarantees.
From a certain, high-enough level we can have "local reasoning" again in that the state of the municipal recycling service is encompassed. Or perhaps we also need to include the world oil supply in that model, who knows?
So, I'm being pedantic around the word "local reasoning". I think that's valuable because the kind of reasoning which is local is sharply distinct from that which isn't and it confers a lot of great properties. Finally, I'll reiterate, that I think side effects of any form utterly wreck local reasoning.
I get what you are saying. I was really just picking on the "absolutely" part of what you were saying.
Consider, I can absolutely use local reasoning to determine where trash should be to know that the truck driving by will pick it up and take it away. In that sense, I have done my small part and all decisions are locally reasonable. At a global scale, they may not be enough. And more measures may be needed, but not much breaks down on my doing my part.
Same for a program. I can reasonably be sure that calling println will not cause my machine to break, and will leave a note somewhere I can find it. Doesn't matter if this println is in the middle of a loop or not.
Heh, I just think you and I have different ideas about what "local reasoning" should mean. I cannot personally call your examples anything but very global.
Only when talking about the entire system. In which case, yes I fully advocate for more global reasoning. Above and beyond any considerations of purity, evidently. :)
Following up _delirium's post, there are advantages to the Haskell approach: separating structure from interpretation of computer programs (pun very much intended) and compositionality of said programs. I talked a bit about this first property in the context of the free monad here: http://underscore.io/blog/posts/2015/04/14/free-monads-are-s...
It is certainly worth exploring other paradigms to understand the advantages and disadvantages they bring.
I don't think the post is really arguing what you're arguing against here. It says it's doing an "experiment", for "fun and learning purposes". It's not advocating removing side-effectful I/O from Clojure, just looking to see if Haskell-style I/O is possible in Clojure. I don't think it's really too "academic" for someone to do something for fun and learning, and post about it on the internet...
You're the top post of this HN thread but you're making sweeping statements of fact about Haskell and other topics you clearly don't have a working understanding of. You should edit your comment so as not to mislead people about the truth. The statement about monads being tied to state is particularly wrong and confusing to the uninitiated.
But the whole point of Haskell is to make claims that confuse the uninitiated, isn't it? Because that's exactly what every tutorial that claims "Haskell is a pure language" is doing when it then goes on to say "Haskell can do I/O" without explaining that, unlike every other language the uninitiated reader has ever encountered, Haskellers have chosen to define "the language" in such a way as to exclude "the runtime".
I really don't get why this point is not made clear in all Haskell discussions up-front, but instead Haskellers insist on repeating the empty mantra "Haskell is a pure language" as if they were using the term "language" in the standard way rather than redefining it in a carefully constructed way so as to make their claim true.
This is terrible pedagogy on par with people who introduce negative generalized temperature or resistance without first explaining that they are generalizing the concept. And it obscures one of the coolest aspects of Haskell, which is that it is an impure language (using "language" in the standard way, which always includes the runtime) that has very cleverly packaged its impurity such that reasoning about the code still gets most of the advantages of purity.
"Haskell has a pure syntax that is interpreted by an impure runtime" is more accurate and far less confusing.
C/C++ don't have much of a runtime either (libc's "runtime" component is often not used, and libstdc++'s is rather controversial). Standard Haskell functions (with types like `a -> b`) are pure, unlike in (most) other mainstream-ish languages. Of course you can do IO, which of course is impure, but it is cleanly separated from non-IO.