Nilang.jl – A Reversible Julia DSL

github.com

64 points by Fishysoup 5 years ago · 30 comments

snicker7 5 years ago

Incredible. Given a (one-to-one) Julia function `f`, this package generates `~f`, the inverse of `f`. Clever use of automatic differentiation.

Paper: https://arxiv.org/abs/2003.04617

  • ssivark 5 years ago

    Is this in essence the same as the relation between a differential equation and its adjoint equation, and how one could use the adjoint method to perform backpropagation? (E.g. ref. Neural ODEs)

    • GiggleLiu 5 years ago

      NiLang requires every instruction to be reversible, e.g. `SWAP` and `y += f(args...)`. The back-propagation is implemented on this limited instruction set. It differs from Neural ODEs in that a neural ODE does not back-propagate through the program step by step. Sometimes a time-reversible integrator can utilize the reversibility (with NiLang) to save memory, e.g. the leapfrog integrator for symplectic systems.
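The reversible-instruction idea described above can be sketched in a few lines. This is an illustrative toy in Python, not NiLang's actual Julia API; the helper names (`swap`, `add_assign`, `sub_assign`) are hypothetical. Each primitive has an exact inverse, so a program built from them can be undone instruction by instruction in reverse order:

```python
# Toy reversible instruction set (illustration only, not NiLang's API).

def swap(state, a, b):
    state[a], state[b] = state[b], state[a]   # SWAP is its own inverse

def add_assign(state, y, f, *xs):
    state[y] += f(*(state[x] for x in xs))    # y += f(args...)

def sub_assign(state, y, f, *xs):
    state[y] -= f(*(state[x] for x in xs))    # exact inverse of add_assign

# Forward program: y += x*x, then swap x and y.
state = {"x": 3, "y": 10}
add_assign(state, "y", lambda v: v * v, "x")
swap(state, "x", "y")

# Reverse program: undo each instruction in the opposite order.
swap(state, "x", "y")
sub_assign(state, "y", lambda v: v * v, "x")
assert state == {"x": 3, "y": 10}
```

With integer (fixed-point) values this round trip is exact, which is why reversible languages restrict programs to such self-undoing primitives.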

  • thisrod 5 years ago

    That can't be entirely true. There must be some kind of continuous, convex or monotonic precondition on f. Something to stop me from implementing a cryptographic hash function, and asking for the solution to a 10000 bit challenge that is supposed to require a trillion times Earth's entire computing resources running for a million years.

    Or maybe it's impossible to implement that hash function reversibly without preserving a copy of the input in the output.

    • cbkeller 5 years ago

      I think it's a variant of the latter -- IIUC no information can be "lost" in reversible computing, so the output of a reversible hash function might be both the hash and some number of other outputs, all of which you would have to know if you wanted to reverse the hash function.

      There seems to be an analogy to the way there can be no "waste heat" in a reversible thermodynamic process [1]. I thought the analogy might be a stretch at first, but looking into it further, this does seem to be exactly the idea behind reversible computing: if a reversible computer could be implemented at the hardware level, there would supposedly be significant energy-efficiency gains on the table [2-4].

      [1] https://en.wikipedia.org/wiki/Reversible_process_%28thermody...

      [2] https://arxiv.org/abs/1702.08715

      [3] https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/INC19-...

      [4] https://spectrum.ieee.org/computing/hardware/the-future-of-c...

    • jlokier 5 years ago

      > Or maybe it's impossible to implement that hash function reversibly without preserving a copy of the input in the output.

      Cryptographic hash functions rely on locally non-reversible non-linear operations.[1]

      So you don't need to copy the original input, but you will need to copy bits here and there throughout the hash function, which allow the input to be reconstructed.

      But that's not surprising, even a humble AND gate is non-reversible. So even a humble AND gate program will need to copy its inputs to the output if it's to be reversible.

      [1] (As well, cryptographic hash functions are generally not reversible anyway. As in, different inputs hash to the same output value, so there's no unambiguous reversal.)
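The point about the AND gate has a classic concrete form: AND destroys information (inputs 00, 01, and 10 all map to 0), so any reversible version must carry extra outputs. The standard construction is the Toffoli (controlled-controlled-NOT) gate, sketched here in Python for illustration:

```python
# Toffoli gate: (a, b, c) -> (a, b, c XOR (a AND b)).
# It computes AND (take c = 0) while keeping both inputs in the output,
# which is what makes it reversible.

def toffoli(a, b, c):
    return a, b, c ^ (a & b)

# With c = 0, the third output is a AND b, and a, b survive unchanged:
assert toffoli(1, 1, 0) == (1, 1, 1)
assert toffoli(1, 0, 0) == (1, 0, 0)

# The gate is its own inverse: applying it twice restores any input.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert toffoli(*toffoli(a, b, c)) == (a, b, c)
```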

      • thisrod 5 years ago

        > So even a humble AND gate program will need to copy its inputs to the output if it's to be reversible.

        It's been 20 years since I had to think closely about reversible computing, and I've forgotten lots of the details. When I said "implement a hash function", I actually meant "implement exclusive-or with the result of the hash function" or something similar.

        > As well, cryptographic hash functions are generally not [invertible] anyway.

        Good point. But aren't there one-to-one equivalents, called one-way functions, trapdoor functions or something like that? My cryptography is even rustier.

        • jlokier 5 years ago

          > It's been 20 years since I had to think closely about reversible computing, and I've forgotten lots of the details. When I said "implement a hash function", I actually meant "implement exclusive-or with the result of the hash function" or something similar.

          Well that gets a bit more interesting. If the hash is of a key K, and the hash output is exclusive-or'd with input M, then of course it's reversible with respect to M but not K.

          You can think of the above as an easily reversible program with M as its only input. (If K is treated as a constant, then it's just part of the program.)

          In that case, yes you can make an efficiently reversible version in Nilang.jl, and it will have all the differentiability properties etc.

          But it will only have those properties under the assumption of holding K constant. The new program won't resemble the stated problem of inverting a cryptographic hash :-)
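The hold-K-constant construction above can be sketched directly. This is an illustration in Python (using `hashlib.sha256` as a stand-in for the hash H): with the key fixed, `C = M XOR H(K)` is a bijection in M, and XOR with a constant is its own inverse, so the program is trivially reversible in M even though H itself is one-way.

```python
import hashlib

def xor_with_hash(m: int, key: bytes) -> int:
    """XOR the message m with the hash of a fixed key.
    Self-inverse in m when key is held constant."""
    h = int.from_bytes(hashlib.sha256(key).digest(), "big")
    return m ^ h

K = b"some fixed key"
M = 123456789
C = xor_with_hash(M, K)

# Applying the same operation again recovers M: reversible w.r.t. M, not K.
assert xor_with_hash(C, K) == M
```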

          In general there will be some parts of a program which are reversible without needing to record any state, and other parts where you have to retain some state (or generally a function of the state, which might be less overhead) in order to make the whole thing reversible.

          Quantum computers face this problem too. Every quantum program is fully reversible, but you can't copy qubits, so you can't translate a classical non-reversible program to quantum by retaining copies of state as described above. You have to do stranger things to translate the program.

          The fact you can actually do that shows that recording internal states isn't the only way to make a program reversible. But the fact it takes exponential time to simulate the resulting program on a classical computer shows that the alternative doesn't let you invert a cryptographic hash that easily either :-)

          > But aren't there one-to-one equivalents, called one-way functions, ...

          In general one-way functions map multiple inputs to the same output, so they aren't reversible.

          You're thinking of a rarer kind called one-way permutations.

          There aren't many of these around, and they aren't what is generally intended when talking about a cryptographic hash. It would be convenient if hashes were permutations, but they generally aren't.

  • dlivingston 5 years ago

    Forgive my ignorance, but what applications would this be used in where just maintaining copies of the initial parameters to f() wouldn’t work?

    • sriku 5 years ago

      Very interesting. Not read the paper yet, but plan to.

      Reversibility is important in quantum computing. Quantum circuits must transform input to output in a "unitary" manner, which is reversible. If you view the input-to-output map as a linear transformation matrix (with complex values), then the conjugate transpose of the matrix gives the "inverse function".

      This is probably useless info, but reversibility in the classical sense is also interesting due to the energy bounds of computation. The Landauer limit (kT ln 2) [1] gives a lower bound on the energy that must be dissipated to destroy one bit of information in a computation. A reversible calculation does not destroy bits.

      [1] https://en.wikipedia.org/wiki/Landauer%27s_principle
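For scale, the Landauer bound mentioned above works out to a tiny but nonzero energy per erased bit. A back-of-the-envelope check at room temperature:

```python
import math

k = 1.380649e-23        # Boltzmann constant, J/K (exact in the 2019 SI)
T = 300.0               # room temperature, K

# Minimum heat dissipated to erase one bit of information:
landauer = k * T * math.log(2)

# About 2.87e-21 joules per bit at 300 K.
assert 2.8e-21 < landauer < 2.9e-21
```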

      • cbkeller 5 years ago

        I was wondering about this latter point as well, and apparently this is exactly the goal if one could implement reversible computing at the hardware level [1]:

        > One of the earliest proposals to reduce heat from lost bits was to make computer circuits reversible. A reversible circuit has exactly as many outputs as inputs, assigning one output pattern to every input pattern and vice versa. That means each input can be reconstructed from the output; no bits are lost, so reversible circuits will not give off heat from bit loss. Furthermore, reversible circuits can simulate standard logic functions and can therefore be used in any computer.

        [1] https://www.americanscientist.org/article/computers-that-can...

    • sinenomine 5 years ago

      One can use this to express a large part of a deep ML model as a reversible function, drastically lowering the memory requirements for backpropagation. The same technique, applied by hand, was recently used to speed up training for a transformer language model: https://arxiv.org/abs/2001.04451

      I think in the future we will see a trend of expressing most of a DL model as reversible computation, with minimal irreversible modules at the beginning and end.
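The memory saving in the linked Reformer paper comes from RevNet-style reversible residual coupling: layer inputs can be recomputed exactly from layer outputs, so activations need not be stored for backpropagation. A minimal sketch of the coupling (F and G are arbitrary stand-in functions here, not a real model; integers keep the inversion exact):

```python
# RevNet-style reversible coupling: split the activation into (x1, x2).

def rev_forward(x1, x2, F, G):
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def rev_inverse(y1, y2, F, G):
    # Undo the two residual updates in reverse order.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

F = lambda v: 3 * v     # illustrative "layers"; any functions work,
G = lambda v: v * v     # since F and G themselves are never inverted

y1, y2 = rev_forward(5, 7, F, G)
assert rev_inverse(y1, y2, F, G) == (5, 7)
```

Note that F and G never need to be invertible themselves; only the additive coupling structure is undone, which is what makes the trick broadly applicable.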

    • amitport 5 years ago

      I think the space requirements are different.

      Even if you only care about previous applications of f, keeping snapshots of every input isn't always feasible.

      • GiggleLiu 5 years ago

        Snapshots can be freed through reverse computing in NiLang. Reverse computing has a subtle relationship with checkpointing; see the author's blog: https://nextjournal.com/giggle/reverse-checkpointing

      • e12e 5 years ago

        I think the core idea is that it actually gives you an inverse function, based on your regular (in the DSL) function definition.

        As far as I can tell, there's no inherent need to call the regular function first: given any function f, you get a callable inverse function ~f.

  • GiggleLiu 5 years ago

    Thanks for the link to the paper, also check the latest version here: https://github.com/GiggleLiu/nilangpaper/blob/master/invc.pd...

xiphias2 5 years ago

"The performance of reversible programming automatic differentiation is much better than most traditional frameworks."

It would be good to see a ResNet training benchmark comparison with PyTorch as an example if this is really true.

GiggleLiu 5 years ago

This is the latest NiLang tutorial notebook: https://github.com/JuliaReverse/NiLangTutorial

choeger 5 years ago

Wait a second, does it yield an exact inverse or a numerical approximation?

  • e12e 5 years ago

    Given the example of reversing a function that calculates the first Fibonacci number greater than 100, there seems to be a bit more going on than "just" a numerical approximation:

    https://github.com/GiggleLiu/NiLang.jl/blob/master/examples/...

    Ed: after skimming the paper (which went mostly over my head), this does indeed seem to be about "actually" running functions in reverse, given a function only defined "forward" in the NiLang DSL. It appears the graph embedding examples are missing from the master branch, unfortunately.

    I wonder if this can be used more trivially to solve simple problems too, like calculating values/sums pertaining to compound interest/investment, given a naive function for calculating sums etc. (It's trivial to add up compounded interest and deposits, but a tiny bit more complicated to answer the question "at what time is my portfolio at X or more dollars?")

  • GiggleLiu 5 years ago

    Very good question. Floating-point `+=` and `-=` are not exactly reversible; this is the only approximation we have made in NiLang. To compile NiLang to a rigorously reversible device, we would need to overhaul the current number systems, i.e. use fixed-point and logarithmic numbers instead. Fixed-point numbers are exactly reversible under `+=` and `-=`; logarithmic numbers are exactly reversible under `*=` and `/=`. Here is an example implementing a Bessel function with the two number systems: https://giggleliu.github.io/NiLang.jl/dev/examples/besselj/
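The floating-point caveat is easy to demonstrate. The sketch below (plain Python for illustration, with ordinary integers standing in for fixed-point numbers) shows `+=` followed by `-=` failing to restore a float exactly due to rounding, while the integer version round-trips exactly:

```python
# IEEE-754 doubles: += then -= can lose information to rounding.
x = 0.1
x += 0.2
x -= 0.2
assert x != 0.1          # the round trip did NOT restore the original value

# Fixed-point arithmetic (integers here) is exactly reversible under += / -=.
n = 1
n += 2
n -= 2
assert n == 1
```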

vsskanth 5 years ago

How does this compare to zygote ?

  • GiggleLiu 5 years ago

    1. Scalar level

    Various benchmarks (including those in the paper) show NiLang is much better than Zygote at differentiating scalar functions. And Zygote is much faster than TF and PyTorch.

    2. Tensor level

    Zygote, TF, and PyTorch are much better than NiLang, because NiLang's matrix multiplication is not fully optimized; it is much slower than BLAS. (One can wrap BLAS in NiLang, but then the benchmark no longer measures NiLang's programming-language-level AD performance.)

  • cbkeller 5 years ago

    As far as I understand it (which is not very far):

    Zygote calculates derivatives using source-to-source automatic differentiation.

    This calculates function inverses (so, to stretch the analogy a bit, it's kind of like "source-to-source automatic inversion").
