PyTorch – Tensors and Dynamic neural networks in Python

447 points by programnature 9 years ago · 90 comments

Reader

Smerity 9 years ago

Only a few months ago people saying that the deep learning library ecosystem was starting to stabilize. I never saw that as the case. The latest frontier for deep learning libraries is ensuring efficient support for dynamic computation graphs.

Dynamic computation graphs arise whenever the amount of work that needs to be done is variable. This may be when we're processing text, one example being a few words while another being paragraphs of text, or when we are performing operations against a tree structure of variable size. This problem is particularly prominent in particular subfields, such as natural language processing, where I spend most of my time.

PyTorch tackles this very well, as do Chainer[1] and DyNet[2]. Indeed, PyTorch construction was directly informed from Chainer[3], though re-architected and designed to be even faster still. I have seen all of these receive renewed interest in recent months, particularly amongst many researchers performing cutting edge research in the domain. When you're working with new architectures, you want the most flexibility possible, and these frameworks allow for that.

As a counterpoint, TensorFlow does not handle these dynamic graph cases well at all. There are some primitive dynamic constructs but they're not flexible and usually quite limiting. In the near future there are plans to allow TensorFlow to become more dynamic, but adding it in after the fact is going to be a challenge, especially to do efficiently.

Disclosure: My team at Salesforce Research use Chainer extensively and my colleague James Bradbury was a contributor to PyTorch whilst it was in stealth mode. We're planning to transition from Chainer to PyTorch for future work.

[1]: http://chainer.org/

[2]: https://github.com/clab/dynet

[3]: https://twitter.com/jekbradbury/status/821786330459836416

PieSquared 9 years ago

Could you elaborate on what you find lacking in TensorFlow? I regularly use TensorFlow for exactly these sorts of dynamic graphs, and it seems to work fairly well; I haven't used Chainer or DyNet extensively, so I'm curious to see what I'm missing!
- Smerity 9 years ago
  
  When you say "exactly these sorts of dynamic graphs", what do you mean? TensorFlow has support for dynamic length RNN unrolling but that really doesn't extend well to any dynamic graph structure such as recursive tree structure creation. Since the computation graph has a different shape and size for every input they are difficult to batch and any pre-defined static graph is likely excessive, wasting computation, or inexpressive.
  The primary issue is that the computation graph is not imperative - you define it explicitly. Chainer describes this as the difference between "Define-and-Run" frameworks and "Define-by-Run" frameworks[1].
  TensorFlow is "Define-and-Run". For loops and conditionals end up needing to be defined and injected into the graph structure before it's run. This means there are "tf.while_loop" operations for example - you can't use a "while" loop as it exists in Python or C++. This makes debugging difficult as the process of defining the computation graph is separate to the usage of it and also restricts the flexibility of the model.
  In comparison, both Chainer, PyTorch, and DyNet are "Define-by-Run", meaning the graph structure is defined on-the-fly via the actual forward computation. This is a far more natural style of programming. If you perform a for loop in Python, you're actually performing a for loop in the graph structure as well.
  This has been a large enough issue that, very recently, a team at Google created "TensorFlow Fold"[2], still unreleased and unpublished, that handles dynamic computation graphs. In it they tackle specifically dynamic batching within the tree structured LSTM architecture.
  If you compare the best example of recursive neural networks in TensorFlow[3] (quite complex and finicky in the details) to the example that comes with Chainer[4], which is perfectly Pythonic and standard code, it's pretty clear why one might prefer "Define-by-Run" ;)
  [1]: http://docs.chainer.org/en/stable/tutorial/basic.html
  [2]: https://openreview.net/pdf?id=ryrGawqex
  [3]: https://github.com/bogatyy/cs224d/tree/master/assignment3
  [4]: https://github.com/pfnet/chainer/blob/master/examples/sentim...
  - PieSquared 9 years ago
    
    Ah, fair enough, I see your point. An imperative approach (versus TensorFlow's semi-declarative approach) can be easier to specialize to dynamic compute graphs.
    I personally think the approach used in TensorFlow is preferable – having a static graph enables a lot of convenient operations, such as storing a fixed graph data structure, shipping models that are independent of code, performing graph transformations. But you're right that it entails a bit more complexity, and that implementing something like recursive neural networks, while totally possible in a neat way, ends up taking a bit more effort. I think that the trade-off is worth it in the long run, and that the design of TensorFlow is very much influenced by the long-run view (at the expense of immediate simplicity...).
    The ops underlying TensorFlow's `tf.while_loop` are actually quite flexible, so I imagine you can create a lot of different looping constructs with them, including ones that easily handle recursive neural networks.
    Thanks for pointing out a problem that I haven't really thought about before!
  - kalamaya 9 years ago
    
    I'm intrigued by pyTorch but I am really having a hard time groking what you mean by the whole "but that really doesn't extend well to any dynamic graph structure such as recursive tree structure creation. Since the computation graph has a different shape and size for every input they are difficult to batch and any pre-defined static graph is likely excessive, wasting computation, or inexpressive."
    Would you mind providing a concrete example to relate to if you dont mind? Again, intrigued by PT so want to learn more about it vs TF...
liuliu 9 years ago

You can build both the symbolic computation graph and do the computation at the time when defining the network architecture, thus, gaining the ability to be "dynamic" and also supporting advanced features with the symbolic representation that you built on the side.
In fact, with DyNet or PyTorth, you still need to bookkeeping the graph you traversed (tape) because no one is doing forward AD. If that's the case, why not have a good library to do symbolic computation graph and build dynamic feature on top of it. (I am not saying Tensorflow is a good symbolic computation graph library to build upon just arguing that start with a define-compile-run library doesn't necessarily hinder your ability to support dynamic graphs).
- smhx 9 years ago
  
  the biggest hindrance to do this are language constructs that cannot be or are inconveniently expressed in the symbolic graph, such as python's if vs tf.if and for vs theano.scan, or conditioning on some python-code (not tensor operations). So to build an eagerly evaluating symbolic graph framework that is allowed to do arbitrary things would mean that you would (to an extent) reimplement the language you are working with.
  - liuliu 9 years ago
    
    Let's assume Tensorflow has basic symbolic computation graph expressiveness. What you would do is to build a symbolic representation while executing your graph inline, your symbolic representation doesn't need to have any control structure, it is simpler than that. You execute while loop in Python as usual, and your symbolic representation won't have TF.While at all, it will simply be the execution you performed so far (matrix mul 5 times).
    Once you have a reasonable symbolic computation graph library, you don't need to explicitly build a "tape" because the symbolic representation will record the order of execution and reverse AD even graph optimization (applying CSE etc) come naturally as well.
superfx 9 years ago

How is adding dynamic graphs to TensorFlow "after the fact" while adding it to Torch isn't? (Torch is much older than TF).
- Smerity 9 years ago
  
  Torch was never written as a static graph computation framework. Torch was/is more a tensor manipulation library where you are executing the individual operations step by step and the graph can be tracked and constructed incrementally from those operations. For this reason, much of PyTorch is about building a layer on top of the underlying components (which are focused on efficiently manipulating tensors and/or implementing low level NN components) rather than re-architecting Torch itself.
  This won't be the same for TensorFlow as it was written with the concept of a static computation graph at its core. I'm certainly not saying it's impossible to re-architect - and many smart people in the community and at Google are devoting thinking and code to it - but simply that the process will be far more painful as it was not written with this as an intended purpose.
  To note - there are many advantages to static computation graphs. Of particular interest to Google is that they distribute their computations very effectively over large amounts of hardware. Being able to do this with a dynamic computation graph would be far more problematic.
  - superfx 9 years ago
    
    Thanks for the clarification.
    Does the upcoming XLA interact with this as well? I.e. compilation would be too costly for dynamic graphs, and so it would only make sense for static graphs?
    
    Smerity 9 years ago
    
    I am not highly clued in to XLA as it's new, quite experimental, and most honestly I've just not looked at it in detail. Given XLA provides compilation, JIT or ahead of time, it doesn't really (yet) factor in to the dynamic graph discussion.
    What would theoretically be interesting is a JIT for dynamic computation graphs. Frequent subgraphs could be optimized and cached and re-used when appropriate, similar to a JIT for Javascript. No doubt they're already pondering such things.
    https://www.tensorflow.org/versions/master/experimental/xla/
datascientist 9 years ago

Chainer's Define-by-run apporach is also described here https://www.oreilly.com/learning/complex-neural-networks-mad...
jkk 9 years ago

Any particular reason you prefer PyTorch over DyNet?
chewxy 9 years ago

If you guys wanna use Go, Gorgonia also features dynamic graphs the way Chainer does (also Theano-style compile-execute machines)
attractivechaos 9 years ago

One question: how do you save a dynamic network if it changes from time to time (e.g. from sample to sample)?
- apaszke 9 years ago
  
  You save the parameters and the code of the model definition

smhx 9 years ago

It's a community-driven project, a Python take of Torch http://torch.ch/. Several folks involved in development and use so far (a non-exhaustive list):

* Facebook * Twitter * NVIDIA * SalesForce * ParisTech * CMU * Digital Reasoning * INRIA * ENS

The maintainers work at Facebook AI Research

tsomctl 9 years ago

Not only that, but it appears to use the same core c libray (TH) as Lua torch.
- smhx 9 years ago
  
  we actually share the same git-subtree between Lua and Python variants. TH, THNN, THC, THCUNN are shared.
  - divbit 9 years ago
    
    I have been running in the back of my mind the idea of attempting a julialang interface to torch for a few weeks now, using the ccall interface: http://docs.julialang.org/en/release-0.5/manual/calling-c-an.... Do you have any thoughts / recommendations w'r't' that? (This would be more of a fun / weekend(s) project for me than anything else) My goal would be to have the tensors override the .* and * operators as used here: https://gist.github.com/divbit/ec57ad2f1989bf13aecdf9e1e1056...

spyspy 9 years ago

This project aside, I'm in love with that setup UI on the homepage telling you exactly how to get started given your current setup.

artursapek 9 years ago

Agreed. Reminds me of this scary page I found the other day when googling "certbot setup":
https://certbot.eff.org/all-instructions/
jmportilla 9 years ago

yeah, that's a great way to quickly show the set up and OS requirements
demonshalo 9 years ago

yes indeed!

programnatureOP 9 years ago

Actually not clear if there is an official affiliation with Facebook, other than some of the primary devs.

throwawayish 9 years ago

Copyright (c) 2016- Facebook, Inc (Adam Paszke)
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Notably absent is the otherwise Facebook-typical PATENTS license thing. Which I see as a good sign.
Also, it doesn't look like this has happened just now? PRs in the repo go back a couple months and the repo has 100+ contributors.
- smhx 9 years ago
  
  it's the same license file as https://github.com/torch/torch7 and http://torch.ch
  The C libraries are shared among the Lua and Python variants

tdees40 9 years ago

At this point I've used PyTorch, Tensorflow and Theano. Which one do people prefer? I haven't done a ton of benchmarking, but I'm not seeing huge differences in speed (mostly executing on the GPU).

sandGorgon 9 years ago

Keras is going to be the interface to Tensorflow - https://news.ycombinator.com/item?id=13413487
- tdees40 9 years ago
  
  Yes, but Keras works just fine using Theano as a backend as well...

taterbase 9 years ago

Is there any reason this might not work in windows? I see no installation docs for it.

smhx 9 years ago

the C libraries are compatible with Windows, they are used in Torch windows ports. We just dont have any Windows devs on the project to help and maintain it :( .
- randomx89 9 years ago
  
  Are you guys looking for Windows devs to contribute or help maintaining it? I'd be interested in helping out if I can. I currently use Chainer, but I'd like to try pytorch
  - apaszke 9 years ago
    
    Yes! There's an issue on that, where we'll be coordinating the work: https://github.com/pytorch/pytorch/issues/494

EternalData 9 years ago

Been using PyTorch for a few things. Love how it integrates with Numpy.

theoracle101 9 years ago

Most important question. Is this still 1 indexed (Lua was 1 indexed, which means porting code you need to be aware of this)?

apaszke 9 years ago

No! Python 0 based indexing everywhere.
smhx 9 years ago

it's 0-indexed just like everything else in python

rtcoms 9 years ago

I've never fiddled with machine learning thing so don't know anything about it.

I am wondering if CUDA is mandatory for torch installation ? I use a Macbook air which doesn't have graphics card, so not sure if torch can be installed and used on my machine.

itg 9 years ago

It's not mandatary, but for some problems such as using image data, it provides as substantial speedup when training a classifier.
zitterbewegung 9 years ago

You could probably train MNIST on your Macbook Air but anything much more complicated for that you would want to use A GPU.
RangerScience 9 years ago

I believe that's a "no". I was able to set up a docker'ized Deep Style on my Macbook Pro, although it takes for bloody ever to do a single image. CUDA is, AFAIK, a substantial speed boost, but not a requirement.

baq 9 years ago

Very nice to see Python 3.5 there.

jbsimpson 9 years ago

This is really interesting, I've been wanting to learn more about Torch for a while but have been reluctant to commit to learning Lua.

veli_joza 9 years ago

Lua is a pleasure to learn and use. The language core is so simple and elegant, you can learn it in a day. Standard library is also very light, which is both strength and weakness.
I use it more and more for hobby projects. Combine it with LuaJIT (which torch uses) and you have the fastest interpreted language around. Give it a try.
- etiene 9 years ago
  
  I want to reiterate this. I started learning it for guilt because it was created in the university I studied. Then I realised it was really a pleasure to use it. I still use it in many hobby projects nowadays whenever I can.

ankitml 9 years ago

I am confused with the license file. What does it mean? Some rights reserved and copyright... Doesnt look like a real open source project.

yincrash 9 years ago

It is a standard 3-clause BSD license. The "All rights reserved" portion definitely adds ambiguity (and only exists in the BSD license out of all major OSS licenses). There is StackExchange answer that goes into the history of it[1].
[1] http://opensource.stackexchange.com/questions/2121/mit-licen...
- ankitml 9 years ago
  
  Got it. It makes sense now.
wyldfire 9 years ago

Copyrights are appropriate, they make it explicit who produced the work and make it easier to enforce the license. This license follows the general form of the BSD-style license [1].
[1] https://opensource.org/licenses/BSD-3-Clause
s_ngularity 9 years ago

Looks like standard BSD-3 to me

gallerdude 9 years ago

What's the highest level neural network lib I can use? I'm a total programming idiot but I find neural nets fascinating.

visarga 9 years ago

Keras requires just a few lines of code, it's designed for easy use and practicality.
- apaszke 9 years ago
  
  torch.nn offers a very similar interface to Keras (e.g. see Alexnet definition at https://github.com/pytorch/vision/blob/master/torchvision/mo...).
nickdavidhaynes 9 years ago

http://playground.tensorflow.org/
Pretty much no way to use neural networks (except for playing, like above) without writing code.
eudoxus 9 years ago

This[1] was posted earlier today to HN. Seems pretty simple to play with NNs without coding.
[1] - http://kur.deepgram.com/

aaron-lebo 9 years ago

Is this related to lua's Torch at all?

http://torch.ch/

zo7 9 years ago

They don't seem to explicitly say it, but it might be using the same core code given the structure of the framework and their mentioning that it's a mature codebase several years old. The license file also goes back to NYU before being taken over by Facebook, similar to Torch.
- apaszke 9 years ago
  
  The core libraries are the same as in Lua torch, but the interface is redesigned and new.
- pavanky 9 years ago
  
  They are sharing the code base using git-subtrees. So the C and CUDA parts of the codebase will be kept in sync. The modules written in lua or python will diverge.
superdaniel 9 years ago

I was wondering the same thing. There's even another repo that seems mildly popular called pytorch on Github: https://github.com/hughperkins/pytorch
pavanky 9 years ago

They share the same underlying C and CUDA libraries. The Python and Lua modules are different. You can see both projects have pretty much the same contributors because they are sharing the code base using git-subtree.

0mp 9 years ago

It is worth adding that there is a wip branch focused on making PyTorch tensors distributable across machines in a master-workers model: https://github.com/apaszke/pytorch-dist/

shmatt 9 years ago

i've been running their dcgan.torch code in the past few days and results have been pretty amazing for plug and play

vegabook 9 years ago

Guess there's no escaping Python. I had hoped Lua(jit) might emerge as a scientific programming alternative but with Torch now throwing its hat into the Python ring I sense a monoculture in the making. Bit of a shame really because Lua is a nice language and was an interesting alternative.

jjawssd 9 years ago

Lua is extremely flexible to the point where there is basically no standard library. This causes problems with code reuse and moving between codebases because everyone does things drastically differently. Compare this to Numpy in the Python world, a single fundamental package for scientific computing in Python.
Lua is less used than Python in the scientific community, and a lot of the most innovative machine learning researchers already work with C++ and Python. Using yet another language with only marginal benefit increases cognitive load and drains from the researcher's mental innovation budget, forcing the researcher to learn the ins and outs of Lua rather than working on innovative machine learning solutions.
Lua is a nice language. Python 3 is a nice language and there are many new exciting features and development styles (hello async programming?) in the making which will prevent a monoculture from forming in the near term.
- vegabook 9 years ago
  
  Thanks for the interesting and informative comment. Do I sense just a tiny bit of regret though? Yet another Python interface. YAPI. You heard it here first. And no, Py3 is not that nice. Too much cruft by far. And lua is miles faster than Python when you're outside the tensor domain, ie while you're sourcing and wrangling your data. Arguably luajit obviates the need for C , something you can't say about Python. Disclosure: I am a massive, but increasingly disenchanted, user of Python. I had actually started looking at Torch7, foregoing tensorflow, precisely because of Lua. But the walls are closing in....
  - jjawssd 9 years ago
    
    A very large portion of performance problems can be mitigated with the use of cython and the new asyncio stuff.
    asyncio success story: https://magic.io/blog/asyncpg-1m-rows-from-postgres-to-pytho...
    cython: http://scikit-learn.org/stable/developers/performance.html
    
    vegabook 9 years ago
    
    Luajit is at least 10x faster than python and easily obviates the need to mess around with cython. That's an easy win for Lua. Let's be honest: Torch has decided that if you cannot beat them, join them. It is about network effects. Not about Python better than Lua intrinsically.
    
    baq 9 years ago
    
    but why do you care if luajit is faster than python if everything that matters is computed on the GPU anyways?
    
    jjawssd 9 years ago
    
    Can't argue with that
    
    inlineint 9 years ago
    
    An alternative to Cython is Numba [1], which speeds up some cycles in pure Python by just adding a single decorator.
    [1] http://numba.pydata.org/
  - cr0sh 9 years ago
    
    > And lua is miles faster than Python when you're outside the tensor domain, ie while you're sourcing and wrangling your data.
    Then use Lua for that, if you are more comfortable there and want/need the speed bump. There's nothing that says an entire project or whatnot has to be developed in a singular language.
    Use each tool to its strengths, as your needs, requirements, and abilities dictate.
    
    vegabook 9 years ago
    
    There always has to be someone rolling out the horses-for-courses pitch. No. I wanted Lua to gain traction with other people. That's the point. I would have liked the Lua sci-ecosystem to be healthy as an alternative.
    
    infinite8s 9 years ago
    
    Is there an equivalent to numpy in the Lua space?
    
    apaszke 9 years ago
    
    torch?
  - webmaven 9 years ago
    
    > And lua is miles faster than Python when you're outside the tensor domain, ie while you're sourcing and wrangling your data.
    Is that true even if the Python used is PyPy rather than CPython?
- etiene 9 years ago
  
  I like Lua more than I like Python and all of this makes me sad. I wished more people were putting their hearts into getting the Lua's ecosystem going instead of into things like this.
argonaut 9 years ago

You're making the language out to be more important than it really is.
The Python that you write when using these frameworks just the glue code / scripts. All you're doing is calling the framework's functions. Most of it gets thrown away (as researchers). The stuff that doesn't is self-contained and usually short. You're not writing 100k+ line codebases.
Lua may be faster for certain tasks (data processing), but the time it takes for does tasks is usually a rounding error in deep learning. Not to mention you can still code in C/C++ with pytorch.
If there is a monoculture in machine learning, it would be the deep learning monoculture.
statsmatscats 9 years ago

Here is an emerging julia alternative http://www.breloff.com/transformations/
eva1984 9 years ago

No, there is already no escaping from CUDA, so it is already a monoculture nevertheless.
crudbug 9 years ago

I was also on this bandwagon. Its about pythonic syntax not semantics that drives this.
If only Mike Pall created a transpiler infrastructure layer on top of LuaJIT.
wodenokoto 9 years ago

There's also R and Julia and there are still plenty of people building neural networks in C.
- attractivechaos 9 years ago
  
  It is easy to build a multi-layer perceptron purely in C. You can roll your own or use a library like FANN. However, so far as I know, very few (darknet is the only example I know of) are using C to build a bit more complex networks like CNN/RNN, let alone those topologically complex networks in research domain.

plg 9 years ago

Every time I decide I'm going to get into Python frameworks again, and I start looking at code, and I see people making everything object-oriented, I bail

Just a personal (anti-)preference I guess

apaszke 9 years ago

But it is possible to write your model in purely functional style. Check out the PR to examples repo with functional ResNets https://github.com/pytorch/examples/pull/22.
closed 9 years ago

Same. I know there can be nice, composable OO approaches, but every time I bump into a super crazy stacktrace, or need one of those police-detective-style boards with yarn to connect everything, I start to wonder.

Settings

PyTorch – Tensors and Dynamic neural networks in Python

Keyboard Shortcuts