A native Python IDE built for data science

201 points by coris47 10 years ago · 46 comments

I find the dramatic rise of Python (and open source tools in general) for scientific work interesting and cool. When I first started using Python many years ago, I was doing contract work for the SciPy/NumPy folks (Enthought), and Python was still a blip in the scientific world. Java, Fortran, and a bit of C++ ruled the commercial world, with Mathematica and MATLAB handling the academic side of things (with some overlap and some outliers).

It's really cool to see. I like seeing science democratized, and Python is definitely a democratizing influence; the fact that so much of it is open source is really fantastic. I've also noticed that a lot more domain experts are becoming programmer+domain experts through this evolution. Teams used to need a scientist to design the work and one or more programmers to implement it. That's becoming less of a requirement, which can accelerate the science-ing to a notable degree.

minimaxir 10 years ago

The UI is obviously inspired by RStudio for R. And I have zero objections to that; this is something that I've wanted for a while, after having difficulty with PyCharm for my Python-related data projects. I'll play around with it a bit.

As a heads up, the setup workflow assumes you are on OS X, which may be a problem if it asks you to open a Terminal on Windows: http://i.imgur.com/nya50e4.png

ekianjo 10 years ago

I noticed the download on Linux is massive, though (600+ MB). Why is that? R and RStudio combined weigh way less than that.
Plus, for distributing binaries on Linux, a tar.gz would be more common than a zip file, and it's better still to support the main distros through their package managers (a PPA for Ubuntu, a pacman package for Arch, etc.), since that is far more user-friendly every time users want to stay up to date.
glamp 10 years ago

hey minimaxir, the commands should still work if you have python and/or conda installed. if you have any issues you can post here: https://github.com/yhat/rodeo/issues.
thanks for trying it out!
- minimaxir 10 years ago
  
  That works for the pip command. Since people who analyze data may not necessarily be experts at the command line, I recommend revisiting this workflow.
  matplotlib, however, fails to install completely with this method on Windows for subtle reasons. Filed: https://github.com/yhat/rodeo/issues/204
  The documentation just points to a blog article on how to install matplotlib on Windows.
IndianAstronaut 10 years ago

Funnily enough, I traded in RStudio for Jupyter notebooks for R, especially for demos to other people, since it is much easier to see tables, graphs, and such.

ced 10 years ago

In the last year, my workflow for data science/AI has completely shifted to Jupyter notebooks. Is there any IDE that offers a similar experience?

jasongrout 10 years ago

Jupyter dev here. FYI, we're currently working on building a new Jupyter web interface that resembles a more classic IDE experience, which we are calling JupyterLab. A first version is progressively coming together and is planned to have code editor and terminal components. We also plan to have a notebook component, like the current notebook, in a later version. Our in-progress work is currently spread across many repos (see the various jupyter/jupyter-js-* repos on GitHub).
- ced 10 years ago
  
  Sounds promising, thank you for working on this!
skierscott 10 years ago

Yup, same experience (also for data science). The biggest helper I've had so far is jupyter-vim-binding [1].
[1]: https://github.com/lambdalisue/jupyter-vim-binding
plusepsilon 10 years ago

There is Beaker Notebook, which is similar to Jupyter. I haven't tried it, but you can integrate multiple languages in one notebook.
http://beakernotebook.com/
- jupiter90000 10 years ago
  
  I really like the idea behind Beaker, but the last time I played with it, my main concern was working with a somewhat large dataset (one that uses most of the machine's RAM): using it from another language creates an additional copy of the data in memory for that language. This multiplies the memory used by the number of languages that need an instance of the dataset. If datasets could live in shared memory somehow, it would be much more useful (if they've figured that out since I last used it, please tell me).
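
A back-of-the-envelope sketch of the memory concern above, assuming each language keeps exactly one full copy of the dataset and ignoring interpreter overhead. The function names and the 12 GiB / 16 GiB figures are illustrative assumptions, not measurements from Beaker.

```python
def total_copies_gib(dataset_gib: float, n_languages: int) -> float:
    """Memory needed if every language holds its own full copy of the dataset."""
    if n_languages < 1:
        raise ValueError("n_languages must be at least 1")
    return dataset_gib * n_languages


def fits_in_ram(dataset_gib: float, n_languages: int, ram_gib: float) -> bool:
    """True if all per-language copies fit in the given amount of RAM."""
    return total_copies_gib(dataset_gib, n_languages) <= ram_gib


if __name__ == "__main__":
    # Hypothetical numbers: a 12 GiB dataset on a 16 GiB machine.
    for n in (1, 2, 3):
        needed = total_copies_gib(12, n)
        print(f"{n} language(s): {needed} GiB needed, fits: {fits_in_ram(12, n, 16)}")
```

Under these assumptions, a dataset that fits comfortably in one language stops fitting as soon as a second language needs its own copy.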
- chillacy 10 years ago
  
  You can do the same in the IPython notebook (and, I'd presume, Jupyter as well) using magic commands: http://rpy.sourceforge.net/rpy2/doc-2.4/html/interactive.htm...
  It's kind of weird to use, but it works for the most part. You can clean up some data in Python, then push the data over to a cell written in R to do some other evaluation, then push the results back over to Python.

simoneau 10 years ago

By "native" they mean Electron-based deployment of HTML/JavaScript. More info:

http://blog.yhathq.com/posts/how-rodeo-works.html

theelfismike 10 years ago
