A native Python IDE built for data science

yhat.com

201 points by coris47 10 years ago · 46 comments

SwellJoe 10 years ago

I find the dramatic rise of Python (and open source tools in general) for scientific work interesting and cool. When I first started using Python many years ago, I was doing contract work for the SciPy/NumPy folks (Enthought), and Python was still a blip in the scientific world... Java and Fortran and a bit of C++ ruled the commercial world, with Mathematica and MATLAB handling the academic side of things (with some overlap and some outliers).

It's really cool to see. I like seeing science democratized, and Python is definitely a democratizing influence, and the fact that so much of it is open source is really fantastic. I've also noticed that a lot more domain experts are becoming programmer+domain experts through this evolution. It used to be that there were teams with a scientist to design it and one or more programmers to implement it, and that's becoming less of a requirement, which can accelerate the science-ing to a notable degree.

minimaxir 10 years ago

The UI is obviously inspired by RStudio for R. And I have zero objections to that; this is something that I've wanted for a while, after having difficulty with PyCharm for my Python-related data projects. I'll play around with it a bit.

As a heads up, the setup workflow assumes you are on OS X, which may be a problem if it asks you to open a Terminal on Windows: http://i.imgur.com/nya50e4.png

  • ekianjo 10 years ago

    I realized the download on Linux is massive, though (600+ MB) - why is that? R and RStudio combined weigh way less than that.

    Plus, for distributing binaries in Linux, instead of a zip file (tar.gz would be more common, too) it's better to support the main distros with a repository (PPA for Ubuntu, pacman for Arch, etc...) since it's way more user friendly every single time you want them to stay up to date.

  • glamp 10 years ago

    hey minimaxir, the commands should still work if you have python and/or conda installed. if you have any issues you can post here: https://github.com/yhat/rodeo/issues.

    thanks for trying it out!

    • minimaxir 10 years ago

      That works for the pip command. Since people who analyze data may not necessarily be experts at the command line, I recommend relooking at this workflow.

      matplotlib, however, fails to install completely with this method on Windows for subtle reasons. Filed: https://github.com/yhat/rodeo/issues/204

      The documentation just points to a blog article on how to install matplotlib on Windows.

  • IndianAstronaut 10 years ago

    Funnily enough, I traded in RStudio for Jupyter notebooks for R, especially for demos to other people, since it is much easier to see tables, graphs and such.

ced 10 years ago

In the last year, my workflow for data science/AI has completely shifted to Jupyter notebooks. Is there any IDE that offers a similar experience?

  • jasongrout 10 years ago

    Jupyter dev here. FYI, we're currently working on building a new Jupyter web interface that resembles a more classic IDE experience, which we are calling JupyterLab. A first version is progressively coming together, and is planned to have code editor and terminal components. We also plan to have a notebook component, like the current notebook, in a later version. Our in-progress work is spread across many repos currently (see the various jupyter/jupyter-js-* repos on github).

  • skierscott 10 years ago

    Yup, same experience (also for data science). The biggest helper I've had so far is jupyter-vim-binding[1]

    [1]:https://github.com/lambdalisue/jupyter-vim-binding

  • plusepsilon 10 years ago

    There's also Beaker Notebook, which is similar to Jupyter. I haven't tried it, but you can integrate multiple languages in one notebook.

    http://beakernotebook.com/

    • jupiter90000 10 years ago

      I really like the idea behind beaker, but last time I played with it, the main issue/concern occurs for me when using a somewhat large (uses most of machine RAM) dataset, since using it in another language creates an additional copy of the data in memory for the other language to use. This multiplies the memory used by the number of languages that need an instance of the dataset. If there could be shared memory for datasets somehow, it would be much more useful (if they've figured that out since I last used it, please tell me).

    • chillacy 10 years ago

      You can do the same in ipython notebook (I'd presume jupyter as well) using magic commands: http://rpy.sourceforge.net/rpy2/doc-2.4/html/interactive.htm...

      It's kind of weird to use but it works for the most part. You can clean up some data in python, then push the data over to a cell written in R to do some other evaluation, then push the results back over to python.

simoneau 10 years ago

By "native" they mean Electron-based deployment of HTML/JavaScript. More info:

http://blog.yhathq.com/posts/how-rodeo-works.html

theelfismike 10 years ago

See also: PyCharm

https://www.jetbrains.com/pycharm/

jgamman 10 years ago

honest question: what if your science isn't maths/physics/data? I'm a chemist and from what i can see there's @#$@# all out there in FOSS land.

  • analog31 10 years ago

    Excellent question. Here's my chemistry cred: I'm married to a chemist, related to a couple more, and have worked in an area related to analytical chemistry, though I got my degree in physics, 2+ decades ago.

    So here are some generalizations.

    While in school, I noticed that the physics students were far more interested in math and computer stuff than the chemistry students were. Maybe we were computer science wannabes, or maybe we guessed (correctly in my case) that proficiency with computers would make us more employable. This was true in both undergrad and grad school.

    And there's a long tradition of physicists stealing ideas from math and computation for solving physics problems. When I was in school, computation was considered to be a specialized branch of chemistry, but was at the forefront of physics.

    Another difference is that the physics students were generally more interested in making our own tools. The current "maker" and "hacker" trends are old hat for small-lab experimental physicists.

    Chemistry has always been a bigger field than physics, which I suspect has attracted more interest in making commercial equipment and software. I've noticed in an industrial setting, that managers are often looking for closed solutions that can't be modified by the user, either for regulatory reasons or adversarial labor-management attitudes. The industry wants your boss to think that letting you make your own tools is either dangerous, or a waste of your time.

    In contrast, even in industry, physicists still have to make our own tools. And management already knows that we're freaks. ;-)

    So the absence of FOSS tools for chemistry doesn't shock me.

    • isolate 10 years ago

      > I've noticed in an industrial setting, that managers are often looking for closed solutions that can't be modified by the user, either for regulatory reasons or adversarial labor-management attitudes. The industry wants your boss to think that letting you make your own tools is either dangerous, or a waste of your time.

      It's interesting to consider in this context that workers owning the means of production is what links the GPL with Marxism.

      • analog31 10 years ago

        It took me way too many years to discover FOSS, but after getting hooked on it, I find that I'm actually more motivated and creative when I'm using tools that nobody owns.

  • entee 10 years ago

    There are some tools out there, for example Open Babel:

    http://openbabel.org/wiki/Main_Page

    which has some Python bindings built in. I set some of this up for myself during my PhD, but it was occasionally kind of a pain to get it to work. Also at the time I was a bit of a noob so there's that :).

    It has some nice features for handling chemical structures, I used it mostly for translating one format into another and computing fingerprints, but I think more can be done.

    In general I'd agree with @analog31, biology has some good OSS tools, physics has some good OSS tools, but you get to the bridging discipline of chemistry and you find very few. My theory re. organic chemistry and biochemistry applications: it's way more profitable to be closed source. In contrast to the other two fields (gross generalization I know, but somewhat true) there's a very large market for commercial software in Pharma. If someone is willing to pay top dollar, especially an industry that is paranoid about IP and therefore tends to (rightly or wrongly) prefer closed, proprietary solutions, then that's where software will end up.

  • TheLogothete 10 years ago
    • jhbadger 10 years ago

      I doubt anyone doing data analysis is unfamiliar with R, as it is the current standard. The push to use Python, or Julia, or a version of Lisp (hey, we had that in the 1990s -- us olds remember xlispstat) for data analysis is coming from the people who find R to be a rather unpleasant language. Which is a subjective opinion, obviously, but not an uncommon one.

      • baldfat 10 years ago

        Learning Functional Programming (I learned Racket) makes R great since it really is a functional language.

michaelperalta 10 years ago

I'm curious what advantages there are with this (or PyCharm) over something like Spyder?

  • plusepsilon 10 years ago

    PyCharm is unparalleled in its understanding of code and it's great for building codebases. It is a programmer's tool first and foremost. I find PyCharm's interactive features clunky and have to do extra work to see the data.

    RStudio / Rodeo provide an interactive data analysis environment where multiple "views" are presented right in front of the user. A view could be a plot, a data frame or interactions between the code editor and the terminal. As a data analysis person, it really helps to put the mental strain of code as far away as possible and just explore the data.

    Jupyter Notebooks are nice but can get overwhelming (too much scrolling) when things get complicated. Great teaching tool, however.

    I think each of these tools has different use cases, and it's great that Python is getting more user-friendly for the data science workflow.

  • snydly 10 years ago

    I'd like to know this too... Comparison between PyCharm, Canopy, Spyder, Yhat, etc.

    After using it for 10 minutes, it feels identical to RStudio. That's a good thing.

  • vittore 10 years ago

    yeah, I wonder what features are not covered by the free version of PyCharm (except for the UI, obviously copied from R)

ihaveajob 10 years ago

Neat tool, but watching the video, the grammar nazi in me couldn't stop looking at that "palendrome".

_RPM 10 years ago

Just curious, what qualifies it as Native?

  • bthornbury 10 years ago

    I am curious about this as well.

    Taking a look at the source (https://github.com/yhat/rodeo), it appears to be all in Python.

    I was under the (perhaps mistaken) impression that native referred to code which compiled to assembly.

  • revelation 10 years ago

    Let's see, we're running a browser, which runs a JavaScript VM, which runs our node.js logic, which runs Python, which calls into native numpy. See, native!

    I have this visceral reaction when I can tell something is based on Electron or IWebBrowser, 2.0.

  • shotwell 10 years ago

    as in native desktop

drvortex 10 years ago

It doesn't seem to be able to work with Python 3.5. It doesn't find the path and now the interface is stuck.

cgm616 10 years ago

I am desperately trying to get this to work with my pyenv-virtualenv anaconda installation, but I can't get it to work out.

I also tried setting the path to ~/.pyenv/shims/python, but that didn't work out either.
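
One thing that might be worth trying, sketched under assumptions: pyenv shims are small wrapper scripts rather than interpreters, so a tool that expects a real Python binary may not cope with them. Running the snippet below with the desired environment active prints the resolved path of the interpreter that actually runs, which could then be entered as the path setting instead of the shim. Whether Rodeo accepts that path is not confirmed anywhere in this thread, and the function name `real_interpreter_path` is made up for illustration.

```python
import os
import sys


def real_interpreter_path() -> str:
    """Return the path of the running interpreter with symlinks resolved.

    sys.executable is the binary that is executing this script, not the
    wrapper (such as a pyenv shim) that was used to launch it.
    """
    return os.path.realpath(sys.executable)


if __name__ == "__main__":
    # Print the path so it can be copied into an IDE's interpreter setting.
    print(real_interpreter_path())
```

Saved as, say, `which_python.py` (a hypothetical filename) and run as `python which_python.py`, it prints a single absolute path.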

ilyaeck 10 years ago

A pros/cons comparison to Jupyter would be helpful.

  • yeukhon 10 years ago

    Jupyter, formerly known as IPython Notebook, has a huge UX problem for me. The UI is made to be like a notebook (no duh), but for a larger codebase you want an editor-like UI. Jupyter may be okay for demos.

mrlinx 10 years ago

Finally, something very useful for anyone into python+data that doesn't like working inside a browser.

balls187 10 years ago

What makes this specific for Data Scientists?

Also curious about the performance of data-frame viewer for large data sets.

joelschw 10 years ago

Why should I use this over Spyder?
