A native Python IDE built for data science
yhat.com
I find the dramatic rise of Python (and open source tools in general) for scientific work interesting and cool. When I first started using Python many years ago, I was doing contract work for the SciPy/NumPy folks (Enthought), and Python was still a blip in the scientific world... Java and Fortran and a bit of C++ ruled the commercial world, with Mathematica and MATLAB handling the academic side of things (with some overlap and some outliers).
It's really cool to see. I like seeing science democratized, and Python is definitely a democratizing influence, and the fact that so much of it is open source is really fantastic. I've also noticed that a lot more domain experts are becoming programmer+domain experts through this evolution. It used to be that there were teams with a scientist to design it and one or more programmers to implement it, and that's becoming less of a requirement, which can accelerate the science-ing to a notable degree.
The UI is obviously inspired by RStudio for R. And I have zero objections to that; this is something that I've wanted for a while, after having difficulty with PyCharm for my Python-related data projects. I'll play around with it a bit.
As a heads up, the setup workflow assumes you are on OS X, which may be a problem if it asks you to open a Terminal on Windows: http://i.imgur.com/nya50e4.png
I realized the download on Linux is massive, though (600+ MB) - why is that? R and RStudio combined weigh way less than that.
Plus, for distributing binaries on Linux, instead of a zip file (tar.gz would be more common anyway), it's better to support the main distros with a repository (a PPA for Ubuntu, a pacman repo for Arch, etc.), since that's far more user friendly whenever you want users to stay up to date.
hey minimaxir, the commands should still work if you have python and/or conda installed. if you have any issues you can post here: https://github.com/yhat/rodeo/issues.
thanks for trying it out!
That works for the pip command. Since people who analyze data may not necessarily be experts at the command line, I recommend taking another look at this workflow.
matplotlib, however, fails to install completely with this method on Windows for subtle reasons. Filed: https://github.com/yhat/rodeo/issues/204
The documentation just points to a blog article on how to install matplotlib on Windows.
Funny enough, I traded in RStudio for Jupyter notebooks for R, especially for demos to other people since it is much easier to see tables, graphs and such.
In the last year, my workflow for data science/AI has completely shifted to Jupyter notebooks. Is there any IDE that offers a similar experience?
Jupyter dev here. FYI, we're currently working on building a new Jupyter web interface that resembles a more classic IDE experience, which we are calling JupyterLab. A first version is progressively coming together, and is planned to have code editor and terminal components. We also plan to have a notebook component, like the current notebook, in a later version. Our in-progress work is spread across many repos currently (see the various jupyter/jupyter-js-* repos on github).
Sounds promising, thank you for working on this!
Yup, same experience (also for data science). The biggest helper I've had so far is jupyter-vim-bindings[1]
There's also Beaker Notebook, which is similar to Jupyter. I haven't tried it, but you can integrate multiple languages in one notebook.
I really like the idea behind Beaker, but last time I played with it, my main concern came up with a somewhat large dataset (one that uses most of the machine's RAM): using it from another language creates an additional copy of the data in memory for that language. This multiplies the memory used by the number of languages that need an instance of the dataset. If there could be shared memory for datasets somehow, it would be much more useful (if they've figured that out since I last used it, please tell me).
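To put rough numbers on that, here is a back-of-the-envelope sketch. The function names and the sizes (a 12 GB dataset on a 16 GB machine) are made up for illustration, and it assumes each language holds one full copy with no extra overhead, which is optimistic.

```python
def total_footprint_gb(dataset_gb: float, n_languages: int) -> float:
    """Memory needed if every language keeps its own full copy of the dataset."""
    return dataset_gb * n_languages


def fits_in_ram(dataset_gb: float, n_languages: int, ram_gb: float) -> bool:
    """True if all the per-language copies fit in physical RAM at once."""
    return total_footprint_gb(dataset_gb, n_languages) <= ram_gb


if __name__ == "__main__":
    # Hypothetical numbers: a 12 GB dataset on a 16 GB machine.
    for n in (1, 2, 3):
        print(n, total_footprint_gb(12, n), fits_in_ram(12, n, 16))
```

With those assumed numbers, a single copy fits but the second language already pushes the total past physical RAM, which is the point where things start swapping.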
You can do the same in ipython notebook (I'd presume jupyter as well) using magic commands: http://rpy.sourceforge.net/rpy2/doc-2.4/html/interactive.htm...
It's kind of weird to use but it works for the most part. You can clean up some data in python, then push the data over to a cell written in R to do some other evaluation, then push the results back over to python.
By "native" they mean Electron-based deployment of HTML/JavaScript.
See also: PyCharm
Off-topic, but when did JetBrains switch to the subscription model?
There was a big brouhaha about 4 months ago.
And also Python Tools for Visual Studio.
honest question: what if your science isn't maths/physics/data? I'm a chemist and from what i can see there's @#$@# all out there in FOSS land.
Excellent question. Here's my chemistry cred: I'm married to a chemist, related to a couple more, and have worked in an area related to analytical chemistry, though I got my degree in physics, 2+ decades ago.
So here are some generalizations.
While in school, I noticed that the physics students were far more interested in math and computer stuff than the chemistry students were. Maybe we were computer science wannabes, or maybe we guessed (correctly in my case) that proficiency with computers would make us more employable. This was true in both undergrad and grad school.
And there's a long tradition of physicists stealing ideas from math and computation for solving physics problems. When I was in school, computation was considered to be a specialized branch of chemistry, but was at the forefront of physics.
Another difference is that the physics students were generally more interested in making our own tools. The current "maker" and "hacker" trends are old hat for small-lab experimental physicists.
Chemistry has always been a bigger field than physics, which I suspect has attracted more interest in making commercial equipment and software. I've noticed in an industrial setting, that managers are often looking for closed solutions that can't be modified by the user, either for regulatory reasons or adversarial labor-management attitudes. The industry wants your boss to think that letting you make your own tools is either dangerous, or a waste of your time.
In contrast, even in industry, physicists still have to make our own tools. And management already knows that we're freaks. ;-)
So the absence of FOSS tools for chemistry doesn't shock me.
> I've noticed in an industrial setting, that managers are often looking for closed solutions that can't be modified by the user, either for regulatory reasons or adversarial labor-management attitudes. The industry wants your boss to think that letting you make your own tools is either dangerous, or a waste of your time.
It's interesting to consider in this context that workers owning the means of production is what links the GPL with Marxism.
It took me way too many years to discover FOSS, but after getting hooked on it, I find that I'm actually more motivated and creative when I'm using tools that nobody owns.
There are some tools out there, for example Open Babel:
http://openbabel.org/wiki/Main_Page
which has some Python bindings built in. I set some of this up for myself during my PhD but it was occasionally kind of a pain to get it to work. Also at the time I was a bit of a noob so there's that :).
It has some nice features for handling chemical structures. I used it mostly for translating one format into another and computing fingerprints, but I think more can be done.
In general I'd agree with @analog31, biology has some good OSS tools, physics has some good OSS tools, but you get to the bridging discipline of chemistry and you find very few. My theory re. organic chemistry and biochemistry applications: it's way more profitable to be closed source. In contrast to the other two fields (gross generalization I know, but somewhat true) there's a very large market for commercial software in Pharma. If someone is willing to pay top dollar, especially an industry that is paranoid about IP and therefore tends to (rightly or wrongly) prefer closed, proprietary solutions, then that's where software will end up.
I doubt anyone doing data analysis is unfamiliar with R, as it is the current standard. The push to use Python, or Julia, or a version of Lisp (hey, we had that in the 1990s -- us olds remember xlispstat) for data analysis is coming from the people who find R to be a rather unpleasant language. Which is a subjective opinion, obviously, but not an uncommon one.
Learning Functional Programming (I learned Racket) makes R great since it really is a functional language.
I'm curious what advantages there are with this (or PyCharm) over something like Spyder?
PyCharm is unparalleled in its understanding of code and it's great for building codebases. It is a programmer's tool first and foremost. I find PyCharm's interactive features clunky and have to do extra work to see the data.
RStudio / Rodeo provides an interactive data analysis environment where multiple "views" are presented right in front of the user. A view could be a plot, a data frame or interactions between the code editor and the terminal. As a data analysis person it really helps to put the mental strain of code as far away as possible and just explore the data.
Jupyter Notebooks are nice but they can get overwhelming (too much scrolling) when things get complicated. Great teaching tool, however.
I think each of these tools has different use cases and it's great that Python is getting more user-friendly with the data science workflow.
I'd like to know this too... Comparison between PyCharm, Canopy, Spyder, Yhat, etc.
After using it for 10 minutes, it feels identical to RStudio. That's a good thing.
yeah, I wonder what features aren't covered by the free version of PyCharm (except for the UI, obviously copied from RStudio)
Neat tool, but watching the video, the grammar nazi in me couldn't stop looking at that "palendrome".
Just curious, what qualifies it as Native?
I am curious about this as well.
Taking a look at the source (https://github.com/yhat/rodeo), it appears to be all in Python.
I was under the (perhaps mistaken) impression that native referred to code which compiled to assembly.
When referring to tooling for a language, 'native' tends to mean it's written in that language.
I always thought native meant native to the platform.
Let's see, we're running a browser, which runs a JavaScript VM, which runs our node.js logic, which runs Python, which calls into native numpy. See, native!
I have this visceral reaction when I can tell something is based on Electron or IWebBrowser, 2.0.
as in native desktop
It doesn't seem to be able to work with Python 3.5. It doesn't find the path and now the interface is stuck.
I am desperately trying to get this to work with my pyenv-virtualenv anaconda installation, but I can't get it to work out.
I also tried setting the path to ~/.pyenv/shims/python, but that didn't work out either.
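In case it helps anyone debugging this: a pyenv shim is a small shell script rather than the interpreter binary itself, so a tool that expects a real executable might not cope with it. Here is a minimal sketch that lists the interpreter paths the current environment resolves to. It assumes Rodeo only needs an absolute path to a Python executable, which I haven't confirmed from its source, and the helper name is made up. Run it with the environment you want to use activated.

```python
import os
import shutil
import sys


def interpreter_candidates():
    """Return paths that might be worth pasting into an IDE's Python-path setting."""
    return {
        # The interpreter running this script, exactly as it was invoked.
        "sys.executable": sys.executable,
        # Same path with symlinks resolved (environments often symlink to a base install).
        "resolved": os.path.realpath(sys.executable),
        # First "python" found on PATH; with pyenv this may be the shim itself (or None).
        "on PATH": shutil.which("python"),
    }


if __name__ == "__main__":
    for label, path in interpreter_candidates().items():
        print(f"{label:>15}: {path}")
```

If the "on PATH" entry points into ~/.pyenv/shims while the other two point somewhere else, the "resolved" path is probably the one to try in the IDE's settings.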
A pros/cons comparison to Jupyter would be helpful.
Jupyter (formerly known as IPython Notebook) has a huge UX problem for me. The UI is made to be like a notebook (no duh), but for a larger codebase you want an editor-like UI. Jupyter may be okay for demos.
Finally, something very useful for anyone into python+data that doesn't like working inside a browser.
What makes this specific for Data Scientists?
Also curious about the performance of the data-frame viewer for large data sets.
Why should I use this over Spyder?