The Making of SciPy 1.0
I love SciPy and the surrounding pythonic scientific computing tools, especially Jupyter notebooks! I've been playing around with SymPy a lot lately and love that it pretty-prints LaTeX in notebooks. Here's a notebook I did recently on the Taylor series demoing quite a few features:
https://github.com/DanburyAI/SG_DLB_2017/blob/master/Noteboo...
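As a taste of what that looks like, here's a minimal sketch (assuming SymPy is installed) of the kind of Taylor series it pretty-prints:

```python
import sympy as sp

x = sp.symbols('x')
sp.init_printing()  # enables LaTeX rendering in notebooks

# Taylor expansion of sin(x) around 0, up to (but not including) x**6
series = sp.series(sp.sin(x), x, 0, 6)
print(series)  # x - x**3/6 + x**5/120 + O(x**6)
```

In a Jupyter notebook the last expression renders as typeset LaTeX rather than plain text.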
This is a great resource for engineers to wean themselves off Matlab. Maintaining Mathworks licenses is expensive and they can be assholes about license usage. We recently purchased some additional toolboxes and all they could do was bitch about how we had deployed our existing licenses. If you don't like it, why do you allow it?
In addition, recent versions of Matlab have broken some common plotting tools, contourf in particular. Where this once produced elegant and lightweight results, it now generates a tessellated mess with tiny gaps between elements that looks awful in paper manuscripts and balloons the file size for no good reason. The Mathworks response? It fixes some edge case. [0]
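For comparison, a minimal matplotlib contourf sketch (assuming matplotlib is installed) that still produces compact vector output for filled contours:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt

x = np.linspace(-2.0, 2.0, 100)
X, Y = np.meshgrid(x, x)
Z = np.exp(-(X**2 + Y**2))  # a smooth Gaussian bump

fig, ax = plt.subplots()
cs = ax.contourf(X, Y, Z, levels=10)
fig.colorbar(cs)
fig.savefig("contour.pdf")  # vector output stays lightweight
```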
I also find that many of the students we take on as interns lack more than rudimentary coding skills, which makes the jump to Python easier.
[0] https://uk.mathworks.com/matlabcentral/answers/162257-proble...
In my work area, I was the first Python convert (from Visual Basic and a chain of other languages going back to 1981), and we hire scientists and engineers with no particular regard for their coding ability. We now have more than half a dozen people using Python at varying levels, with additional people mentioning that they're investigating it through online courses, home projects, etc.
A few of them have transitioned from Matlab. Even though we're a commercial business, buying each engineer a Matlab seat with packages is not a trivial expense, and asking for the money requires more management attention than we really need. With Matlab, it's: "You need Matlab for what? Why are you working on that?"
With Python, it's: "You solved that problem? Great."
But there are of course some caveats. Learning Python per se won't turn you into a professional software developer. A lot of people list Python on their résumés, and our interviewers are engineers and managers who only know that Python is a good buzzword to have. Learning Python ultimately translates into a widely varying ability to actually do anything. One of my colleagues has really taken to it, and has made an effort to develop disciplined programming skills beyond "coding." A couple of others are using it for basic scripting, data analysis, and so forth. I've seen some horrendous code, and I'm torn about whether to intervene when they are in fact getting useful results. Usually, I tell them about some things to learn on their own before they start their next project.
But that problem -- what really constitutes good programming skill for people who aren't formally trained, being hired by people who don't really know either -- isn't confined to Python.
I don't see a reason for MATLAB anymore now that there's Julia. It takes the good parts of the linear algebra syntax and throws out the rest.
The only reason, if you can call it that, is academic inertia. It is taught as part of most engineering degrees and is accessible enough for students to stick with it.
One can leave Matlab today with Octave.
Library support is perhaps better with SciPy, but Octave is free as in freedom, fast, and it works.
Octave works well, but its functions differ subtly from MATLAB's, and in the case of graphics there are large gaps. There's also a significant performance gap.
Summary: SciPy added Windows binaries for pip and formalised their organizational structure.
SciPy 1.0 — 16 Years in the Making https://www.numfocus.org/blog/scipy-1-0-16-years-in-the-maki...
Working with these libraries and the ecosystem has been great. We tried Julia for a while but found it sorely lacking whereas python has such a great community and history and wealth of experience and wisdom to share.
I have had the opposite experience, since most decent Python packages seem to have their own implementation of a parser, JIT, etc. in order to have any speed, making them monolithic monsters that are hard to contribute to and hard to modify. Python's ecosystem seems to handle the basic cases very well, but when I wanted to "go to research land" in pretty much any scientific computing subject (mathematical optimization, numerical linear algebra, or differential equations), I quickly hit a wall that would require writing one of these monoliths myself in C++. I haven't hit any walls in Julia, but then again if you stay in the basic standard equations/models YMMV.
But SciPy is a great project and everyone can learn something from their successes.
I don't really get your point about monoliths in this specific usecase. Let's say you're exploring with NumPy. What stops you from
(a) writing your research as a bunch of Fortran or C kernels and integrating them with automatic bindings such as f2py? Fortran especially is a great fit for NumPy data structures, because they are the same.
(b) using high performance Python environments like Numba, NumbaPro (GPU), or even Cython?
>(a) writing your research as a bunch of Fortran or C kernels and integrating them with automatic bindings such as f2py? Fortran especially is a great fit for NumPy data structures, because they are the same.
If I have to write everything important and difficult in Fortran or C kernels, then I'm losing the productivity of the higher level language. And in this case it would be writing kernels from scratch instead of tweaking high level package code if I want to change data structures and implement new algorithms for existing problems. This makes it hard to directly re-purpose the internal linesearch code from the SciPy optimizers without Python in-between into my own Fortran optimizer, or add new projections and stuff to existing SciPy implementations and have that be a separately maintained package but dependent on SciPy.
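For context, the line search is at least reusable at the Python level — scipy.optimize.line_search exposes the Wolfe line search directly (a sketch on a toy quadratic; reusing it from a Fortran optimizer would still mean going through Python):

```python
import numpy as np
from scipy.optimize import line_search

def f(x):
    return float(x @ x)  # simple quadratic bowl

def grad(x):
    return 2.0 * x

xk = np.array([1.0, 1.0])
pk = -grad(xk)  # steepest-descent direction

alpha = line_search(f, grad, xk, pk)[0]
# alpha satisfies the Wolfe conditions, so the step decreases f
print(alpha, f(xk + alpha * pk))
```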
Even just talking about this starts to sound like a mess from the past, but fortunately this problem is already solved by Julia so there's no reason to have to deal with it.
>(b) using high performance Python environments like Numba, NumbaPro (GPU), or even Cython?
Well, the biggest thing against Numba is that once again you have to write everything from scratch if you want to do anything non-trivial. They finally have a way for you to extend Numba to add a class, but you don't get the whole Python standard library compatible with it by default, and classes defined in users' packages aren't automatically compatible with it. So you have to start re-writing basic data structures and other types to get the full algorithm JIT'd. And Python's standard library isn't all set up with @generated_jit to then dispatch everything effectively. So you end up having to build tons of things on your own because of the lack of library support, and it adds development time. Then there are restrictions on allocations, no easy way to do distributed parallelism, no way to take functions as arguments to my functions and have them inlined (this is important for writing things like ODE solvers; with Numba you have to do a hack to make this almost work, but it won't optimize well), etc. Essentially it hits a feature wall when it gets past microbenchmarks and on to real programming. This all hearkens back to your first comment:
>Let's say you're exploring with NumPy
Exactly: if you're content with building everything off of arrays yourself, then Numba will do fine. But its limitations significantly impact the ability to compose full packages in native Python with constant propagation, interprocedural optimization, etc.: all the stuff you depend on static languages doing in order to optimize large-scale software. On top of that, it's a difficult dependency to install. So in the end, it's great for JITing little blurbs, but I quickly hit the edge when trying to fight it.
Cython is similar to Numba in that you can set up your own fused types and extension classes, re-write the standard library, etc., but now you've taken to writing essentially the stdlib for a language (or at least the parts you need) just to be allowed to use this "easy accelerator" for a demanding project.
If this were 10 years ago these issues would be okay, but these days they stand out as a productivity and performance barrier. Julia has already solved this problem, and hell, even LuaJIT is easier to build this kind of software with. Or if you're going to fuss around with compiled code, we can use D or Rust, or even go all the way back to C++ and Fortran. In fact, it's quite telling that even fairly recent big "Python" projects like TensorFlow, PyTorch, FEniCS, etc. all built their whole package in C++ and made Python bindings instead of trying to use these accelerators. Meanwhile, people are punching out large Julia projects in pure Julia without hesitation.
To an end user of packages it really doesn't matter how it was implemented, but when your work and research is in developing these kinds of large software projects, these limitations make a huge difference.
Thank you for the expansive answer - I can really see where you're coming from now, and I agree with your issues in Python. It's interesting to me that you go to Julia as a solution (I mean that honestly; I want to play with it when an exploration phase comes up).
What was Julia lacking? I know and use R, but have been considering Julia.
I'm pretty sure GP was complaining about the lack of libraries and the breaking changes between versions. Julia is much younger, has a smaller community, and is still in active development as a language, so it is very likely that you'll have to build and maintain your own code.
With SciPy, the community is much bigger and present in every imaginable scientific field. Whenever someone comes up with a new algorithm, there's a Python implementation before you know it. I'm not going to pretend that this isn't a huge added value, but I find the way GP phrased it a little bit annoying, as it implies that the language Julia itself is the issue. It is honestly a more beautiful and more efficient language to program in.
(also, these days you can call out to Python and R without significant overhead, so you can use any package out there; Julia has a pretty good interop story)
Great news. It's really good to have an alternative to the expensive Matlab.