Python gets a 'Developer-in-Residence'

Backlogs in bug triage, code review, and other elements of the development process are nothing new for free-software projects; there is clearly a lot more interest in creating new features (and the bugs that go with them, of course) than in taking on the less-satisfying bits. For a large project like CPython, though, the backlog can seriously impede progress—potentially chasing off contributors whose work falls through the cracks. In order to address that, the Python Software Foundation (PSF) has raised some funds to hire Łukasz Langa as the CPython "Developer-in-Residence". Langa will be working to help clear the backlog, while also looking into other areas of interest to the PSF and the Python steering council.

Langa is a longtime CPython core developer and the release manager for Python 3.8 and 3.9; he is also the creator of the Black code formatter for Python. Beyond all of that, he has been advocating for more full-time Python developers for a while now, so this is something of a dream come true for him personally. He described the goals for the position in a blog post on July 12, his first day on the job:

When the PSF first announced the Developer in Residence position, I was immediately incredibly hopeful for Python. I think it's a role with transformational potential for the project. In short, I believe the mission of the Developer in Residence (DIR) is to accelerate the developer experience of everybody else. This includes not only the core development team, but most importantly the drive-by contributors submitting pull requests and creating issues on the tracker.

He noted that Python is largely volunteer-driven, though there are a few folks being paid by their employer to work on the language full-time; for example, there is a team of three people (including Python creator Guido van Rossum) who are working on speeding up CPython. Full-time people make a difference: "Just by the sheer force of sitting at a desk for a given number of hours they can achieve big things." But the role of the DIR is somewhat different; he sees it as a way to bring in even more contributors to the project:

Now, what can the DIR do as one person? I believe I can multiply the impact of the hundreds of contributors who are not core developers. The DIR can do it by:
  • providing a steady review stream which helps dealing with PR backlog;
  • triaging issues on the tracker dealing with issue backlog;
  • being present in official communication channels to unblock people with questions;
  • keeping CI and the test suite in usable state which further helps contributors focus on their changes at hand;
  • keeping tabs on where the most work is needed and what parts of the project are most important.

There is an important side effect to providing this service, and that is a good first developer experience for an external contributor. Great contributor experiences lead to future contributions, and a stream of contributions leads to an occasional contributor becoming a core developer. I've seen this through leading the Black project. It's now maintained by 9 people including me.

The DIR position is patterned after a successful program for the Python-based Django web framework; since 2014, the Django fellowship program has been paying people to handle "some of the administrative and community management tasks" for the project. One aspect of that program that Langa has adopted is its weekly progress reports; transparency is one of the key elements of the job and in measuring its impact. He noted that metrics like the number of pull requests and bug reports handled are useful, but there is more that he (and the community) would like to know:

It would be awesome to have some insight into whether having a DIR improves the health of the Python community, has any concrete impact over Python adoption, or Python runtime's performance, and so on, and so on. How to measure this is admittedly unclear to me at this point.

He will be undertaking some larger-scale research to try to help determine what kinds of work the project's volunteers are doing and how the CPython project is functioning overall. For example, the Python standard library is often under discussion within the project, but there are unresolved questions about it that he will be seeking answers on:

Research to understand the project better can be done by capturing user interest in particular libraries within the standard library, the amount of work given libraries require, and who the active experts behind the libraries are. This can be done through open issue analysis, the amount of pull requests for a given library, but also through surveying core developers and users as some libraries might require little maintenance but be critical. The reason to do this research is to determine which standard library modules need help and what the maintainer cost is for standard library modules.

So far, Langa has posted two admirably detailed status reports covering his first two weeks. Each includes a daily log of his activities, which shows an eye-opening amount of progress in closing pull requests and bug reports. In addition, there is quite a bit of other information about the things that he looked into. For example, he observed a test failure on Windows in week one:

Since Azure Pipelines just released an update to their Windows image and I identified that the problem only happens with the new image, I was ready to report a regression on their GitHub project. But I noticed that somehow the problem only happens to me.

I joked that this is probably because of my first name… and it turned out to be true. Maybe I shouldn't be surprised. I have a long history of Unicode-related issues affecting me in the real world.

Long story short: the new Azure Pipelines image includes the full name of the PR author in an environment variable. The HTTP_ACCEPT tests tried to serialize all of os.environ to an ASCII string in the CGI script. This crashed the CGI script but since the tested server is fork-based, this crash was only seen on the test side as truncated output.

Knowing this, it was easy to fix. But this [easily] took 10 hours of work for me this week.

That same kind of problem manifested itself (more benignly) during his 2018 Python Language Summit talk: "[...] he noticed his name in Larry Hastings's schedule-display application started with the dreaded ▯ rather than Ł. That, he said with a grin, is the story of his life."
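The failure mode Langa describes in the CGI test is easy to reproduce in miniature. The sketch below is not the CPython test code; the environment mapping, the `PR_AUTHOR_NAME` variable, and both helper functions are hypothetical names chosen for illustration. It only shows why encoding an entire environment as strict ASCII breaks as soon as one value contains a character like "Ł", and one possible way to make such a dump tolerant (the actual fix that was applied is not described in the report).

```python
# Hypothetical environment; the variable names and values are illustrative only.
env = {"HTTP_ACCEPT": "text/html", "PR_AUTHOR_NAME": "Łukasz Langa"}


def serialize_ascii(environ):
    """Render the mapping as text and encode it strictly as ASCII.

    Raises UnicodeEncodeError if any key or value contains a non-ASCII character.
    """
    return repr(sorted(environ.items())).encode("ascii")


def serialize_safely(environ):
    """Same rendering, but non-ASCII characters become backslash escapes."""
    text = repr(sorted(environ.items()))
    return text.encode("ascii", errors="backslashreplace")
```

Calling `serialize_ascii(env)` raises `UnicodeEncodeError`, while `serialize_safely(env)` returns bytes in which "Ł" appears as the escape `\u0141`. In a fork-based server, an exception like the first one would end the child process early, which is consistent with the truncated output seen on the test side.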

Meanwhile, in week two, he tackled some problems with tests that intermittently fail in the continuous integration (CI) system. While there are efforts to reduce the number of such tests, or to reduce their volatility, tests that exercise networking or rely on timeouts will likely always be somewhat problematic. Even so, he has improved things substantially:

Up until now, re-runs triggered re-running all tests in the affected test file. As you can imagine, the test files for the most flaky tests tend to also be the largest. Now with a change I made (see BPO-44708 for details), we are only re-running affected test methods which speeds up the occasional re-run dramatically.

[...] As you can see, re-running concurrent_futures tests now takes 2.5s instead of over 82s.

Interestingly, this not only increases speed of the re-runs but also makes it more likely that they will succeed. You see, the fewer tests we need to re-run, the less likely we are to hit another intermittent failure.

One guesses that these reports may become more terse over time, but they definitely reflect a high level of excitement with his new role. Langa is also cognizant of his responsibility both to CPython and to the DIR program. He strongly believes in the need for this position—it could perhaps lead to other paid positions within the Python ecosystem for one thing—so he intends to leave a lasting impression of its value:

On the other hand, this year will prove whether this position is worth keeping for future years. I deeply believe it is, therefore it all comes down to my performance over the next twelve months.

A recent thread on the python-dev mailing list highlights the impact the DIR position could have. In the thread, a Python user pointed to a bug that was filed in 2017, had a pull request associated with it, and still remained unfixed. They wondered if that was evidence that the Python review process is flawed. As Van Rossum replied, Python is no exception to the rule that "*maintainer attention* is actually the scarcest resource in many open source projects". In addition, while it may look like a bug fix "just" requires review, often that is actually the most difficult part:

I think there's a misunderstanding here -- you seem to imply that producing a bugfix is work that takes somebody's time, while reviewing a bugfix is not work and doesn't cost anything. But realistically, for most issues, things are the other way around -- writing the code is easy (at least to a core dev :-) but reviewing code is a gut-wrenching process that takes up emotional energy and a lot of time. Given the discussion in the issue corresponding to the PR it is clear that that is what is going on here.

As Barry Warsaw pointed out, unblocking this kind of problem "is exactly what the Developer-in-Residence program was designed to address". That specific problem was resolved by Van Rossum before the DIR came on board, but one might guess that these kinds of problems will be reduced over the next year.

There are other examples of people being hired by foundations to work on open-source projects, and some companies are supporting those contributors, both full- and part-time, as well. The Linux Foundation fellows are a prominent example; they often take on duties that might not be done by volunteers or those solely focused on their company's development priorities.

Hiring a Developer-in-Residence is a big step for the PSF, but one that seems likely to pay outsized dividends. Bringing in more contributors, which will hopefully lead to more core developers, is important for any open-source project—Python is no exception there either. Providing a smoother path for those contributors in the early going, and trying to ensure that their efforts do not languish, will help a lot in that effort. Finding a way to hire for this kind of role is a step that more projects should consider—many probably will.

