I was searching for one last real-world example for my upcoming video talk on time series forecasting (March 13th).
Hope to see you there! Or reach out to Win Vector LLC for custom training!
I had the seemingly harmless thought: “Let’s look at Stack Overflow trends.” In particular, their pre-built “data science and big data trends” query seemed fun.

This is a graph of the percentage of Stack Overflow questions tagged with data science terms such as R, Pandas, and so on. It seems to show exploding interest in R and Pandas, and maybe even TensorFlow. Pandas was likely chosen as a proxy for interest in Python for data science (as opposed to general interest in Python). I’d prefer view counts over question percentages as a proxy for interest, but it is what it is. I think showing percentages was an attempt to factor out what part of the data is about Stack Overflow and what part is about the topic. However, it may take more than a simple division to un-stir those concerns.
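A toy calculation shows why a simple division may not separate the site from the topic. The numbers below are made up for illustration and are not Stack Overflow data: a tag's question count stays flat while the site's total question volume shrinks, so the tag's percentage rises even though the number of questions about the topic did not change.

```python
# Toy illustration with made-up numbers (NOT Stack Overflow data):
# a tag's share of questions can move even when its own question count does not.
tag_questions = [1000, 1000, 1000]         # hypothetical questions carrying the tag, per period
all_questions = [100_000, 80_000, 50_000]  # hypothetical total questions on the site, per period

# Percentage of all questions that carry the tag, per period.
shares = [100.0 * t / a for t, a in zip(tag_questions, all_questions)]
for period, share in enumerate(shares):
    print(f"period {period}: {share:.2f}% of questions")
```

Here the share doubles purely because the denominator shrank, which is the kind of site-level effect a percentage cannot remove on its own.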
Then I thought, let’s see if they have newer data. They do, and it is horrifying (though not unexpected to those of us in the industry).

The graph appears to show interest in R and Pandas rapidly falling on Stack Overflow. I know there are alternatives to Pandas (such as Polars), but my spot checks didn’t show any of them taking Pandas’s place. ChatGPT is both a replacement for asking questions on Stack Overflow and a competing topic to do projects in and ask questions about. We are likely seeing data science coursework and projects being widely replaced by LLM coursework and projects. A relevant point is that ChatGPT was released in November 2022.
For laughs, I digitized the graph into numbers and used Sanjiv Ranjan Das’s excellent book chapter “Product Market Forecasting using the Bass Model” to fit a good old Bass product diffusion model to the data. The joke is that the Bass model assumes all products die. That isn’t so much a prediction of the Bass model as one of its assumptions. The idea is: products go obsolete, and Bass helps estimate when.
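For reference, here is a minimal sketch of the standard continuous-time Bass curves, in Python rather than the R used for the actual analysis. The notation follows the common textbook convention (an assumption here, not necessarily the chapter's exact notation): p is the innovation coefficient, q the imitation coefficient, and m the eventual market size. F(t) is the fraction of m that has adopted by time t and f(t) = dF/dt, so sales at time t are m·f(t). The model's hazard is f(t)/(1 - F(t)) = p + q·F(t), and the "all products die" assumption shows up as f(t) going to zero as F(t) approaches 1. Function names and parameter values are illustrative only.

```python
import math

def bass_cumulative(t, p, q):
    """Fraction F(t) of the eventual market m that has adopted by time t."""
    e = math.exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def bass_rate(t, p, q):
    """Adoption density f(t) = dF/dt; sales at time t are m * f(t)."""
    e = math.exp(-(p + q) * t)
    return ((p + q) ** 2 / p) * e / (1.0 + (q / p) * e) ** 2

def bass_peak_time(p, q):
    """Time of peak sales: ln(q/p) / (p + q); only positive when q exceeds p."""
    return math.log(q / p) / (p + q)

# Illustrative parameters (assumed, not fitted to the Stack Overflow data).
p, q = 0.03, 0.4
print(f"peak at t = {bass_peak_time(p, q):.2f}")
for t in [0, 5, 10, 20, 40]:
    # Columns: time, cumulative fraction adopted, adoption rate.
    print(t, round(bass_cumulative(t, p, q), 3), round(bass_rate(t, p, q), 4))
```

Whatever p and q are, F(t) climbs to 1 and f(t) decays to 0 after at most one peak: the product's eventual death is built into the functional form.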
The Bass methodology gave me the following graph. Keep in mind: it forces this parabola-like shape no matter what the data look like.

As dire as the Bass curves are, they are not that far off yet. I did the analysis in R, so I am pleased it chose R (itself) to outlast the other systems :). All jokes aside: forecasting helps you plan and adapt.
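One way to see why the shape is forced: in discrete time the Bass model says each period's sales are S(t) = (p + q·N(t-1)/m)·(m - N(t-1)), where N(t-1) is cumulative sales before period t. Expanding gives S(t) = pm + (q - p)·N(t-1) - (q/m)·N(t-1)², a downward parabola in cumulative sales, so any fit with valid parameters (p, q, m > 0) can only produce a rise, a single peak, and a decline to zero (or a pure decline). The sketch below uses the classic ordinary-least-squares estimator (regress S(t) on 1, N(t-1), and N(t-1)²) on synthetic data generated from assumed parameters. It is not necessarily the procedure in Das's chapter or in the R code behind these graphs, and all function names are hypothetical.

```python
import math

def simulate_bass(p, q, m, periods):
    """Discrete-time Bass recursion: S(t) = (p + q * N(t-1) / m) * (m - N(t-1))."""
    sales, cumulative = [], 0.0
    for _ in range(periods):
        s = (p + q * cumulative / m) * (m - cumulative)
        sales.append(s)
        cumulative += s
    return sales

def solve3(a, b):
    """Solve a 3x3 linear system by Cramer's rule (adequate for this tiny, well-scaled problem)."""
    def det(mat):
        return (mat[0][0] * (mat[1][1] * mat[2][2] - mat[1][2] * mat[2][1])
                - mat[0][1] * (mat[1][0] * mat[2][2] - mat[1][2] * mat[2][0])
                + mat[0][2] * (mat[1][0] * mat[2][1] - mat[1][1] * mat[2][0]))
    d = det(a)
    out = []
    for col in range(3):
        # Replace column `col` with the right-hand side b.
        replaced = [row[:col] + [rhs] + row[col + 1:] for row, rhs in zip(a, b)]
        out.append(det(replaced) / d)
    return out

def fit_bass_ols(sales):
    """Regress S(t) on (1, N(t-1), N(t-1)^2) and map the coefficients back to (p, q, m)."""
    scale = sum(sales)  # rescale cumulative sales into [0, 1] to keep the normal equations well conditioned
    xs, cumulative = [], 0.0
    for s in sales:
        n = cumulative / scale
        xs.append([1.0, n, n * n])
        cumulative += s
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, sales)) for i in range(3)]
    a, b_scaled, c_scaled = solve3(xtx, xty)
    b, c = b_scaled / scale, c_scaled / scale ** 2  # undo the rescaling
    # a = p*m, b = q - p, c = -q/m, so m is the positive root of c*m^2 + b*m + a = 0.
    m = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * c)
    return a / m, -c * m, m

# Synthetic "observed" series from assumed parameters: six periods, all still rising.
observed = simulate_bass(0.03, 0.38, 1000.0, 6)
p_hat, q_hat, m_hat = fit_bass_ols(observed)
forecast = simulate_bass(p_hat, q_hat, m_hat, 30)
print(f"p = {p_hat:.4f}, q = {q_hat:.4f}, m = {m_hat:.1f}")
print("forecast peak period:", forecast.index(max(forecast)) + 1)
```

Because S(t) is a parabola in N(t-1), a fitted model always places peak sales at cumulative sales N* = m(q - p)/(2q), which is below m/2 whenever p > 0. So where the fit thinks the peak is depends directly on its estimate of the total market m.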
Note
An issue with the Bass model is that just about the only time it doesn’t think one is right before a collapse is when it thinks the collapse has already started.
Animations of the issue:
- evolving the training region.
- assuming different fractions of total sales seen. Changing the assumed fraction of total sales seen to date doesn’t change the fit curve in the training region much. Therefore we cannot expect the quality of fit in the training region to tell us what fraction of the total sales has been made (even though we want to know that!).
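The fraction-of-sales experiment can be sketched along these lines. Assuming a fraction of total sales seen to date fixes the market size as m = (sales so far) / fraction. With m fixed, the discrete Bass model S(t) = p·(m - N(t-1)) + q·(N(t-1)/m)·(m - N(t-1)) is linear in p and q, so each assumed fraction needs only a two-parameter least-squares fit. This is a minimal Python sketch on synthetic data from assumed parameters, not the code behind the animation; the function names are hypothetical. Comparing the in-sample error (rmse) across fractions against the very different totals m each fraction implies mirrors the comparison described above.

```python
import math

def simulate_bass(p, q, m, periods):
    """Discrete-time Bass recursion: S(t) = (p + q * N(t-1) / m) * (m - N(t-1))."""
    sales, cumulative = [], 0.0
    for _ in range(periods):
        s = (p + q * cumulative / m) * (m - cumulative)
        sales.append(s)
        cumulative += s
    return sales

def fit_with_fraction(sales, fraction_seen):
    """Fix m = sum(sales) / fraction_seen, then fit p and q by least squares.

    With m fixed, S(t) = p * u(t) + q * v(t), where u(t) = m - N(t-1) and
    v(t) = (N(t-1) / m) * (m - N(t-1)), so a 2x2 normal-equation solve suffices.
    """
    if not 0.0 < fraction_seen <= 1.0:
        raise ValueError("fraction_seen must be in (0, 1]")
    m = sum(sales) / fraction_seen
    u, v, cumulative = [], [], 0.0
    for s in sales:
        u.append(m - cumulative)
        v.append(cumulative / m * (m - cumulative))
        cumulative += s
    suu = sum(x * x for x in u)
    svv = sum(x * x for x in v)
    suv = sum(x * y for x, y in zip(u, v))
    suy = sum(x * y for x, y in zip(u, sales))
    svy = sum(x * y for x, y in zip(v, sales))
    det = suu * svv - suv * suv
    p = (suy * svv - svy * suv) / det  # Cramer's rule for the 2x2 system
    q = (svy * suu - suy * suv) / det
    rmse = math.sqrt(sum((p * a + q * b - y) ** 2 for a, b, y in zip(u, v, sales)) / len(sales))
    return p, q, m, rmse

# Synthetic "observed" series from assumed parameters (true m = 1000).
observed = simulate_bass(0.03, 0.38, 1000.0, 8)
results = {f: fit_with_fraction(observed, f) for f in (0.3, 0.45, 0.6, 0.75, 0.9)}
for fraction, (p, q, m, rmse) in results.items():
    print(f"fraction {fraction:.2f}: m = {m:8.1f}, p = {p:.4f}, q = {q:.4f}, rmse = {rmse:.3f}")
```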
Thank you to Nina Zumel for help on the animation.
And the same analysis for Python itself:

Code here.