Do Developers Have Agency? A Longitudinal Study Revealing a Separation in how Software Systems Evolve Using 65,987 Popular Open Source Projects on GitHub


Abstract

There are several mutually exclusive views on Software Engineering: empirical studies suggest the existence of natural laws that govern development, companies promote tools and methods promising groundbreaking results, and developers describe it as a creative and innovative endeavour. In this study, we significantly expand our previous research on software evolution, performing a longitudinal analysis of 65,987 popular open-source projects on GitHub to investigate project trajectories. We also reflect on Les Hatton’s observation that the emergence of some properties might be “divorced from human agency”. We examined projects written in 85 different languages and found a separation based on the volume of commits. Projects with approximately \(\ge 700\) commits to their main branch (10,612, or \(16.1\%\) of the total) exhibit trends consistent with highly automated workflows. These large projects have been resilient to external events over the last few decades, show a higher likelihood of accelerating production, and follow deterministic evolution patterns, regardless of when they were started or how long they were developed. In contrast, the vast majority of smaller projects tend to exhibit less deterministic evolution patterns and are significantly more likely to experience deceleration over time. These findings reveal a separation in how different software systems evolve that might have significant implications for how we understand, study, and teach programming. Specifically, in our interpretation, the observed numerical dominance of smaller projects, potentially following less regular workflows, suggests a need for data curation to prevent the risk of assimilating and disseminating practices based not on their evolutionary success or long-term sustainability, but rather on their statistical prevalence among the numerous, potentially less mature projects.
Researchers must also note that focusing solely on large projects can overlook a much larger and different set of projects that could benefit from a targeted study.


1 Introduction

The last few decades have seen exponential growth in computing power (processors, memory, storage, and displays), significant improvements in development tools (IDEs, CI/CD, and cloud computing), the widespread adoption of Agile and DevOps methodologies, and the emergence of impactful AI technologies. However, despite these advances and significant external events (economic, health, and legal), fundamental questions about the intrinsic nature of software engineering remain. These questions are amplified by mutually exclusive perspectives on how software is built.

On the one hand, long-standing empirical observations, such as the laws of software evolution, suggest that software systems follow stable patterns of change, hinting at inherent constraints and regularities that limit human influence. On the other hand, the industry is dominated by companies promoting tools as groundbreaking innovations that promise to redefine workflows and significantly improve productivity. Simultaneously, developers themselves emphasise the creative and innovative aspects of their work, underscoring their importance and agency.

This work offers the following four main contributions:

  • Validation and expansion of our previous work: we validate and expand upon our previous work on how the effort invested in software projects evolves by analysing a significantly larger and more contemporary dataset of 65,987 GitHub projects comprising \(\sim 7.3\) terabytes of data.

  • Discovery of a separation in how software systems evolve: our findings reveal a separation in software evolution, distinguishing between two distinct project types:

    1. Large projects, with potentially regulated workflows: these projects (\(\ge 700\) commits to the main branch) exhibit trends consistent with highly automated workflows, are largely unaffected by external events happening over decades, and are more likely to accelerate over time.

    2. Smaller projects: these projects show less consistent patterns and are more prone to deceleration, suggesting that they follow less mature processes.

  • Critique of Les Hatton’s Observation on Lack of Agency: we relate our observations to those of Les Hatton et al. critically:

    • We provide empirical evidence that the evolution of a substantial number of large projects (10,612, or \(16.1\%\) of our sample) was largely unaffected by external events over the last few decades, based on their deterministic evolution pattern. This might be consistent with Les Hatton’s claim that some properties are “divorced from human agency” [7]. This finding supports the notion that development could be governed by underlying natural laws, irrespective of when a project was initiated or how long it has been running.

    • At the same time, our observation of a vast and separate set of smaller projects (comprising \(\sim 83.9\%\) of the investigated popular open-source projects on GitHub) that do not conform to the evolution trends of large projects might point towards a potential research blind spot (sampling bias), showing that the observation might only hold within some restrictions (e.g., above a certain size threshold).

  • Implications for practitioners and researchers: We offer our interpretations on the implications of this separation for practitioners and researchers. In our interpretation, the numerical dominance of smaller projects necessitates rigorous data curation to prevent the assimilation and communication of practices based on their statistical prevalence rather than their long-term sustainability or evolutionary success. Researchers must be aware of the sampling bias inherent in selecting only a few projects.

This paper is organised as follows. Section 2 presents related works from the literature. Section 3 describes our methodology. Section 4 presents our results and analyses. Section 5 offers our interpretations, hypotheses, and a discussion of these results, and Sect. 6 discusses threats to their validity. Finally, Sect. 7 presents our conclusions and Sect. 8 offers ideas for further research.

2 Literature Review

In this section, we present earlier literature related to our work.

In relation to software evolution: since Lehman published the laws of software evolution [16], a great deal of empirical research [10,11,12,13,14,15, 17, 18, 22, 29] shows that the laws seem to be supported by solid evidence. Our previous observations on 875 projects [26] also showed that over the past 25 years “Most projects seemed to be largely unaffected by external factors” and detailed empirical observations on an industrial closed-source project [14] that “the introduction of continuous integration, the existence of tool support for quality improvements in itself, changing the development methodologies (from waterfall to agile), changing technical and line management structure and personnel caused no measurable change in the trends of the observed Code Smells”.

In addition to software evolution: Colfer et al. [1] have shown that mirroring the technical architecture to the organisational structures is probably the superior architectural strategy. Taube-Schock et al. [27] showed that high coupling is unavoidable and might even be necessary for good design. Empirical studies have revealed that several metrics (Footnote 1) correlate to the point of redundancy [19], and that several architectural properties of software systems are scale-free (Footnote 2) [9, 20, 21, 24, 25, 28]. Before Les Hatton et al. showed [6] that “the conservation of Hartley–Shannon information (CoHSI)... directly predicts both known and unsuspected common properties of discrete systems”, logarithmic distributions had already been observed in many systems, from the module lengths of IBM 360/370 and PL/S code [23] to all metrics measured on FreeBSD [8]. This led to the claim [7] that their emergence might be “divorced from human agency”.

3 Methodology

This section presents our methodology for collecting the 65,987 projects we analysed.

To extend our previous work to as many projects as possible, our previous method needed to be adjusted. We decided to use GitHub’s search API, which allows us to gather at most 1000 of the most starred repositories per language.

  1. Using the snowball technique, we collected 85 languages (Footnote 3) for which the search functionality returned results.

  2. Using the search API, we identified the repositories with the highest ratings for each language. Altogether, we cloned 66,042 repositories, \(\sim 7.3\)TB of data.

  3. Using the “diff.renameLimit” option set to 130,000 and the “--first-parent” flag (Footnote 4), we extracted the Git logs from each repository.

  4. To clean up the data, we investigated every case in which the number of lines appeared to drop to 0 or below. If the nearby Git commits and external information sources indicated a restart of the project, we handled it as the start of a new project. If the negative line count seemed to be the result of invalid commit dates set for a few commits, we tried to correct it based on the last preceding greater date or, if that was difficult and only a few commits were affected, deleted them from our data set (Footnote 5). In 24 cases, we decided to filter out the project because it either lacked text-based content or we were unable to resolve the issue manually. This left us with data for 65,987 projects.

  5. As previously, we tracked the number of commits, lines and effort (Footnote 6) values. For each day, we summed all individual data points to create a single daily aggregate.
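The extraction and aggregation steps above can be sketched as follows. This is a minimal Python illustration, not the pipeline used in the study: only `diff.renameLimit=130000` and `--first-parent` come from the text, while the `--numstat`, `--date` and `--pretty` options, the `@` record prefix and all function names are assumptions made for this example. Effort is left out because its definition is not given in this section.

```python
from collections import defaultdict
from itertools import accumulate


def git_log_command(repo_path):
    """Build a git invocation using the two options named in the text.

    --numstat, --date and --pretty are illustrative assumptions.
    """
    return [
        "git", "-C", repo_path,
        "-c", "diff.renameLimit=130000",
        "log", "--first-parent", "--numstat",
        "--date=short", "--pretty=format:@%ad",
    ]


def daily_aggregates(log_text):
    """Sum commits and net line changes per calendar day."""
    commits = defaultdict(int)
    lines = defaultdict(int)
    day = None
    for raw in log_text.splitlines():
        if raw.startswith("@"):          # one "@YYYY-MM-DD" line per commit
            day = raw[1:].strip()
            commits[day] += 1
        elif raw.strip() and day is not None:
            added, deleted, _path = raw.split("\t", 2)
            if added != "-":             # "-" marks binary files in --numstat
                lines[day] += int(added) - int(deleted)
    return [(d, commits[d], lines[d]) for d in sorted(commits)]


def cumulative(daily):
    """Turn per-day values into running totals (commits and lines so far)."""
    days = [d for d, _, _ in daily]
    total_commits = accumulate(c for _, c, _ in daily)
    total_lines = accumulate(n for _, _, n in daily)
    return list(zip(days, total_commits, total_lines))
```

A running line total that drops to 0 or below in the output of `cumulative` would be the kind of case that step 4 flags for manual inspection.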

The data set was last fully updated on 2025.04.19.

4 Results

This section presents our results and analyses.

4.1 Reproduction of Our Previous Result

To understand if the extended dataset would reproduce our previous results [26], we fitted simple regression models of different degrees, using time as the only independent variable (in R “lm(commits_so_far \(\sim \) poly(date_as_numeric, degree, raw=TRUE), data)”), across two distinct cohorts: the highly active projects (the 3,134 projects with \(>3000\) commits) and the entire dataset (all 65,987 projects).
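As an illustration of what such a fit computes, the sketch below performs an ordinary least-squares polynomial fit and returns \(R^2\). It is a Python analogue written for this example, not the R code used in the study; the function name is hypothetical. One deliberate difference: the time axis is rescaled to [0, 1] for numerical stability, which changes the units of the coefficients but leaves \(R^2\) unchanged.

```python
def polyfit_r2(x, y, degree):
    """Least-squares polynomial fit; returns (coefficients, R^2).

    Coefficients are ordered from the constant term upwards and refer to
    the rescaled time axis t = (x - min(x)) / (max(x) - min(x)).
    A constant y series (zero total variance) is not handled.
    """
    lo, hi = min(x), max(x)
    span = (hi - lo) or 1.0
    t = [(v - lo) / span for v in x]
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum t^(i+j), b[i] = sum y t^i
    a = [[sum(v ** (i + j) for v in t) for j in range(m)] for i in range(m)]
    b = [sum(yv * v ** i for v, yv in zip(t, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - s) / a[i][i]
    fitted = [sum(c * v ** k for k, c in enumerate(coef)) for v in t]
    mean_y = sum(y) / len(y)
    ss_res = sum((yv - fv) ** 2 for yv, fv in zip(y, fitted))
    ss_tot = sum((yv - mean_y) ** 2 for yv in y)
    return coef, 1.0 - ss_res / ss_tot
```

Calling `polyfit_r2(days, commits_so_far, 2)` on one project's daily series would give the quadratic-model \(R^2\) for that project; repeating this per project and per degree yields distributions of the kind the figures summarise.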

Fig. 1 Distributions of \(R^2\) values for accumulated commits when correlated with models of different degrees on the full dataset (left) and only on projects having \(> 3000\) commits (right)

Table 1 Comparison of \(R^2\) value for fitting accumulated commits with different degree polynomials in the previous and the current study


Table 2 Comparison of \(R^2\) value for fitting accumulated efforts with quadratic polynomials in the previous and the current study


This comparison highlights the difference between the two project populations (Tables 1 and 2).

Fig. 2 Distributions of \(R^2\) values for effort when correlated with models of different degrees on the full dataset (left) and only on projects having \(> 3000\) commits (right)

This reproduces our earlier findings, confirming that even on a much larger dataset, projects with more than 3000 commits mostly follow clear and highly deterministic evolution patterns for both commit frequency and accumulated effort. Although higher degree polynomials offer closer fits (Figs. 1 and 2) and do not exhibit overfitting even at degree five (Table 3), even simple linear or quadratic models might be a practical and effective approach for industrial applications. These larger projects seem to change direction at most once, over potentially decades of development. For this reason, in the rest of this article, when not noted otherwise, \(R^2\) values refer to fit against quadratic models.

The contrast between the highly active cohort and the results for the entire dataset provides initial, strong empirical evidence for the core observation of this paper. This difference suggests a separation in the software ecosystem, where the dynamics governing smaller, less active projects are vastly different from the deterministic trends observed in the largest systems.

Table 3 Adjusted \(R^2\) ranges for fits with different degree polynomials. N = 65,987


4.2 Analysis Based on Commit Count

The \(R^2\) values, when evaluated against the total number of commits a project accrues to its main branch, exhibit a sharp rise correlated with project size:

  • For accumulated commits (Fig. 3), the median \(R^2\) value surpasses 0.90 early (in the 0–99 commit range), crosses 0.95 in the 400–499 range, and reaches above 0.98 in the 1,600–1,699 range, suggesting a baseline level of trend adherence even in smaller projects.

  • For accumulated effort (Fig. 4), the median \(R^2\) value first surpasses 0.90 in the 300–399 commit range, crosses 0.95 in the 1,100–1,199 range, and reaches above 0.98 in the 6,400–6,499 range.
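The grouping behind these ranges can be sketched as below. This is an illustrative Python example under the assumption that each project is reduced to a pair of its total commit count and its quadratic-model \(R^2\); the function names and the 100-commit bin width (inferred from ranges such as 0–99 and 400–499) are choices made for this sketch.

```python
from collections import defaultdict
from statistics import median


def median_r2_by_commit_bin(projects, width=100):
    """Group (total_commits, r2) pairs into commit ranges and take medians.

    Keys are the lower bound of each range, e.g. 400 for the 400-499 range.
    """
    bins = defaultdict(list)
    for total_commits, r2 in projects:
        bins[total_commits // width * width].append(r2)
    return {start: median(vals) for start, vals in sorted(bins.items())}


def first_bin_above(medians, threshold):
    """Return the lowest range whose median R^2 exceeds the threshold."""
    for start, value in medians.items():
        if value > threshold:
            return start
    return None
```

Applying `first_bin_above` with thresholds 0.90, 0.95 and 0.98 to the per-range medians is one way to locate crossing points like those listed above.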

Fig. 3 Distributions of \(R^2\) values, of a degree two model, for accumulated commits grouped by the number of commits reached during their development. (left) The distributions for all projects. (right) Zoomed in on the projects with \(< 5000\) commits. The X-axis is not continuous and has gaps where no project had that commit count

Fig. 4 Distributions of \(R^2\) values, of a degree two model, for accumulated effort grouped by the number of commits reached during their development. (left) The distributions for all projects. (right) Zoomed in on the projects with \(< 5000\) commits. The X-axis is not continuous and has gaps where no project had that commit count

Fig. 5 Distributions of \(R^2\) values, of a degree two model, grouped by having accumulated \(< 700\) or \(\ge 700\) commits. (left) The distribution of accumulated commits and (right) accumulated effort

We observed a break point at approx. 700 commits, where both mean and median \(R^2\) values, for fitting the accumulated effort against quadratic models, first exceed 0.90. Projects with \(\ge 700\) commits to their main branch consistently achieve a significantly higher degree of fit. To formalise this split, we partitioned the dataset into two cohorts: Small Projects (\(<700\) commits, 55,375 projects) and Large Projects (\(\ge 700\) commits, 10,612 projects). The distributions of their \(R^2\) values (Fig. 5) demonstrate the separation in how they evolve:

Table 4 Comparison of \(R^2\) value for fitting accumulated commits and efforts with quadratic polynomials for both project cohorts


The Large Projects consistently exhibit a high degree of fit, with median \(R^2\) values for both metrics exceeding 0.96 (Table 4), confirming their deterministic evolution pattern. The Small Projects show noticeably lower means and medians, reflecting a set of development trajectories that are much broader and less regular. The Large Projects also exhibit a high degree of fit across polynomials of different degrees (Table 5).

A Welch two-sample t-test on these two cohorts gives a p-value of at most \(2.2 \times 10^{-16}\) for both metrics (accumulated commits: t = 109.24, df = 63,023, \(p \le 2.2 \times 10^{-16}\), Cohen’s d of 0.73 (\(95\% CI [0.72,0.75]\)), sensitivity = 0.0296; accumulated effort: t = 100.47, df = 45,700, \(p \le 2.2 \times 10^{-16}\), Cohen’s d of 0.74 (\(95\% CI [0.72,0.75]\)), sensitivity = 0.0296) (Footnote 7). This provides statistical evidence that the set of small projects (containing 55,375 projects) differs significantly from the set of large projects (containing 10,612 projects), establishing the empirical basis for the separation in software evolution.
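For readers who want to see how such statistics are formed, the sketch below splits projects at the 700-commit threshold and computes Welch’s t statistic, the Welch–Satterthwaite degrees of freedom, and Cohen’s d. It is a hedged Python illustration: the function names are hypothetical, the pooled-standard-deviation form of Cohen’s d is an assumption (the text does not state which variant was used), and the p-value, confidence interval and sensitivity are not computed here.

```python
from math import sqrt
from statistics import mean, variance


def split_by_commits(projects, threshold=700):
    """Split (total_commits, r2) pairs into (small, large) lists of r2."""
    small = [r2 for commits, r2 in projects if commits < threshold]
    large = [r2 for commits, r2 in projects if commits >= threshold]
    return small, large


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled = sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b))
                  / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled
```

Because the degrees of freedom depend on both sample variances and both sample sizes, they need not be whole numbers and can differ between the commit and effort comparisons even though the cohorts are the same.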

Table 5 \(R^2\) ranges for fits with different degree polynomials, on the projects accumulating \(\ge 700\) commits. N = 11,826


4.3 Impact of the Starting Year and Development Duration

Fig. 6 Distributions of \(R^2\) values, of a degree two model, by the year of their first commit for the \(< 700\) (left) and \(\ge 700\) (right) commit cohorts, for effort. The X-axis is not continuous, with gaps of years during which no projects were started

Fig. 7 Distributions of \(R^2\) values, of a degree two model, by the years of development for the \(< 700\) (left) and \(\ge 700\) (right) commit cohorts, for effort. The X-axis is not continuous, with gaps at development durations that no project had

To understand how these cohorts change over time, we analysed the relationship between the project trajectory, the year of the first commit, and the duration of development (measured as the time elapsed between the first and last commit). Our observations (Figs. 6 and 7) provide evidence that the two distinct cohorts are stable and separable over time, independently of when the development started or how long it was active.

The long-term separation of the two cohorts suggests that the underlying dynamics distinguishing them are robust against large-scale external influences. We note that the number and magnitude of outliers appeared to increase until approximately 2010, stabilising afterwards. This stabilisation coincides with the widespread adoption of Git and GitHub, suggesting a relation to easier access to the technology and platform.

Development duration shows a clear, positive correlation with fit quality: we observe a strong tendency for \(R^2\) values to increase with a longer duration of project development. This supports the notion that the deterministic evolution pattern of the large cohort is related to the consistency and sustained effort required for long-term survival.

There seems to be a break in this trend for projects older than 25–30 years. As this duration greatly predates not only GitHub but also Git itself, these repositories had to be converted from other version control systems, which might have resulted in corruption that was not noticed during their conversion, limiting the reliability of trend analysis for the oldest projects in the dataset.

4.4 The Speed of Software Development

This section analyses the coefficients of the degree two polynomial models fitted to project trajectories, using them as proxies for the project’s speed of development (the linear coefficient) and its rate of acceleration or deceleration (the quadratic coefficient). We compare the distributions of these coefficients between the Large (\(\ge 700\) commits) and Small (\(< 700\) commits) project cohorts (Tables 6 and 7).

Table 6 The distributions of the linear and quadratic coefficients of the degree two polynomials, fit against the accumulated commits and efforts, on projects with \(\ge 700\) overall commits


Table 7 The distributions of the linear and quadratic coefficients of the degree two polynomials, fit against the accumulated commits and efforts, on projects with \(< 700\) overall commits


The quadratic component for matching against accumulated commits, for both cohorts, has a median close to 0.0, with both the first and third quartiles also being close to or below 0.0, indicating that projects in both cohorts follow a stable, nearly linear or slightly decelerating commit frequency trend. Although the values for the linear coefficient seem to indicate that the projects in the \(\ge 700\) commit cohort grow at a faster rate, a Welch two-sample t-test shows no statistically significant difference between the project cohorts in this metric, either in the linear coefficients (t = \(-1.0442\), df = 10611, \(p = 0.2964\), Cohen’s d of \(-0.01\) (\(95\% CI [-0.04, 0.01]\)), sensitivity = 0.0296) or in the quadratic coefficients (t = 0.9944, df = 10611, \(p = 0.3201\), Cohen’s d of 0.01 (\(95\% CI [-0.01, 0.04]\)), sensitivity = 0.0296).

In contrast, models fitted against accumulated effort show statistically significant, but practically very small differences in both their linear (t = 2.3838, df = 57901, \(p = 0.01714\), Cohen’s d of 0.01 (\(95\% CI [0.00, 0.03]\)), sensitivity = 0.0296) and quadratic coefficients (t = \(-2.4283\), df = 55673, \(p = 0.01517\), Cohen’s d of \(-0.02\) (\(95\% CI [-0.03, 0.00]\)), sensitivity = 0.0296).

The distribution of the quadratic coefficient provides the most compelling evidence of the separation:

  • Acceleration potential: In the Large Projects (\(\ge 700\) commits), the zero point falls within the Inter Quartile Range (IQR), with \(34.3196\%\) (3,642) of projects exhibiting a positive quadratic coefficient, indicating accelerating development.

  • Deceleration dominance: For the Small Projects cohort, the IQR of the quadratic coefficient contained only negative values, with only \(19.0663\%\) (10,558) of projects exhibiting a positive quadratic coefficient, highlighting that the overwhelming majority of these projects are likely to decelerate.
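The two summary quantities used in these bullets (the share of positive quadratic coefficients and whether zero lies inside the IQR) can be computed as in the following sketch. It is a minimal Python illustration with a hypothetical function name; the quartile method is the default of `statistics.quantiles`, which may differ slightly from the quartile definition used in the study.

```python
from statistics import quantiles


def summarise_quadratic(coefs):
    """Share of accelerating projects and whether zero lies in the IQR."""
    q1, _median, q3 = quantiles(coefs, n=4)
    positive = sum(1 for c in coefs if c > 0)
    return {
        "share_positive": positive / len(coefs),
        "zero_in_iqr": q1 <= 0 <= q3,
    }
```

For a cohort resembling the Large Projects one would expect `zero_in_iqr` to be true, and for one resembling the Small Projects false, matching the contrast drawn above.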

The data is heavily skewed (the mean falls outside the IQR), suggesting that the median should be used as a robust measure to interpret the typical project trajectory. This suggests that projects in the \(\ge 700\) commits cohort start faster and their rate of deceleration is slightly higher.

Fig. 8 Distributions of the quadratic coefficient values for accumulated effort when correlated with a degree-2 model, on projects with \(< 700\) overall commits (left) and on projects with \(\ge 700\) overall commits (right)

To better understand the structural forces driving these trends, we analysed the distinct distributions of the quadratic coefficients, as illustrated in Fig. 8. To analyse these distributions, we used the ‘fitdistrplus’ [2], ‘actuar’ [3] and ‘poweRlaw’ [5] packages in R. For the analysis of the distributions of the upper quadratic coefficients, these data were transformed so that the median was shifted to the origin. For the analysis of the distributions of the lower quadratic coefficients, these data were also shifted so that the median became the origin, and all values were multiplied by -1 to ensure that the values are positive for distribution analysis.
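The transformation described above can be sketched as follows. This is an illustrative Python example with a hypothetical function name; it follows the text in assigning values equal to the median to the lower group, which means that group contains zero distances whose treatment before fitting is left open here.

```python
from statistics import median


def split_around_median(coefs):
    """Shift coefficients by their median and split them into two groups.

    Returns (lower, upper): both lists hold non-negative distances from
    the median; the lower group is multiplied by -1 as described in the text.
    """
    m = median(coefs)
    upper = [c - m for c in coefs if c > m]
    lower = [-(c - m) for c in coefs if c <= m]
    return lower, upper
```

Each of the two returned lists can then be passed to a separate distribution fit, one for the decelerating side and one for the accelerating side.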

We found that in both cohorts, the quadratic coefficients smaller than the median of all quadratic coefficients followed a different distribution compared to the larger ones.

For quadratic coefficients smaller than or equal to the median, in the \(\ge 700\) cohort, the log-logistic distribution was the best fit, determined using maximum likelihood estimation (using the ‘fitdistrplus’ and ‘actuar’ packages in R). The fit yielded a shape parameter (\(\beta \)) of 0.8479 and a scale parameter (\(\alpha \)) of 0.0819. The goodness-of-fit tests, run with these shape and scale parameters and with the estimated option set to true, including the Anderson-Darling (\(p = 0.9424\)) and Cramer-von Mises (\(p = 0.9418\)) tests, did not reject the log-logistic distribution and, given the high p-values, indicate a good fit.

A log-logistic distribution was also fitted to the coefficients smaller than or equal to the median in the \(< 700\) cohort. The maximum likelihood estimates were a shape parameter (\(\beta \)) of 0.5923 and a scale parameter (\(\alpha \)) of 0.0833. The Anderson-Darling (\(p = 0.3057\)) and Cramer-von Mises (\(p = 0.1732\)) goodness-of-fit tests did not reject the null hypothesis of a log-logistic distribution. The lower p-values suggest a less optimal fit compared to the \(\ge 700\) cohort.
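For reference, the log-logistic distribution in a common shape/scale parametrisation is written out below. This is a sketch under the assumption that the reported \(\beta \) and \(\alpha \) follow this parametrisation; the convention used by the R packages should be checked before reusing the numbers.

\[
F(x; \alpha, \beta) = \frac{1}{1 + (x/\alpha)^{-\beta}}, \qquad
f(x; \alpha, \beta) = \frac{(\beta/\alpha)\,(x/\alpha)^{\beta-1}}{\left(1 + (x/\alpha)^{\beta}\right)^{2}}, \qquad x > 0 .
\]

Under this parametrisation the median equals \(\alpha \), and the mean is finite only for \(\beta > 1\). Both fitted shape parameters (0.8479 and 0.5923) are below 1, which would correspond to a density that decreases monotonically away from the origin and has a heavy tail.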

For the distribution of the quadratic coefficients above their median, we found the power-law distribution to be the best fit (using the ‘poweRlaw’ package in R).

In the \(\ge 700\) cohort, a bootstrap goodness-of-fit test with 5,000 simulations yielded a p-value of 0.221 and a goodness-of-fit statistic of 0.033. This result indicates that the power-law model is a plausible fit for the data.

For the \(< 700\) cohort, initial tests produced a p-value of 1. This outcome, while not rejecting the power-law model, is likely an artefact of the sensitivity of the bootstrap procedure to the estimated xmin and may not represent a reliable test of the model’s fit. To ensure robustness, we performed additional bootstrap simulations with a fixed xmin parameter. A test with an estimated xmin of 58.50 (relative to the median) yielded a p-value of 0.849, while a test with xmin fixed at 0.01 (relative to the median, including \(\ge 6300\) observations) yielded a p-value of 0.784. The consistently high p-values in different xmin settings support the hypothesis that quadratic coefficients larger than their median follow a power-law distribution.
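To make the role of xmin concrete, the sketch below estimates the exponent of a continuous power law for a given xmin and measures the Kolmogorov–Smirnov distance between the data and the fitted model. It is a hedged Python illustration of the standard maximum-likelihood estimator, not a reimplementation of the ‘poweRlaw’ package; the function names are hypothetical, and the bootstrap step that produces the p-value is not included.

```python
from math import log


def power_law_alpha(values, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin)).

    Only observations with x >= xmin are used; returns (alpha, n_tail).
    Fails if every tail value equals xmin (the log sum is then zero).
    """
    tail = [v for v in values if v >= xmin]
    return 1.0 + len(tail) / sum(log(v / xmin) for v in tail), len(tail)


def ks_distance(values, xmin, alpha):
    """Largest gap between the empirical and fitted CDFs on the tail."""
    tail = sorted(v for v in values if v >= xmin)
    n = len(tail)
    worst = 0.0
    for i, v in enumerate(tail):
        model = 1.0 - (v / xmin) ** (-(alpha - 1.0))
        # Compare against the empirical CDF just before and at this point
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst
```

Changing `xmin` changes which observations count as the tail, and with it both the estimated exponent and the distance, which is why the fit is checked under more than one xmin setting.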

This analysis confirms our hypothesis that the distributions of the quadratic coefficients differ on either side of the median, with a log-logistic distribution fitting the lower values and a power-law distribution fitting the higher values.

4.5 Characterisation of the Majority, Presenting Less Deterministic Evolution Patterns

Fig. 9 Density graph for the number of development days with commits (left) and number of days between the first and last commits (right), for the projects accumulating \(< 700\) commits

To better understand the smaller projects, we examined their development times, finding a median of 24 days and a mean of 52.49 days with commits throughout their entire life (Fig. 9).

It is tempting to interpret projects with fewer than 30 days of commits as potential coursework, short-term experiments, abandoned prototypes or freshly open-sourced projects. For the 30,448 projects in this category, commit \(R^2\) values have a median of 0.90 and a mean of 0.79, while effort \(R^2\) values have a median of 0.86 and a mean of 0.75.

For 24,927 projects with \(\ge 30\) days of commits (and still \(< 700\) total commits to their main branch), commit \(R^2\) values would have a median of 0.94 and a mean of 0.90, while effort \(R^2\) values have a median of 0.90 and a mean of 0.85.

In addition, 9292 projects lacked sufficient data for analysis (Table 3). These projects did not have enough days with commits changing any text to fit against a degree five polynomial. A random sampling of this subset confirmed the presence of projects with either minimal activity or commits solely dedicated to adding/replacing non-textual assets (e.g., images).

Understanding the factors driving variations in these smaller projects requires further extensive study, which is beyond the scope of the present work.

5 Discussion

The preceding analysis established a separation in how software projects evolve, separating a small minority of large projects following deterministic evolution patterns from the vast majority of smaller, deceleration-prone projects. Moving beyond the strictly empirical evidence presented in the Results section, this Discussion section offers our non-data-driven interpretations and hypotheses of these observed dynamics. We explore the broader theoretical implications, specifically by critically relating our findings to the claim made by Les Hatton et al. [7]. We discuss how the characteristics of large projects may appear to support such claims, while the distinct evolutionary paths of the numerically dominant small projects point to a potential systemic bias that limits the generalisation of such claims.

We interpret our findings to suggest that the productivity of the large project cohort appears unaffected by any external improvements and events in the last few decades.

The goodness-of-fit to simple quadratic models for this cohort suggests that their evolution could be governed by deterministic forces. Being able to find projects with such high goodness-of-fit in the majority of involved programming languages suggests that this deterministic evolutionary potential is not language-specific but rather a more general capability within software development.

When we try to interpret our observations in relation to the work of Les Hatton et al. [7], we observe that it could both strongly support and reveal the limitations of their observations:

  • In our subjective interpretation, the volume of large projects (where most of the projects studied by Les Hatton et al. fall [7]) that remained resilient towards external events over decades could be related to the effects observed by Les Hatton et al. [7], while greatly exceeding the sample of software projects investigated in their research.

  • Conversely, the smaller projects that follow less deterministic evolution patterns can, due to their overwhelming volume, be interpreted as a large body of counterexamples, or at least as showing that the observation might only hold within some restrictions (e.g., above a certain size threshold).

Interpreting our observation in relation to the Laws of Software Evolution, particularly the 4th Law, “Conservation of Organisational Stability” [16], reveals strong corroboration for large projects. The large project cohort’s deterministic evolution patterns and resilience provide independent, large-scale quantitative evidence supporting the Law’s assertion: “Unless feedback mechanisms are appropriately adjusted, average effective global activity rate in an evolving E-type system tends to remain constant over product lifetime”. The numerous small projects suggest that this effect may be limited to, for example, long-running projects or those reaching a certain size, indicating a boundary condition for the Law’s application.

In general, the numerical dominance of smaller projects (\(83.9\%\) of all projects in our dataset) can create a systemic risk for practitioners: the pervasive practices within this larger cohort may be immature, may not be aimed at sustainable long-term development or, in the case of freshly open-sourced projects, may not yet demonstrate such practices, potentially creating a positive feedback loop where ‘good enough, but not yet mature’ processes are inadvertently propagated. This risks drowning out true expertise and may lead to an increasing ratio of projects destined to decelerate or stagnate.

Researchers must be aware of the sampling bias inherent in selecting only a few projects:

  • Investigating only the largest projects risks being divorced from the reality of the majority of software projects, leading to non-generalizable conclusions.

  • Randomly selecting projects might lead to a selection dominated by the smaller projects, which may not represent optimal development processes.

It is important to interpret these statistical findings with nuance. Figure 10 illustrates such a nuance related to agency. FreeBSD, started in 1993, shows a clear trend with small deviations. Moya, started in 2014, also shows a recognizable trend, but with more observable signs of human agency (Footnote 8).

It is crucial to note that our interpretations and hypotheses regarding Les Hatton’s observation on properties being “divorced from human agency” [7] apply only to the cohort of larger projects, which also includes the projects investigated by Les Hatton et al. The cohort of smaller projects requires further study.

Finally, we emphasise the generalizability constraints of our findings. We do not claim that there is a clear separation in all languages, or that such a separation always happens at approximately 700 commits in each language. Whether a more precise limit can be found, whether this limit is constant or changes over time, and whether there are differences between the languages requires further study.

Fig. 10 The accumulating effort for FreeBSD (left) and the number of lines for Moya (right) as extracted from Git log

6 Threats to Validity

This study may be subject to the usual threats to external validity, potentially limiting the generalizability of our results beyond our specific settings.

One particular threat arises from the possibility of incorrect author and commit dates for some commits. Due to our limited resources, our data cleanup efforts were constrained. To address this threat, we highlight the large number of projects investigated and how unlikely it is that a substantial number of contributors (\(15,700+\) for Linux) would intentionally and maliciously manipulate such data over several years.

The language property of repositories only shows the most prevalent language in that repository, not all the languages used. Among the large projects, some languages are represented by only a few projects (e.g., Frege and Ragel, with one each), which makes their membership fragile. Our observation that deterministic evolution patterns are possible across a wide range of languages could become invalid based on categorization changes.

Projects started before GitHub and Git were likely migrated to Git with automated solutions. If these migration scripts had flaws, they might introduce systematic errors into our investigation. The impact of this threat is limited, as only 145 projects had a start date before 2000.

Our analysis, which uses a granularity of days, would not capture large changes that were reversed within the same day.
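
A minimal sketch of this effect, assuming the day-level series is built by summing per-commit line changes. The function `daily_series` and the commit data are hypothetical and are not taken from the study's scripts.

```python
from collections import defaultdict
from datetime import datetime


def daily_series(commits):
    """Aggregate (timestamp, added, deleted) tuples per calendar day.

    Returns two dicts keyed by date: the net line change and the effort
    (added + deleted, both counted as positive).
    """
    net = defaultdict(int)
    effort = defaultdict(int)
    for timestamp, added, deleted in commits:
        day = datetime.fromisoformat(timestamp).date()
        net[day] += added - deleted
        effort[day] += added + deleted
    return dict(net), dict(effort)


# Hypothetical history: a large import is reverted on the same day.
commits = [
    ("2020-03-01T09:00:00", 5000, 0),
    ("2020-03-01T17:30:00", 0, 5000),
    ("2020-03-02T10:00:00", 120, 20),
]
net, effort = daily_series(commits)
for day in sorted(net):
    print(day, "net:", net[day], "effort:", effort[day])
```

In this example the first day nets to zero lines, so a day-level line count shows no trace of the import and its reversal, although an effort measure that adds additions and deletions would still register 10,000 changed lines.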

Reproductions might see slightly different results due to projects being created/deleted, gaining/losing popularity, and, in general, their continued development.

Finally, our results might not generalise to closed-source projects, as we only had access to open-source projects on GitHub for this investigation.

7 Conclusion

This paper presents a large-scale longitudinal study on the evolution of software development effort in 65,987 popular open-source projects on GitHub, using \(\sim 7.3\) TB of data. Our initial objective was to significantly update and expand previous empirical observations regarding the underlying laws governing software evolution, extending the scope to contemporary projects and a substantially larger set of projects.

During our investigation, we observed a separation among the analysed projects. Specifically, a distinct subset of 10,612 projects (\(16.1\%\)) consistently exhibited deterministic evolution trends indicative of highly automated workflows. These large projects maintain a trajectory independent of their start date, duration, or external events over several decades.

In contrast, the remaining projects follow a different, less deterministic evolution, which is consistent with lower maturity or with not aiming for sustainable long-term growth, and are characterised by a higher likelihood of production deceleration.

In our opinion, by demonstrating this separation, our work provides context for future research: studies focusing exclusively on the largest projects risk generalising the findings from a non-representative subset. Furthermore, the sheer volume of the smaller, potentially less mature codebases suggests a need for data curation to prevent assimilating and communicating practices based on their statistical prevalence rather than their long-term sustainability or evolutionary success.

8 Further Work

Building upon the discovery of a separation in how deterministically software systems evolve, our future work will focus on investigating the causes and consequences of this split in project dynamics.

Further research could investigate specific events within projects and their impact (e.g. [14]), analyse data from closed-source industrial projects, and explore the relationship between our findings and the importance of developer expertise [4].

Further research could also investigate the impact of other controlling factors like team size, funding, domain of operation and language effects.

Further research could extend the languages used for project selection to increase the number of analysed projects and also investigate the impact of different project selection criteria, beyond popularity in these languages.

Further research could refine the threshold (approx. 700 commits over the lifetime of the project), investigate the factors behind it and whether it changes over time. Do projects aiming to grow larger already start differently? Is this a boundary where the information-theoretical forces start to take precedence? Is it a limit that smaller (or less well-developed) projects are not likely to cross?

Further research could also investigate if this threshold is the same in every language, or if there are differences when analysed separately, and by language groups.

Further research could also do a more detailed investigation on the smaller projects to see if there are other such boundaries.

Data Availability

For our research, we used only publicly available data on GitHub (the Git repositories of the projects). In all cases, we used the latest version on the date mentioned in this article. We refer to specific phenomena in the article by providing direct links to the code version, using their unique commit hash identifier. All of the code written for data processing and its output data can be accessed at: https://github.com/KristofSzabados/reproduction_package_1. This includes the list of repositories collected, the output of our calculations, and the R script used for data analysis and generating the figures in this article.

Notes

  1. Cyclomatic Complexity, the Number of Lines of Code, Statements, Classes, Files, public APIs, and public undocumented APIs are redundant metrics, with Cyclomatic Complexity in classes and functions measuring the same subject.

  2. A network is called scale-free when its degree distribution follows a power law.

  3. Ada, Assembly, BASIC, Batchfile, Beef, C, C#, C++, Chapel, Clojure, CMake, CoffeeScript, ColdFusion, Crystal, Cue, CSS, D, Dart, DM, Elixir, Elm, Emacs Lisp, Erlang, Fsharp, Frege, Gleam, Go, Golo, Gosu, Groovy, Haskell, Haxe, HCL, HTML, Idris, Imba, Io, Java, JavaScript, Julia, Kotlin, Less, LiveScript, Lua, M4, Makefile, Mathematica, MATLAB, Nim, Nimrod, Nu, Objective-C, OCaml, Pascal, Perl, PHP, Pony, PowerShell, Prolog, PureScript, Python, R, Racket, Ragel, Raku, Red, Ring, Roff, Ruby, Rust, Scala, Scheme, Shell, SmallTalk, Starlark, Swift, Terra, TeX, TypeScript, V, Vim script, VimL, WebAssembly, Yacc, Zig.

  4. The “--first-parent” flag ensures that the log traversal follows a strict, linear history by only tracking the first parent of any merge commit. This methodology effectively isolates the main branch’s core trajectory, allowing us to analyze the sequence of commits merged into the project’s primary line of development, thereby excluding the detailed, concurrent history of temporary branches.

  5. Additions and deletions added together, both with a positive sign.

  6. Using ‘t.test’ from the ‘stats’ package, ‘cohens_d’ from the ‘effectsize’ package and ‘pwr.t2n.test’ from the ‘pwr’ package.
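
Notes 4 and 5 can be combined into a small sketch of how per-day effort might be read from a Git log. This is an assumption-laden illustration, not the study's own script: the use of `--numstat`, the author date (`%ad`), the `@` marker in the format string, and the function names are all choices made here for the example.

```python
import subprocess
from collections import defaultdict


def read_first_parent_log(repo_path):
    """Return the first-parent log of a repository with per-file line counts."""
    cmd = [
        "git", "-C", repo_path, "log",
        "--first-parent",            # follow only the first parent of merges
        "--numstat",                 # added<TAB>deleted<TAB>path for each file
        "--date=short",              # dates as YYYY-MM-DD
        "--pretty=format:@%ad",      # one '@<author date>' line per commit
    ]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def effort_per_day(log_text):
    """Sum additions and deletions (both as positive values) per day."""
    effort = defaultdict(int)
    day = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            day = line[1:].strip()
            continue
        parts = line.split("\t")
        if day is None or len(parts) < 3:
            continue  # blank separator lines and anything unexpected
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files have no line counts in --numstat output
        effort[day] += int(added) + int(deleted)
    return dict(effort)


# Hand-written sample in the shape produced by the command above.
sample = (
    "@2021-05-02\n\n10\t2\tsrc/a.c\n-\t-\tlogo.png\n"
    "@2021-05-01\n\n3\t0\tREADME\n"
    "@2021-05-01\n\n0\t7\tsrc/b.c\n"
)
result = effort_per_day(sample)
```

For a real repository, `effort_per_day(read_first_parent_log("/path/to/repo"))` would produce the same kind of mapping, where the path is a placeholder for a local clone.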

References

  1. Colfer, L.J., Baldwin, C.Y.: The mirroring hypothesis: theory, evidence and exceptions. IRPN: Innov. Org. Behav. (Top.) 25(5), 709–738 (2016)

  2. Delignette-Muller, M.L., Dutang, C.: fitdistrplus: an R package for fitting distributions. J. Stat. Softw. 64(4), 1–34 (2015). https://doi.org/10.18637/jss.v064.i04

  3. Dutang, C., Goulet, V., Pigeon, M.: Actuar: an R package for actuarial science. J. Stat. Softw. 25(7), 1–37 (2008). https://doi.org/10.18637/jss.v025.i07

  4. Fekete, A., Cserép, M., Porkoláb, Z.: Measuring developers’ expertise based on version control data. In: 2021 44th International Convention on Information, Communication and Electronic Technology (MIPRO), pp 1607–1612, (2021). https://doi.org/10.23919/MIPRO52101.2021.9597103

  5. Gillespie, C.S.: Fitting heavy tailed distributions: the poweRlaw package. J. Stat. Softw. 64(2), 1–16 (2015). https://doi.org/10.18637/jss.v064.i02

  6. Hatton, L., Warr, G.: Strong evidence of an information-theoretical conservation principle linking all discrete systems. R. Soc. Open Sci. 6, 191101 (2019). https://doi.org/10.1098/rsos.191101

  7. Hatton, L., Warr, G.: The origin of shared emergent properties in discrete systems. Entropy (2025). https://doi.org/10.3390/e27060561

  8. Herraiz, I., Gonzalez-Barahona, J.M., Robles, G.: Towards a theoretical model for software growth. In: Fourth International Workshop on Mining Software Repositories (MSR’07:ICSE Workshops 2007), 21, (2007). https://doi.org/10.1109/MSR.2007.31

  9. Hyland-Wood, D., Carrington, D., Kaplan, S.: Scale-free nature of java software package, class and method collaboration graphs. In: Proceedings of the 5th international symposium on empirical software engineering (2006)

  10. Israeli, A., Feitelson, D.: The Linux kernel as a case study in software evolution. J. Syst. Softw. 83, 485–501 (2010). https://doi.org/10.1016/j.jss.2009.09.042

  11. Izurieta, C., Bieman, J.: The evolution of FreeBSD and Linux. In: Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering. ACM, New York, NY, USA, ISESE ’06, p 204–211, (2006). https://doi.org/10.1145/1159733.1159765

  12. Johari, K., Kaur, A.: Effect of software evolution on software metrics: an open source case study. SIGSOFT Softw. Eng. Notes 36(5), 1–8 (2011). https://doi.org/10.1145/2020976.2020987

  13. Kemerer, C., Slaughter, S.: An empirical approach to studying software evolution. IEEE Trans. Softw. Eng. 25(4), 493–509 (1999). https://doi.org/10.1109/32.799945

  14. Kovacs, A., Szabados, K.: Internal quality evolution of a large test system-an industrial study. Acta Univ Sapientiae 8(2), 216–240 (2016). https://doi.org/10.1515/ausi-2016-0010

  15. Lawrence, M.J.: An examination of evolution dynamics. In: Proceedings of the 6th International Conference on Software Engineering. IEEE CS Press, Washington, DC, USA, ICSE ’82, p 188–196, (1982). https://doi.org/10.5555/800254.807761

  16. Lehman, M., Fernandez-Ramil, J.: Rules and tools for software evolution planning and management. ASE 11, 15–44 (2001). https://doi.org/10.1023/A:1012535017876

  17. Lehman, M., Perry, D., Ramil, J.: On evidence supporting the feast hypothesis and the laws of software evolution. In: Proceedings Fifth International Software Metrics Symposium. Metrics (Cat. No.98TB100262), pp 84–88, (1998). https://doi.org/10.1109/METRIC.1998.731229

  18. Lehman, M.M., Ramil, J.F.: Evolution in software and related areas. In: Proceedings of the 4th International Workshop on Principles of Software Evolution. ACM, New York, NY, USA, IWPSE ’01, p 1–16, (2001). https://doi.org/10.1145/602461.602463

  19. Mamun, M.A., Berger, C., Hansson, J.: Effects of measurements on correlations of software code metrics. Empir. Softw. Eng. 24, (2019). https://doi.org/10.1007/s10664-019-09714-9

  20. Myers, C.R.: Software systems as complex networks: structure, function, and evolvability of software collaboration graphs. Phys. Rev. E 68(4), 046116 (2003). https://doi.org/10.1103/physreve.68.046116

  21. Potanin, A., Noble, J., Frean, M., et al.: Scale-free geometry in OO programs. Commun. ACM 48(5), 99–103 (2005). https://doi.org/10.1145/1060710.1060716

  22. Potvin, R., Levenberg, J.: Why google stores billions of lines of code in a single repository. Commun. ACM 59(7), 78–87 (2016). https://doi.org/10.1145/2854146

  23. Smith, C.P.: A software science analysis of programming size. In: Proceedings of the ACM 1980 Annual Conference. ACM, New York, NY, USA, ACM ’80, p 179–185, (1980). https://doi.org/10.1145/800176.809965

  24. Szabados, K.: Structural analysis of large ttcn-3 projects. In: Proceedings of the 21st IFIP WG 6.1 International Conference on Testing of Software and Communication Systems and 9th International FATES Workshop. Springer-Verlag, Berlin, Heidelberg, TESTCOM ’09/FATES ’09, p 241–246, (2009). https://doi.org/10.1007/978-3-642-05031-2_19

  25. Szabados, K.: Quality aspects of ttcn-3 based test systems. PhD thesis, Eötvös Loránd University, (2017). https://doi.org/10.15476/ELTE.2017.159

  26. Szabados, K.: A large-scale analysis of production effort changes in software projects. Acta Univ Sapientiae 16(2), 236–254 (2024). https://doi.org/10.47745/ausi-2024-0013

  27. Taube-Schock, C., Walker, R.J., Witten, I.H.: Can we avoid high coupling? In: Proceedings of the 25th European Conference on Object-Oriented Programming. Springer-Verlag, Berlin, Heidelberg, ECOOP’11, p 204–228, (2011). https://doi.org/10.1007/978-3-642-22655-7_10

  28. Šubelj, L., Bajec, M.: Software systems through complex networks science: Review, analysis and applications. In: Proceedings of the First International Workshop on Software Mining. ACM, New York, NY, USA, SoftwareMining ’12, p 9–16, (2012). https://doi.org/10.1145/2384416.2384418

  29. Zsiga, A.: Termelékenységi trendek, minták elemzése szoftverfejlesztési projektekben. Master’s thesis, Eötvös Loránd University, (2019) https://attila967.web.elte.hu/materials/DiplomamunkaZsigaAttila2019.pdf

Acknowledgements

The authors thank Izabella Ingrid Farkas for her feedback on this article.

Author information

Authors and Affiliations

  1. Informatics, Eötvös Loránd University, Pázmány Péter sétány, Budapest, 1117, Hungary

    Szabados Kristóf

Contributions

K.Sz. conducted the literature review, downloaded the open-source git repositories, analysed them, wrote all scripts, and wrote the main manuscript text. K.Sz. also designed and formatted the tables and figures included in the study. The author reviewed and approved the final manuscript.

Corresponding author

Correspondence to Szabados Kristóf.

Ethics declarations

Conflict of Interest

The authors declare no conflict of interest.

About this article

Cite this article

Kristóf, S. Do Developers Have Agency? A Longitudinal Study Revealing a Separation in how Software Systems Evolve Using 65,987 Popular Open Source Projects on GitHub. Acta Univ. Sapientiae Inform. 18, 6 (2026). https://doi.org/10.1007/s44427-025-00019-y

  • DOI: https://doi.org/10.1007/s44427-025-00019-y

Keywords