Disabled at 22 million commits (programming.dev)
So the author was purposefully trying to do the most extreme thing they could to see how git/GitHub act/break.
I don’t blame GH at all.
Source: https://web.archive.org/web/20230702215522/https://sh.itjust...
> So the author was purposefully trying to do the most extreme thing they could to see how git/GitHub act/break.
This is Hacker News. Hacking is about using things, technology in particular, in surprising ways that were not intended by their creators.
No harm, no foul, because, what, it says "Hacker" in the title? I don't know, are we 12?
The reason that hacking is even a thing: It's actually possible to break things in a responsible, non-destructive way (in contrast to most things in the physical world).
If we skip the responsible part, we are just... breaking things and incurring costs. Why should that be okay?
It’s GitHub, not HackerHub. That the story is reported on Hacker News is irrelevant.
But I would expect a different sentiment of comments on a site called Hacker News.
That it's being scoffed at on Hacker News ought to tell you how uncreative and trifling it is. It's a script kiddie stunt not remotely worthy of being considered actual hacking.
This is Hacker News after all, not Hack-The-World News.
> I don’t blame GH at all.
I don't really see anyone blaming GitHub, not even the original post. I'm not sure why all the responses here are insinuating that.
This is why we can’t have nice things.
And used GitHub Actions to do the (infinite loop?) compute.
So basically this:
Video showing the Simpsons going to an 'all you can eat' buffet and staying past closing time, still gobbling down food without end; the owner has Homer thrown out.
This is the most blatant case of false advertising since my suit against the movie The Neverending Story.
A more correct title would be "GitHub stopped abuse after 22M commits".
There is absolutely nothing wrong with GH stopping that, and it's very wrong to insinuate otherwise like OP is doing.
I wouldn't be surprised if GH permabanned him.
I don’t think the author is trying to insinuate that GitHub is in the wrong in any way. They explicitly say they understand the decision, and anticipated that it would happen.
I don’t want to quibble with the term “abuse”, because I think in this scenario it depends on whether intent is a factor and whether we should trust their stated intent. But depending on how you look at it, GitHub would be just as likely to benefit from hiring the author as they would from banning.
Load testing someone else's system, resulting in a manual staff intervention at 6am due to potential system destabilization?
Calling that anything other than abuse is just wordplay.
I don't know what happened behind the scenes. But I thought it was amusing when one of our mutual fund customers tried purchasing one share of each of the ~1000 funds we offered in our catalog. It obviously broke things; there were various UI things we hadn't tested for that kind of portfolio.
Okay, but I explicitly declined to quibble with the term. But I’ll go one further: I’ll concede the action qualifies as abuse. If there’s any quibble worth having, it’s over whether the author’s figurative abuse-hat was white or grey.
> GitHub would be just as likely to benefit from hiring the author as they would from banning
For what purpose?
Creating an infinite loop that updates a file and commits it is hardly worthy of a job offer.
They seemed fairly surprised by the fact it happened, and let it go on for some time. Which strongly suggests they hadn’t considered such a load test on their own. If I had a budget/head count, I’d at minimum put out a feeler for a QA role.
But the next largest legitimate repo is going to be something like 20M commits shy of that, so it seems like it's excellent that GH engineers only just started to care.
It's entirely possible that such a load test has been considered, but deemed non-realistic and so not prioritised for some time. If I were running the QA team I'd be annoyed if time were spent on abusive destructive testing rather than realistic testing that real-world users may experience, especially because load testing like this would have to be on an identical environment to PROD and so rather expensive.
It reminds me of that old QA joke:
A QA engineer walks into a bar and orders a beer. She orders 2 beers.
She orders 0 beers.
She orders -1 beers.
She orders a lizard.
She orders a NULLPTR.
She tries to leave without paying.
Satisfied, she declares the bar ready for business. The first customer comes in and orders a beer. They finish their drink, and then ask where the bathroom is.
The bar explodes.
In your joke, QA was only doing exploratory testing. Somebody - perhaps the builders, bar staff, or QA - should have also been doing integration testing for key user stories, and the system has failed because nobody ensured that was happening.
GitHub hasn't failed here - it continued to perform at normal levels for other users, so far as I can see, and they had an upstream process which caught the issue without the system failing. Maybe some exploratory testing had previously identified where that process should kick in, but without having an automated process since it was so unlikely to happen.
> Which strongly suggests they hadn’t considered such a load test on their own.
Not really. GitHub has been around for over a decade. People bother with problems that have a realistic chance of happening. If GitHub didn't bother to rate limit commits, it means it was a potential issue that didn't manifest itself for over a decade.
People tend to bother about problems that happen. Otherwise everyone would be freaking out because of killer asteroids.
Where are you reading that they're surprised?
They asked with more than passable benefit of the doubt what the user intended. And they asked quite a ways after the user noticed local degradation. “Surprised” might be the wrong term, but it definitely doesn’t seem like a specific guard was in place for the scenario.
Was there a guard needed? I don't think so. It seems GitHub didn't see any degraded performance and barely noticed the issue, and odds are they presumed the author had screwed up their GitHub Actions configuration. Once they determined it was plain old abuse, I'd guess some GitHub employee said "what a moron" and proceeded with their day.
What are you basing this on?
> What are you basing this on?
To start off, based on the fact that GitHub has been around for over a decade and this was the first time this sort of attention-seeking stunt was made public.
Do you have any indication this sort of stunt is relevant?
Hire them? Why? There’s nothing technically clever or novel here. Anyone can create a shell script to generate random commits and push them. I’d bet even GPT-3.5 could handle that. Why should GitHub hire them?
> There’s nothing technically clever or novel here.
Nothing technically novel. But evidently it was at least a somewhat novel stress test execution for GitHub’s live systems, otherwise surely it would have been dealt with sooner and messaged with less benefit of the doubt to the user.
Investigating the limitations of something doesn’t have to be novel to be interesting. It’s been a while (I think), but for example there’s been plenty of praise here for Netflix’s own stress tests of its live systems. The tests are often really mundane, eg just shutting some stuff off or triggering known error conditions. It’s interesting not because the nature of the fault is novel, but because systems are complex and it’s a way to learn about their failure modes.
I’m also not saying GitHub should hire them (and I kinda doubt they’d want a QA offer, based on reading other blog posts on their site). Just that a hire would plausibly be about as beneficial as a ban.
> Nothing technically novel. But evidently it was at least a somewhat novel stress test execution for GitHub’s live systems, otherwise surely it would have been dealt with sooner and messaged with less benefit of the doubt to the user.
Not really. This is boring stuff, and odds are they never bothered with it because a) it has no impact on operations, b) the blast radius of this doesn't go beyond the attacker's own repo, c) no moron with time to kill bothered attempting this stunt until now.
Probably now some low-level employee at GitHub needs to add a metric and an alarm to react to rate limits to prevent moron copycats from pulling this stunt for attention-seeking.
Not smart, not clever. Just boring vandalism.
Right? Let me just write 22 million comments of random garbage on every HN thread, YC will surely hire me for that!
A more apt analogy: if we didn’t already know the general range at which a given single comment thread degrades that thread (analogous to a local repo) and HN overall (analogous to the GH service), writing a bot to answer that specific question. It’s kind of wild that this yielded any new/not-widely-known information at all, because it’s such an obvious thing to test. But apparently it raised at least some eyebrows on both fronts.
In load testing there's a difference between testing if something executes as specified versus testing where it breaks. I'm pretty sure that GitHub has tests to validate their performance specification. They may have tests even far in excess of that. They may not have tested where it breaks. They may have had a discussion like: what if someone tries such and such, and their answer may have been: we have good monitoring, we'll catch it before it gets out of hand, and lock out the user in question (which ultimately is what they did). In other words, spending engineering resources to determine the breaking point may not have been a priority. I'm not saying that I would agree with that in all circumstances, but it's their determination to make.
Sure, that all makes sense. It’s still true that someone stressing git and GH’s services in this particular way produced information that wasn’t especially redundant. Monitoring was good at catching it, but probably based more on service quality than on the actual thing under stress. Now there’s some data about the thing under stress, and if nothing else that allows some knob turns to calibrate monitoring. And if nothing else, that would more readily catch someone doing the same with nefarious purposes.
Alarmingly close to the secret criteria, Paul - he's the one.
Everyone can make a website and use a database. Did everyone invent AirBnB?
Everyone can drive a car. Did everyone invent Uber?
You're underrating the non-technical factors.
Deliberately trying to create an extreme situation in order to find when/where/how a service breaks is inarguably "abuse" regardless of whether the intent was malign.
I addressed this in another downthread reply. Briefly, I agree. My quibble isn’t with the term “abuse”, only the nature of its intent.
The intent doesn't seem particularly relevant.
Sure it does. Why would we even be discussing “GitHub predictably enforced rules in response to actions with no conceivable merit”?
It is malicious, as he knows he will harm the service in order to draw whatever conclusion. This is not a case where the end justifies the means.
The first time I wrote and shared any kind of interactive code, it took approximately five minutes for someone to XSS it. At the time, I was pretty miffed too. After a polite explanation that the “abuse” was curiosity about defensive measures I’d taken, I understood pretty suddenly that there was a whole scope of programming I hadn’t even considered.
More than 20 years later, I still remember the enormous benefit that little bit of malice has bestowed on me and my career. And every time I’ve been on the receiving end of such an exploratory exploit since has been exponentially more appreciated.
I’ll add one more anecdote while I’m at it.
At a previous job I was aware of a potential vulnerability, voiced it rather loudly, but had a hard time getting the attention it deserved until I recognized it happened to coincide with a really high profile business-critical bug. I only recognized it because some jerks had previously fucked with much less important stuff under my purview, and I wanted very much to understand how they did it, and learned quite a bit by wanting to know.
I used those developed instincts to unfuck what would have otherwise resulted in at least contract terminations, if not lawsuits. And the recognition allowed me to correct almost every compromised datum, which also guarded every contractee from challenges to their license status and, ultimately, from being subject to a wholly different jurisdictional context.
I’m not going to disclose the nature of the vulnerability, but the way the bug presented was time deltas based on time zone configuration. Hardly a novel problem, but it nearly put a whole industry into peril and/or conflict. It was definitely worth the attention.
And when communicating the problem stalled, I did what any self-respecting hacker would do: I exploited the damn thing myself and showed how it was done.
I don’t think GitHub has been harmed. GitHub did the right thing by having mechanisms to disable repositories before they can cause real harm. The author merely tested out where that to-be-expected limit would be.
Arguably, it would be better if GitHub documented an explicit number of supported commits, so that one can know beforehand which usage scenarios the service is suitable for.
> Arguably, it would be better if GitHub documented an explicit number of supported commits, so that one can know beforehand which usage scenarios the service is suitable for.
I don't agree. Clearly GitHub can easily handle this number of commits, and more. There was no real world limit being hit. There is no user impact or degraded performance.
This means that in practice there is absolutely no limit in GitHub.
Why document that? Are you planning on working on pushing more than 22 million commits into a project? And if you are, what stops you from sending an email to GitHub to clarify if it supports your extraordinary usecase?
It seems some people around here are desperate to find any flaw in the way GitHub handled this case of vandalism, and at best are grasping at straws.
And what if GitHub didn't have such a mechanism? Should they swallow the loss because they should have known better? And more importantly, as these safeguards weren't documented, how could the author be certain one existed and that he wouldn't cause any harm?
You are almost getting the value of this kind of curiosity!
Now, imagine if you had harmless alternatives. Maybe it will help you expand your thinking.
The author used up so much of GitHub's resources that it impacted other users. 22 million commits is probably enough that something started to hit a linear or n-log-n scaling function, setting off an alarm on some metric. Yeah, you get in trouble for that.
I'm reminded of a time in high school where my friend almost got himself banned from the school computers.
At home he had dial-up internet (it was 2003 and he lived in a very rural area). But at school he had megabits of bandwidth he could (ab)use. So he started pirating everything on the internet using a computer nobody ever used in a side-room of the library. It ran 24/7 downloading his long list of desires: games, movies, tv series, etc. He stored his spoils on his network drive, which had no limits on how much it could hold (until he got caught). He'd occasionally bring in a hard drive, copy everything that fit on it and bring it home with him on the school bus.
But all good things must end.
The network admin for the school board eventually came by and sat him down. He showed my friend a pie chart where, as he described it to me, "my name was on the portion that took up more than 2/3 of the pie". After a conversation, all the data got deleted, my friend got a stern warning, and somehow didn't get into any worse trouble than that.
"somehow"
I don't get this attitude. Shit happens, we talk about it, we don't do it again. Not everything needs to have dire consequences.
I think he means "somehow" in the meaning of "somehow, none of the copyright holders asked the school for his information."
> The author used up so much of github's resources that it impacted other users.
Note that the message only said “the potential to affect other users”. I would expect a professional service to catch such things before it actually affects other users.
Sounds like the network admin and surrounding people had their heads screwed on properly. :)
A long time ago, the math column in Scientific American decided to run a contest. It asked readers to send a post card with the biggest number they could think of. Whoever came up with the biggest number would win $1 million--divided by the winning number.
The editor of the magazine almost stopped the contest because he worried that someone might actually win real money and the magazine would be on the hook. But the author reassured him: human nature being what it is, the winning number is going to be not only larger than 1 million, but much larger than you can imagine.
And so it was. The winning number was (IIRC) some tower of exponentials that would take most of the universe to write out as decimal digits. The SciAm budget was safe.
If readers had coordinated somehow, they could have won a million dollars from SciAm and divided it among themselves. They might have made a hundred dollars each. But the author knew that such coordination would be impossible. Human nature would not allow it. Someone, somewhere, was going to send in a ridiculously large number to win. Classic Prisoner's Dilemma.
The GitHub case is the same. Human nature being what it is, someone, somewhere is always going to try to push the limits. As the developer of a SaaS development platform, this is something I'm taking to heart.
The biggest number I can think of is 0.001 :D
They could have been in quite some trouble!
In the famous words of Calvin Coolidge, "you lose".
https://clintonwhitehouse3.archives.gov/WH/glimpse/president...
This feels slightly malicious, but I can’t help but admire the curiosity that takes someone to actually see what happens if. That said, now we know, so nobody else needs to bother GitHub engineers by doing this again, hopefully.
Not like it would take a lot of code to check on push whether the repository has averaged more than, say, 10k commits per day since its creation date, to stop such abuse (a rough sketch follows below). It doesn't thwart existing repositories with millions of commits (Linux is at ~2M) and gives time to formulate a long-term plan for what's allowed and what's paid or just disallowed.
So even if people were to try, I don't see that being a big bother. Not that it's not malicious to do this now.
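A minimal sketch of what such a check could look like as a plain-git server-side pre-receive hook. The 10k/day budget, the created_epoch metadata file, and the overall shape are illustrative assumptions, not how GitHub actually implements anything:

    #!/bin/sh
    # Hypothetical pre-receive hook: reject a push once the pushed history exceeds
    # a commit budget proportional to the repository's age. Numbers are illustrative.
    max_per_day=10000
    created=$(cat created_epoch 2>/dev/null || date +%s)   # assumed per-repo metadata file
    age_days=$(( ( $(date +%s) - created ) / 86400 + 1 ))
    budget=$(( max_per_day * age_days ))

    zero=0000000000000000000000000000000000000000
    while read old new ref; do
      [ "$new" = "$zero" ] && continue            # ref deletion, nothing to count
      total=$(git rev-list --count "$new")        # commits reachable from the pushed tip
      if [ "$total" -gt "$budget" ]; then
        echo "push to $ref rejected: $total commits exceeds budget of $budget" >&2
        exit 1
      fi
    done

As the replies below note, a real limit would also have to accommodate imported histories and large monorepos, so in practice it would probably route to support review rather than a hard reject.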
I don't think you can limit pure commit counts though, because you can push many commits/massive history changes in one go.
Monorepos in particular could be impacted.
Good point, perhaps (age_in_years+1)×1M would be a better limit. Anyone wanting to import more than 1M commits could get a paid tier or beg support. At any rate, not that hard to implement is what I would expect.
Git commit timestamps are 100% fudgeable. You could implement this based on GitHub repository age, but the assumptions would break for imported repos.
(Understanding we’re waaaay off in edge case territory here and this is all basically academic.)
I’m surprised at the reaction in these comments. Somebody curiously pushing the limits of a service to see what would happen is very much in the spirit of all hackers. Meanwhile, GitHub responded appropriately, and his write up agrees.
The reactions were predictable when you consider:
1. Some HN users might/could have been personally inconvenienced by OP's action and they prefer resenting him rather than GitHub for whatever reason
2. Many HN users get paid a lot to work on SaaS themselves, so seeing a peer (however big it is) get abused for (what appears to be) entertainment is terrifying to them
It's a bit odd how hostile everyone here is acting. Sure, it's a bit silly, but hardly worthy of the kind of vitriol directed towards his "abuse".
Someone potentially taking the service down for everyone, you know, just out of curiosity. Which part of this curiosity do you need GitHub for? I'm curious how well GitHub handles DDoS attacks, what's their limit. Let's DDoS and find out, it will be fun!
> Someone potentially taking the service down for everyone, you know, just out of curiosity.
I think this is exactly why it's great, and it's basically turned into a GitHub advertisement. Either GitHub is simply unable to handle weird abuse methods and/or the abuse prevention is improved.
As an enterprise, wouldn't it be a bit concerning if your git host was unable to function (or respond appropriately) when presented with a random script kiddie?
This person didn't have bad intentions, but other people out there most definitely do.
GitHub is very much able to handle one person doing this. Doesn't matter if you had bad intentions or you were just ignorant to bad side effects.
You really think one lone repo could take down all of GitHub? If GitHub doesn’t have stops in place to prevent that then they honestly deserve it.
So doing a DoS attacks from a single machine is fine, because "your servers can handle that"? Really? Of course GitHub can handle this, but if the sole purpose is to see where's the limit, you're stressing our servers and wasting our resources for nothing. I'd ban you no questions asked. Go test perf/scalability issues on someone else's live site.
because he is pushing the limits of a public service that's used by millions of people every day. The BEST CASE is basically what happened: GitHub finds out and disables the repo. The worst case is he takes down the entire GitHub site and gets permanently banned.
Don't fuck with shit I use.
Do you think GitHub’s architecture is so bad that one person can take it all down by committing to a single repo?
> I decided to see how many commits GitHub (and git) could take before acting kind of wonky. At ~19 million commits (and counting) to master: it’s wonky.
This just doesn't seem right to me. Why? It's obvious that at some point you'll harm the service. If the goal was to test it, why not try locally with git?
A good lesson to learn - If you as a service owner aren’t testing the limits to the point of failure and enforcing sensible guardrails around that, then some random user eventually will.
GitHub offers the service for free and doesn't publish or enforce any specific limit on number of commits. I see nothing wrong with a user pushing as many commits to it as possible. It's not his problem when to stop it.
This is also how I feel about the Tor project getting their knickers twisted over people who do research on the live network. If the network can't handle it, then it's not resilient to attack. Asking people nicely not to do stuff that degrades your product will not make the product suddenly anti-fragile.
It's this kind of attitude that's why we can't have nice things, though.
A service is offered for free, with no documented limits or restrictions, so you push the service to its breaking point... Just to see what happens?
Well, in the case of the Tor network its whole premise is that it's resilient to attack. So either it is or it isn't. If it's resilient but only as long as people treat it nicely, then it's not actually resilient. And anyone who can demonstrate that is doing a public service. It would be irresponsible to discover a flaw and not disclose it, or to continuously exploit it. But it's not irresponsible to look for the flaw in the first place.
In the case of GitHub, it's owned by a nearly trillion dollar corporation. Nobody is hurting some mom and pop business here.
> why not try locally with git.
Because you can't. GitHub is not open source, you'd need to steal the source code to try it locally. This comment is for educational purposes only, not trying to give OP ideas!!1
But you're right in spirit of course. Would be more interesting to install Forgejo/Gitea, GitLab, GitWeb, gitolite, TortoiseGit, etc., test them on various limits, and write that up in a nice blog post for magic internet points.
> "GitHub (and git)"
The "(and git)" portion can of course be tested locally. What OP will find out is that there is no more inherent limit on the number of commits in a repo than there is an inherent limit in the number of nodes in a linked list.
You can go on forever till you run out of disk space. Possibly repacking will eventually require more than available memory.
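For instance, a throwaway experiment along these lines (all numbers and paths are made up; the plumbing commands commit-tree and update-ref skip most of the per-commit overhead of git commit) exercises the git half of the question without touching GitHub at all:

    # Generate a large linear history locally and watch where git itself gets wonky.
    git init commit-stress && cd commit-stress
    echo seed > file && git add file && git commit -q -m "initial"

    tree=$(git write-tree)                      # reuse one tree object for every commit
    parent=$(git rev-parse HEAD)
    for i in $(seq 1 100000); do                # scale up as far as your patience allows
      parent=$(git commit-tree "$tree" -p "$parent" -m "commit $i")
    done
    git update-ref HEAD "$parent"               # point the current branch at the new tip

    git rev-list --count HEAD                   # confirm the commit count
    time git log --oneline >/dev/null           # cost of walking the history
    time git gc                                 # repacking is where memory/CPU pressure shows up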
Testing git, which was a stated goal, could have been done locally.
It's obvious that the author is lying about that part; he only wanted to push GitHub to its limit. But he did say git:
> I decided to see how many commits GitHub (and git) could take before acting kind of wonky. At ~19 million commits (and counting) to master: it’s wonky.
git runs outside of GitHub, which is what the comment you responded to was saying.
Test the behavior of git locally, without testing GitHub.
I understood the comment, but that's not what OP was testing. They were doing the commits via merging pull requests. Git has no concept of a pull request and no HTTP API. From the post:
> The GitHub API has periodic issues merging/creating PRs. (I use PRs since that is more reliable than keeping a local master up to date via pulling at this point).
> Git has no concept of a pull request.
You are confidently wrong. Git, including pull requests, was developed years before GitHub ever existed. GitHub borrowed the term from git. Pull requests originally (before GitHub) were requests, sent via email, that one developer pull changes from another.
https://www.git-scm.com/docs/git-request-pull
The request pull command has been part of git since 2005:
https://github.com/git/git/blob/master/git-request-pull.sh
GitHub launched in 2008.
> and no HTTP API
Also wrong:
https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP
There is nothing GitHub does with respect to git that you cannot do locally.
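For reference, a sketch of that older email-based pull-request workflow; the branch name and fork URL here are made up:

    # Publish a topic branch somewhere the maintainer can reach...
    git checkout -b fix-typo
    git commit -am "Fix typo in README"
    git push https://example.com/my-fork.git fix-typo

    # ...then generate the "please pull" summary to send by email,
    # relative to the upstream base the maintainer already has:
    git request-pull origin/master https://example.com/my-fork.git fix-typo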
I'm not saying that you need GitHub for things like including parts of other repositories, but rather that the way GitHub implemented it is not code included in the git that you apt install.
I didn't know of the specific "request-pull" subcommand so thanks for that link. Still, both things you link are a bit different from how GitHub implements it, and I'd be very surprised if the HTTP API you link includes an endpoint for triggering the request-pull the way that GitHub has such APIs for their pull request mechanism.
If you meant to say that git can do anything GitHub can and we needn't use GitHub, I agree. I've used git in peer-to-peer fashion before, and especially now that it's Microsoft's, I think twice before opening repositories there. But if your main point was rather that git includes the same functionality as GitHub and that OP could have just tested the regular git instead of doing it on GitHub itself, I still think that's a rather different test target.
Just to make sure we're not talking past each other: OP wanted to test both "GitHub (and git)". OP could have tested the git portion locally.
But to engage you about the GitHub part: I believe that under the covers, GitHub is still using something substantially similar to git as the repo storage format. Git has no inherent limitations on number of commits. Eventually you run out of disk space, and possibly memory and/or CPU during repacking. You could turn off GC and let the repo remain unpacked. You might eventually run out of inodes. During cloning (and pulling), git implicitly creates pack files, so a clone/pull will also take a long time (CPU and/or memory again) on an unpacked repo. This is why git periodically repacks.
If I had to guess, GitHub also has no inherent limits. Creating commits was probably triggering periodic repacking on the git backend, consuming increasing amounts of resources.
I would be surprised if the GitHub API (the Ruby on Rails code) takes much resources at all.
Creating endless PRs is something you can simulate locally with two copies of a repo. You can use "git ls-remote" against a GitHub-hosted repo with PRs in it to see how it exposes PRs as references that are not normally cloned.
Regardless, I think that OP could and should have satisfied their curiosity about how git works locally, especially with respect to whether it has any inherent limits. And they could have answered their question about GitHub resource limits with a support request.
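To make the earlier ls-remote point concrete, something along these lines shows the hidden pull-request refs (the repo URL and PR number are only examples):

    # GitHub advertises pull requests as refs under refs/pull/, which normal clones skip:
    git ls-remote https://github.com/git/git.git 'refs/pull/*' | head

    # A single PR's head commit can still be fetched explicitly:
    git fetch https://github.com/git/git.git refs/pull/123/head
    git log -1 FETCH_HEAD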
You can download GitHub Enterprise Server for free.
> Its obvious at some point you'll harm the service.
That’s not obvious at all. One would expect a professional service to have limits in place to prevent any negative impacts.
Sidestepping all of the ethical questions of embarking on this "research", I'm surprised the number was that low.
Linux[0] itself has about 1.2 million commits, so apparently Linux is within an order of magnitude of bringing GitHub to its knees?
Microsoft’s azure docs repo has 1.1M commits, and it’s many gigabytes big. I made the mistake of trying to clone it to fix an issue in the docs I ran into. Ended up just editing it on GitHub because fuck that.
You can clone a few latest commits:
git clone --depth [depth] [remote-url]
I don't think that works:
> git clone --depth 1 https://github.com/MicrosoftDocs/azure-docs
> Cloning into 'azure-docs'...
> remote: Enumerating objects: 107158, done.
> remote: Counting objects: 100% (107158/107158), done.
> remote: Compressing objects: 100% (101843/101843), done.
> Receiving objects: 17% (18217/107158), 780.25 MiB | 43.72 MiB/s
I think it’s a rate issue, not the number of commits.
IIRC, some years ago the Homebrew repo caused too much load due to their architecture, where every client would pull on install or update. Or something like that.
Part of the GitHub response, AFAIK, included the info that they went as far as they could with dedicated and beefier servers but asked for a software fix.
I would think that if GitHub anticipates a normal repo growing this large they can give it the special treatment
There's a rough rule of thumb that you should expect to redesign your system to handle each order of magnitude increase in scale, and I figure it applies here too—gracefully handling that size of repo would require substantial engineering work, and they have plenty of time to handle it before human-oriented open source repos get even close to the current limit.
I'm not sure redesigns were necessary going from 1 to 10, from 10 to 100, from 100 to 1000, from 1000 to 10'000, from 10'000 to 100'000, or from 100'000 to 1'000'000, which we're now at. It sounds like a sensible engineering rule, but I'm not sure it translates to software, or at least not in this case. I don't know of any design changes made to Git since it was first created; there are no v1 and v2 repositories, for example.
It depends on how quickly you pass through each order of magnitude milestone. I remember reading about how MySpace grew something like five orders of magnitude in less than a year, and no matter how scalable your architecture is you're going to hit a point during that where you need to rearchitect your whole system.
Slower growth allows for forward planning and incremental architectural changes.
> there's no v1 and v2 repositories for example
We wouldn’t know. GitHub is probably running something very different to normal local git including optimizations for performance and cost.
They must only ensure API/protocol compatibility and could have already replaced everything else many times over.
> There's a rough rule of thumb that you should expect to redesign your system to handle each order of magnitude increase in scale
I know the rule rather as: with good engineering, you can modify a system to handle a one-order-of-magnitude increase over what it was designed for. As soon as a two-order-of-magnitude increase can occur, you'd better redesign the system.
> I’ve also asked if they can re-enable it so I can give one more commit to say the final results on the readme then (public) archive it.
Entitled much? The author should be happy GitHub didn't just ban them for violating the ToS and intentionally trying to break things.
They asked. They didn’t demand, and they seem prepared to accept whatever GitHub decides. If I were fielding that request, I’d certainly grant it—on the condition that any deviation from the stated intent would indeed result in a ban—purely on the basis that it’s a ~free QA contribution and postmortem.
Keeping the repository, even as a public archive, would still require a lot of resources on GitHub's side. The only fair thing to do here would be to apologize and ask for the repo to be deleted.
And could be seen as a reward or an encouragement for other people to abuse the service.
They already do incentivize white hat exploit efforts[1]. The author seems to have run afoul of one of their rules[2] by impacting other users, but I don’t think that impact could be knowable without trying.
GitHub could trivially honor the request without changing the incentives or even taking any defensive implementation action, by specifically citing this experiment in the rules and maybe adding some more specific wording to the TOS.
The mindset around programming and exploration in general is in a sad state. I don’t understand why we have so much hate here for things like this. Better that someone like this find it than someone who notices it and spins up 1000s of repos to do the exact same thing.
I think the sentiment here shows the current state that software engineering has devolved into. It’s a 9-5 where you put in minimal work and get mad when someone breaks your system because you might have to do an hour of work to fix it on your weekend.
"Devolved" implies a negative connotation, but this is a positive evolution.
Here is another GH abuser I found recently: https://github.com/eemailme
This account basically subscribes to thousands of repositories and monitors all their activity. I suspect this account is harvesting user activity. I am not sure why GitHub allows this type of data harvesting.
When I was a junior admin in college, there was always at least one kid a trimester who 'experimented' with a fork-bomb on one of the shared Unix servers, and was shocked to learn that there are things you can do that you really shouldn't do. Same thing.
I'm surprised that a lot of users here are telling OP that he was wrong. OP was well within his rights to do this, as his intention was to stop as soon as any impact was observed, not to continue with it. It is within a user's rights to test a system they want to use to make sure their requirements are met.
To be honest, this is also why companies should not discourage this. Imagine if a malicious group did it with multiple users at the same time. At least now they will have proactive alarms for it.
Did the dude bring GH down at some point?
So basically, this guy is trying a DoS of GitHub. To hell with him.