We need a new generation of source control

rookout.com

75 points by ariehkovler 7 years ago · 89 comments

hliyan 7 years ago

Are we mistaking a dependency control problem for a revision control problem?

In a previous life, before microservices, CI/CD etc. existed, we did just fine with 20-30 CVS repositories, each representing a separate component (a running process) in a very large distributed system.

The only difference was that we did not have to marshal a large number of 3rd party dependencies that were constantly undergoing version changes. We basically relied on C++, the standard template library and a tightly version controlled set of internal libraries with a single stable version shared across the entire org. The whole system would have been between 750,000 - 1,000,000 lines of code (libraries included).

I'm not saying that that's the right approach. But it's mind-boggling to me that we can't solve this problem easily anymore.

  • organsnyder 7 years ago

    My preferred approach for a microservice architecture:

    - Contract-first API development

    - All API contract definition files (OpenAPI/Swagger, .proto, .wsdl...) in a single repo, which has a CICD pipeline to bundle them into artifacts for various platforms (Maven, Nuget, NPM, gem...)

    - Consumers and producers import the "api-contracts" dependency; this is the only coupling between components

    - Consumers and producers both generate necessary code (server stubs, client libraries) at build time

    IMHO, if your service clients have dependencies on implementations of APIs rather than just the definitions, you're not realizing the key benefit of microservices (or SOA).
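
    For illustration, a minimal contract in OpenAPI form might look roughly like this (the service and path names are invented); both server stubs and client libraries would then be generated from it at build time, e.g. with something like openapi-generator:

        openapi: 3.0.0
        info:
          title: orders-api          # hypothetical service
          version: 1.2.0
        paths:
          /orders/{id}:
            get:
              operationId: getOrder
              parameters:
                - name: id
                  in: path
                  required: true
                  schema:
                    type: string
              responses:
                '200':
                  description: The requested order

    The published contract artifact is then the only thing consumers and producers share.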

    • bluGill 7 years ago

      I agree with your last point in theory, but in practice consumers start to rely on bugs and implementation details, and eventually it is easier to change the contract than to fix the clients.

      • joshuamorton 7 years ago

        Hyrum's law and all.

        But I don't think that's what he was saying, so much as that your clients shouldn't depend on the server code, only on the API definition. Which is true and possible in general.

        • organsnyder 7 years ago

          That is what I was saying; but [s]he's right that implementation details always seep in by way of assumptions that clients make. It's extremely expensive to write an API definition that encompasses every possible edge case—probably only feasible in certain life/money-critical applications.

      • organsnyder 7 years ago

        Yeah, that is always a danger. From a purist standpoint, I'd argue that any behavior not defined explicitly in the API contract is subject to change at any time, and clients relying on it are by definition buggy. But I recognize that that often doesn't matter when the client code is owned by a team under a director with more clout than yours, a valuable customer, etc.

        One possible solution would be to bump the major version (assuming semver) of the API contract, and support multiple versions of the API simultaneously. Of course, that has its own challenges and costs.

  • TheGRS 7 years ago

    I dunno, the only 3rd party over-dependencies I see are in frontend code, and that code is usually in one big repo (in my personal experience). I think the proliferation of NPM dependencies is its own problem, but usually when I'm thinking of mono-repo vs multi-repo it's because teams/repos are having trouble coordinating with each other, not because some NPM library hasn't been updated lately.

  • zaphar 7 years ago

    Most of the mono-repo advocates aren't talking about dependency management. They are talking about the advantages around continuous integration that a good mono-repo tool can bring.

    The article actually complains less about mono-repos and more about mono-repos on Git and the associated tooling around Git.

    • geezerjay 7 years ago

      > Most of the mono-repo advocates aren't talking about dependency management.

      The article, however, presents dependency management as the main complaint, to the point that it's mentioned immediately after monorepos are first brought up.

      • zaphar 7 years ago

        Yeah, I found the article to be more about the deficiencies of Git for things version control isn't meant to solve and less about the problems with mono-repos themselves.

  • brightball 7 years ago

    You’re not wrong. Part of it is people's willingness to reach for a dependency when writing a few lines of code themselves would avoid it.

    It would be nice if there was a tool that could help you identify just how much of each dependency you actually depend on so you could trim it.

    • joshuamorton 7 years ago

      These things all exist if you use something like bazel/pants/buck to manage your dependencies. When you can construct a DAG of the entire dependency structure you can see exactly how much you depend on any given thing (and get fun dot-graphs of it!). But that requires being precise with dependency declaration in a way that a lot of people don't want to be.
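
      For example (the target and package names here are hypothetical), once dependencies are declared explicitly, Bazel can answer those questions directly:

          # everything //services/payments transitively depends on
          bazel query 'deps(//services/payments)'

          # the same dependency DAG as a Graphviz dot graph
          bazel query 'deps(//services/payments)' --output graph > deps.dot

          # the reverse question: what is affected if //common/money changes?
          bazel query 'rdeps(//..., //common/money)'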

      • geezerjay 7 years ago

        > But that requires being precise with dependency declaration in a way that a lot of people don't want to be.

        Some programming language stacks already fix that problem in a transparent way. Take Microsoft's .NET Core + NuGet stack: developers can add packages to a project without specifying a version number (implicitly it's the latest release), and versions are resolved when dependencies are restored.

        IIRC Rust's cargo follows a similar approach, and so do npm and yarn. So that's pretty much standard at this point.
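
        For instance, with the .NET CLI the version can simply be omitted (a rough sketch; the package name is just an example):

            # adds the latest stable version and records it in the project file
            dotnet add package Newtonsoft.Json

            # versions of all declared dependencies are resolved on restore
            dotnet restore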

    • hliyan 7 years ago

      Is there something that is akin to development-time tree-shaking (as opposed to build time)? i.e. you pull a copy of the specific library functions directly into your source?

      • tatersolid 7 years ago

        This is called “vendoring” your dependencies (taking a snapshot into your SCM), and has been common practice for about 30 years, long before NPM and other language-specific package managers.

        Tools for managing vendor branches or sub-trees abound, but good old svn:externals and scripts work for most use cases.
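
        A rough sketch of that with svn:externals, from memory (the URL and paths are made up):

            # pin a vendored library to a released tag
            svn propset svn:externals 'https://svn.example.com/libfoo/tags/1.2.0 vendor/libfoo' .
            svn commit -m "Vendor libfoo 1.2.0"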

    • maerF0x0 7 years ago

      One of the Go proverbs is "a little copying is better than a little dependency". [1]

      Also, Go vendoring tools usually trim the vendored repos down to the packages you import.

      [1]: https://go-proverbs.github.io/
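
      With Go modules, for instance, the vendored tree is pruned to just what you actually import (a minimal sketch):

          go mod vendor              # snapshot only the imported packages into ./vendor
          go build -mod=vendor ./...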

  • sebazzz 7 years ago

    Monorepos are used because of internal dependencies, but there are already very good solutions for that. As an org we have a lot of projects (50+), but we also split out common functionality (where it makes sense) into components which are shared between projects. How do we share those? In our case (.NET) we have an internal NuGet source which contains the components in question. Each project can upgrade to the latest version of a component on its own schedule, just like 3rd party dependencies are updated when necessary.

    It does not have to be complicated.
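
    A setup like that needs little more than an extra package source, e.g. in nuget.config (the internal feed URL here is invented):

        <configuration>
          <packageSources>
            <add key="nuget.org" value="https://api.nuget.org/v3/index.json" />
            <add key="internal" value="https://nuget.example.com/v3/index.json" />
          </packageSources>
        </configuration>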

    • WorldMaker 7 years ago

      The article also points to the issue of multiple repository management, and even includes links to three possible options to solve it.

      The questions around "which repositories do I need?" and "how do I update all of them?" and "how do I make an atomic transaction [commit, branch, PR] across all of them?" are interesting questions in a multi-repo situation, but there are plenty of possible answers as well.

      Some of them are just social in nature (read the README, watch/follow the whole GitHub organization, etc), so they aren't as interesting technically as monorepo or "meta-repo" tools.

jrockway 7 years ago

The source control system is not the piece of the equation that matters to most people. The build system is the important part. That's what prevents you from rebuilding the repository when you only change one Kubernetes config file, or what causes 100 docker images to be built because you changed a file in libc.

I think the tooling around this is fairly limited right now. I feel that most people are hoping docker caches stuff intelligently, which it doesn't. People should probably be using Bazel, but language support is hit-or-miss and it's very complicated. (It's aggravated by the fact that every language now considers itself responsible for building its own code. go "just works", which is great, but it's hard to translate that local caching to something that can be spread among multiple build workers. Bazel attempts to make all that work, but it basically has to start from scratch, which is unfortunate. It also means that you can't just start using some crazy new language unless you want to now support it in the build system. We all hate Makefiles, but the whole "foo.c becomes foo.o" model was much more straightforward than what languages do today.)

arianvanp 7 years ago

The argument of why monorepos suck seems to largely rely on "CI Sucks" in this article. But I beg to differ. Monorepos only work in combination with a build system that tracks dependencies carefully.

I contribute a lot to Nixpkgs, which is a monorepo with almost 50000 subcomponents [1], but because the build tool and CI track changes through hashes, changing a package only triggers rebuilds of other packages that depend on it and builds are super quick. It accomplishes this by heavily caching previous builds and sharing those between all builders.

No, monorepos are not going to work with a CI and build tool that always builds everything from scratch and does no caching. Instead, you should pick the right tool for the job, and go with a build system like Nix, Buck, Bazel or Please which were designed with monorepos in mind.
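
To sketch the idea (names and paths invented): every derivation's output path is a hash of all of its inputs, so anything whose inputs are unchanged is substituted from the cache instead of being rebuilt.

    { pkgs ? import <nixpkgs> {} }:
    pkgs.stdenv.mkDerivation {
      name = "my-component";
      src = ./my-component;           # hypothetical source directory
      buildInputs = [ pkgs.zlib ];    # changing this (or src) changes the output hash
      installPhase = "mkdir -p $out && cp -r . $out";
    }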

I think the second point the author makes, but only very briefly, is way more important to look at. Is git itself up to the job for such large repositories? One problem I've started running into in nixpkgs is that `git blame` takes considerable time to even execute, due to the enormous volume of commits in the repository. I would love to see a version control system that is optimised for storing lots of loosely connected components, and has better support for partial checkouts. I haven't found it yet, and I would love to hear what others are using for this.

I hear Facebook has some modification of Mercurial, and Google probably created something in-house themselves. But is there anything open-source that supports these workflows at scale?

[1] https://repology.org/repository/nix_unstable

  • ljm 7 years ago

    I agree with the first part of this. If by CI you mean something like Circle or Google Cloud Build or Travis, then your CI is pretty much limited to whatever you can fit in a YAML file, and what the CI service will support in that.

    YAML in and of itself is not the easiest thing to parse when you have multiple layers of nesting and a lot of lines.

    I don't really want to see what a CircleCI config would look like for Nixpkgs.

    Once you get to the point of scaling your CI you're looking at tailored infrastructure to make sure you're only building what needs to be built.

  • qznc 7 years ago

    Subversion.

    Also proprietary systems like PlasticSCM.

mikece 7 years ago

I like the idea of creating a source control protocol that can be implemented with any number of tools rather than having wars over particular implementations of source control products.

(And would Git really have beaten Mercurial if GitHub had been HgHub instead? GitHub's success was more about process than the technology of Git, IMO.)

  • weberc2 7 years ago

    > And would Git really have beaten Mercurial if GitHub had been HgHub instead? GitHub's success was more about process than the technology of Git, IMO.

    Hg is a much better user experience than git, that's for sure. Git won because of Github, which may have beaten any HgHub simply because Git has an actual API while Mercurial's "API" is "use subprocessing". In other words, if Mercurial gave a damn about the developer experience earlier on, it might well have won the war.

    • masklinn 7 years ago

      > which may have beaten any HgHub simply because Git has an actual API

      Git doesn't though. A bunch of shell scripts calling shell scripts calling a few native binaries is pretty much "use subprocessing". libgit came much later, it wasn't part of the original git.

      However, what git did provide was an open, stable, fairly simple and officially supported physical model with which you could easily interact directly, plus protocols which either worked on that directly (file and "dumb http") or used a relatively simple exchange protocol (the "pack protocol" https://github.com/git/git/blob/9b011b2fe5379f76c53f20b964d7...).

      Hell, if anything hg's always provided more API than git; the extension model wouldn't be possible without it. E.g. stdout coloration could be an hg plugin, while it had to be implemented in each git command.

      • weberc2 7 years ago

        It looks like you're right about the history. According to the git repo, libgit's first commit was in October of 2008 while Github was incorporated in early 2008 (according to Wikipedia).

        Github's popularity was probably due to Git's popularity in the Ruby community which may have been due to the official support of the physical model and simple protocols.

        That said, an "officially supported physical model" is an API even if I originally had libgit in mind. Also, none of this invalidates the broader point, which is that Git won because of Github, not because of user experience.

    • bitwize 7 years ago

      Git won because Linus used it and the Ruby community picked it up. It had serious cachet before GitHub became a thing.

  • fartcannon 7 years ago

    Feels like you have that in reverse. I'd say that Github is popular because of Git's popularity, not the other way around. And git's popular because it was birthed by Linus.

  • WorldMaker 7 years ago

    > And would Git really have beaten Mercurial if GitHub had been HgHub instead?

    Bitbucket for Mercurial launched about the exact same time as GitHub.

zdragnar 7 years ago

The title felt a bit misleading: this is more a gripe about Git and the approaches to mono- and multi-repos with Git.

I'm not following the call for something new, though:

> A source control that treats CI, CD, and releases as first-class citizens, rather than relying on the very useful add-ons provided by GitHub and its community.

I'm not a die-hard every-tool-should-do-exactly-one-thing-the-way-the-unix-gods-intended type of person, but in this case, I really feel that source control should stick to being source control. Hooks and add-ons are great precisely because things like CI and CD came after, and who knows what the new rage will be 5 or 10 years from now.

Building everything for today's workflow into a single tool means that by the time it's ready, today's workflow won't be cool anymore, and we'll have other newer tools and processes that this new source control can't support :/

  • sanderjd 7 years ago

    You can already adapt mercurial to do different stuff very much like the author suggests. As a user (but not developer) of such adaptations, I think it works really well.

rabi_penguin 7 years ago

I don't know if I really follow the conclusion from this blog post although I sympathize with the complaints. Let's take one point: " In fact, not only will Git CI tools rebuild and redeploy your entire repo, they are often built explicitly for multi-repo projects." This seems patently wrong. On buildkite, which we use, you can explicitly set up build steps to trigger based on directory patterns.
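
A rough sketch of that kind of path-based triggering with a dynamic pipeline (the directory names and pipeline files are invented, and the diff range is simplified):

    #!/bin/bash
    # .buildkite/pipeline.sh: upload only the steps for components that changed
    set -euo pipefail
    changed=$(git diff --name-only "${BUILDKITE_COMMIT}~1" "${BUILDKITE_COMMIT}")

    if echo "$changed" | grep -q '^services/api/'; then
      buildkite-agent pipeline upload .buildkite/api.yml
    fi
    if echo "$changed" | grep -q '^web/'; then
      buildkite-agent pipeline upload .buildkite/web.yml
    fi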

In my experience on teams at growing companies, I've seen pain points around continuous integration, configuration management, integration testing, dev/prod parity, feature flagging and releasing, and provisioning staging servers in terms of pure tech/infra issues. Beyond that, I've seen more pressing general organizational issues around tech debt, software design collaboration, architectural debt, and code review processes -- these are all pressing and valid concerns. But I just find the conclusions of this blog post flat out wrong. Conflating an unsatisfactory CI choice and configuration (which is totally reasonable) with a failure of version control is a pretty serious error. It doesn't fully disprove the thesis, but it certainly doesn't lend it support.

If you've installed a wheel onto a poorly set up suspension and get handling issues, does it mean you should reinvent that wheel, or does it mean you should check if your suspension may need some tuning?

apostacy 7 years ago

This is such a pain point for me.

I would LOVE something between subtrees and submodules.

I have explored this many times, and if I had the ability to write something like this, I would.

I would love it if I could have a child repo that did not require an external remote and could be bundled and stored within a parent repo, unlike a submodule. But I would also like it if it could be more decoupled from the history unlike a subtree.

I can get most of what I want from submodules and subtrees, but not really enough.

It might be possible without even having to change git. Perhaps if there were a way to have branch namespaces of some kind, and I could have a subtree have completely separate history, but have it checked out within the same working tree. Many of my projects that are submodules only make sense within their parent repo, and it is really redundant to have an external repo for them. But I also don't like to have to do expensive surgery to deal with subtrees, and it would be nice to not have it be completely merged.

My dream is to be able to drop a repo inside another repo and have git just treat it as if it were part of the parent repo. And then to be able to bundle the child repo to the parent and push it.

I know that it is mostly possible to do this already, but it is not easy or intuitive.

  • Rotareti 7 years ago

    > My dream is to be able to drop a repo inside another repo and have git just treat it as if it were part of the parent repo. And then to be able to bundle the child repo to the parent and push it.

    I'm not sure if I understand you right, but I think I made what you describe: https://github.com/feluxe/gitsub

    It's a simple wrapper around git that allows nested git repositories with almost no overhead.

    I use it for a private library (the parent-repo), which itself contains modules (the child-repos) that I open sourced on github. It works fine for my use case. I wrote it, because I found "submodule" and "subtree" too complicated. 'gitsub' is still in alpha.

    • apostacy 7 years ago

      Thank you for sharing that! That's really cool!

      I'm just very attracted to the idea of bundling repos together. I frequently use git-annex and datalad, and try to keep binaries and helper scripts in different repositories.

  • jsmith45 7 years ago

    Well, git can most certainly have multiple separate history trees within one repository that are all technically unrelated.

    One could even use the existing submodule feature to reference unrelated history in the same repository, but the submodule tooling would want to create a separate .git folder, duplicate everything, and use a URL to identify the repo, rather than knowing to use the parent repo.

    It ought to be possible to modify git submodule to be able to specify that the submodule is actually just all the refs in namespace "blah" of the physically containing repo. Basically you would only need an expanded version of the .git "symlink file" feature that lets you specify both "gitdir" and a ref namespace to use for all operations. Then poof, you would have self-contained submodules.

    You would still have the problem that namespaced refs do not get cloned by default.

    You also have some risk of pushing refs of the parent whose submodule entries reference objects that only exist in unpushed refs in a ref namespace, meaning that if somebody cloned the repo and tried to expand the self-contained submodules, they would discover that the commits are not present. I'd not be surprised if regular submodules also had that limitation. (I've never really used them in a manner where I might modify and commit in the submodule.)
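
    For reference, the existing .git "symlink file" is just a one-line pointer; the proposal above would amount to adding a second line to it, something like this (the second line is hypothetical and not valid git today):

        gitdir: ../.git
        namespace: refs/namespaces/child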

    • namibj 7 years ago

      Last I used submodules, I recall hitting that issue. You need to push the submodule first or you can't check out the parent repo's master on another machine.

  • brechtm 7 years ago

    This sounds pretty much like what I described in http://www.mos6581.org/git_subtree_alternative. This solution splits your changes between different branches at commit time, instead of afterwards like git-subtree does.

    While I do present a proof-of-concept implementation using hooks, a proper implementation would require some changes to the git client, I imagine.

lucozade 7 years ago

I know it's wrong of me. Genuinely, I know. But when your second paragraph states that Google and Netflix pioneered horizontally scalable processes...

It makes it so hard to read the remainder untainted by a certain amount of scepticism.

Fortunately, he's not actually saying anything much in the article so I don't think my irrational reaction to ignorance will mean I've missed something important. But still...

hashkb 7 years ago

I'm not sure the article supports its thesis with anything concrete. It's the author's opinion that submodules and some scripts are inadequate, but in my experience, a variety of reliable and flexible developer experiences are possible.

Many devs barely scratch the surface of what git can do anyway. Onboarding them on a few extra scripts seems better than an entirely new scm tool.

booleandilemma 7 years ago

I’m just now getting comfortable with Git, please don’t do this to me.

  • intertextuality 7 years ago

    Your opinions should not be swayed by one article. Have some resolve in what you do.

    Git is great. It also has issues. Scaling has issues. Changing a tool won't solve scaling issues.

    Anecdotally I think submodules work just fine, although the git submodule tool is not intuitive. Then again, I work in a very small team on small projects compared to these mammoths being discussed with monorepos and the like.

    • majewsky 7 years ago

      > Your opinions should not be swayed by one article.

      Opinions should ideally not be swayed by the volume of text making arguments, but by the coherence and logic of those arguments.

      • intertextuality 7 years ago

        Sure. But in order to see all sides of the argument and make an informed decision, one should read at least a few different sources, no?

zamalek 7 years ago

> Take no prisoners: mono-repos suck too

We are transitioning to multi-repos because we have been burned so hard by our mono-repo. Builds used to take 3 hours on the monster and we managed to get them down to 30 minutes, but we are truly at the bottom of the barrel. God help us if there is a build failure, every subsequent build fails while we scramble to identify the problem (and we can only sample success or failure every 30 minutes). It's a house of cards and it's horrible.

> Shots fired: multi-repos suck

We've already had debugging woes with this combined with internal package feeds (you have to pull down the code, build it, remove the package and replace it with the local code), which has made us very bearish on code re-use. That rigmarole sucks way less than mono-repos.

> You can’t have your cake and Git it too

Combine version control and package managers IMO. Go does one half of this. If you work under GOPATH with all your code, you can easily jump across repos to make changes and have those changes immediately propagate to the initial repo. Your hard-disk becomes the mono-repo. What Go doesn't do is pull binaries down from package feeds. There needs to be some simple mechanism to switch between builds and code.
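
For what it's worth, the replace directive in Go modules gets close to that switch between "code next door" and released builds (the module paths here are invented):

    // go.mod
    module example.com/app

    require example.com/shared v1.4.0

    // point the dependency at a local checkout while hacking on both repos;
    // drop this line to go back to the released version
    replace example.com/shared => ../shared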

sanderjd 7 years ago

Isn't the argument here that Git is not good at mono-repos rather than that mono-repos suck? This seems true to me, but there are already other options that suck less if you want the advantages of a mono-repo.

I would also suggest that mono-repos work better with statically typed languages with module boundaries and visibility control. The problem of anything being able to touch anything else is not so bad when you can hide implementation details behind small APIs.

I have definitely felt some pain with having Ruby projects in a single repo using git, but much less so with Java projects using Hg.

dclowd9901 7 years ago

I don't really understand the problem with multi-repos. Maybe because I'm a FE developer, I'm shielded from some of the pain of launching an external service locally, but I find the process of needing to update an external module to be rather similar to open source dev: clone, link, fix, PR, approve, merge, update package.json. It's that simple. The only part that can present itself as somewhat tricky is how your environment handles linked dependencies, but that can be resolved if you've configured your webpack or whatever correctly.

joshjb17 7 years ago

The answer to your problem is Nix:

https://nixos.org/nix/

  • majewsky 7 years ago

    Fun fact: "Nix" is a colloquial variant of the German word for "nothing". Whenever we talk about NixOS at our hackerspace, hilariousness ensues.

netheril96 7 years ago

Google uses a monorepo for most of its code and I find it a much better experience than what I’ve had in the past. But that good experience is predicated on a lot of Google only internal tools. If Google open sources enough of those such that people outside can have the same experience, maybe the debate will end decisively in favor of monorepo.

majewsky 7 years ago

> Each side of this debate classifies the other as zealous extremists (as only developers can!), but both of them miss the crux of the matter.

I take it the author has never had an interest in politics.

1023bytes 7 years ago

I think Git submodules are to blame. The idea is great, but the implementation is cumbersome. If they were nicer to use, that would solve a lot of these problems.

  • jrockway 7 years ago

    I think it's irrelevant. If your build depends on inputs from two repositories, it's the same level of complexity as having the inputs all come from one repository. You might have two .git directories, but if the code is coupled, it's just a monorepo in two directories.

    Ultimately the problem is in scaling the number of build inputs, not the number of .git directories.

niftich 7 years ago

The field does suffer a bit from version control, dependency management, language compilers, and build and packaging tools all being single-purpose tools that are layered, where one can't introspect others beyond the public API, and manual effort or simplistic not-always-true assumptions have to be used to bridge information from one to the other.

It's tempting to imagine an integrated system where making changes to a piece of source code automatically commits every change, every commit will attempt to compile and build, every successful build auto-packages into a new artifact with a new build version. The language and the build system would ensure that all builds are reproducible. Because of this, all builds can be addressed by identity (content hash) too, not just a name and a build number within some namespace.

When any dependency of the current project has newer builds, one could choose to pull up an interactive diff experience to step through the code of newer versions. This would aid in selecting a different version on which to depend, if desired. If a different version of a dependency is picked up, a new build gets triggered too, and a successful build gets a new build version.

The strong linkage between source code revision and build version, the deterministic builds, and content-based artifact addressing work together to ease the traceability of changes and the reusability of artifacts, and sidesteps concerns about the hosting and namespacing of source code and build artifacts interfering with the project's "single source of truth", because any copy of an artifact, known by any name, irrelevant of its location, will share the same hash.

There will still be usability problems with such a system too. There would be no way to strip data out. A shelve, replay, and cherry-pick frontend would be necessary to allow the doctoring of input before it's committed permanently -- but in such a system, only permanently committed code can be built. The workflow to prepare a project for public consumption would be to author and test all the changes in a 'scratch' project that doesn't auto-disseminate its build artifacts elsewhere, and cherry-pick the changes into a public project. Public projects could only have public dependencies.

Configuration files, data files, and pieces making up a larger environment may need a different approach. Nonetheless, a lot of these problems take the same shape: some input should deterministically produce some output, and a running system may choose to alter its own state by interfacing with a stateful outside world (e.g. load or write files, communicate through a network). The sensible places of drawing a boundary between the inside world and outside world will differ for every use-case.

decebalus1 7 years ago

What a bullshit pointless article. And oh god the cringey git wordplay... I'm starting to feel this is basically blogspam.

  • kadendogthing 7 years ago

    It IS blog spam. The article doesn't say anything besides "everything sucks." There is not a single constructive point being made besides some fantasy world where our tools are drop dead gorgeous and the build pipelines are well oiled and never have to be paid attention to.

kadendogthing 7 years ago

As I've stated in another post on here, what's the point of these articles? It just says everything sucks, but doesn't really dive into why or how we could possibly fix any issues they may directly point out. Also it kind of sounds like the author really doesn't have any idea what GitLab is or does, so maybe he should check it out.

But allow me to retort these bald assertions presented in the article:

Monorepos are great.

Multirepos are great.

Git is the best source control system ever. And if you think it could do something better, well, have I got news for you: it's completely open source and extendable with various script entry points and an easily accessible API.

Thanks for reading my blog.

  • klodolph 7 years ago

    > Git is the best source control system ever.

    To be clear, I'm not disagreeing. But it is simply not good enough. Any new generation of source control needs to be able to do things that are difficult with Git, and Git simply isn't extensible enough. Microsoft has a Git VFS, and there's Git LFS, but this just doesn't go far enough.

    There are good technical reasons why you would use Perforce or even Subversion these days.

    The people who made Git made it for working on large, but not huge, open-source code repositories with a traditional model. It doesn't work so well for vendoring, it doesn't work well for artists, it doesn't have locking, it doesn't have access controls (and there's only so much you can add). You can argue that these features don't make sense or we're using Git "wrong" or I can write a bunch of hooks but at some point I just want them to work and I'm tired of fighting with Git to make it happen.

    Just personal background, these days I work with closed source and open source, monorepos and multirepos, Git, Subversion, and Perforce all on a regular basis (and sometimes use weird custom setups). Git is by far the most familiar of the three, and I've published some tools for Git repo surgery.

    • hashkb 7 years ago

      > There are good technical reasons why you would use Perforce or even Subversion these days.

      Can you say more? What are some of those reasons? Or link to some data or examples?

      • Tempest1981 7 years ago

        With a monorepo, how do you avoid the situation where almost every time you want to commit, you have to pull-and-rebase first? Because somebody has always pushed a change, every minute or two.

        • majewsky 7 years ago

          In a repo that large, you don't want to have random people pushing to master anyway. Have people commit to branches, and then automation merges the approved branches into master. ("Automation" may be as simple as the "Merge" button in Github's UI, or more complex if necessary.)

      • Tempest1981 7 years ago

        With git, can you set specific user permissions by directory? We need a way to prevent pull or commit by certain users.

        Or require a review before committing to some projects/dirs, but not all.

      • tempguy9999 7 years ago

        Simplicity. I understood SVN immediately but I'm still struggling with Git.

        It's only one thing and perhaps the only one, but it's a huge one. IMO anyway.

      • emcq 7 years ago

        Partial checkouts are the main thing that comes to mind that is better with svn than with git.

        • falsedan 7 years ago

          git archive is ... ok, not pretty great, but it works
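
          Roughly (the path is made up):

              # export one subtree at a given commit, no working copy involved
              mkdir -p /tmp/api-snapshot
              git archive HEAD services/api | tar -x -C /tmp/api-snapshot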

          • klodolph 7 years ago

            That's not really a working copy, though. What some need is a tool that lets you check out a part of the repository as a working copy, without checking out the rest. By "part" we might mean more than one directory and its descendants (i.e. not a single root).

            • falsedan 7 years ago

              hmm, you could frankenstein together a bunch of trees to make it look like a partial checkout, but you couldn't make a new commit without all the parent tree objects up to the root. This sounds just like a subtree to be honest.

              Are you frequently checking out a subdir of a repo and committing changes to it? Is it config?

              • klodolph 7 years ago

                The particular case I'm thinking of involves a repository which is large enough that full checkouts are slow, so you do partial checkouts to make things faster.

                Maybe at some point there will be tools that let you do this with Git, maybe built on top of something like Git VFS. At the moment it just kind of sucks. Subversion and Perforce both handle it just fine.

    • falsedan 7 years ago

      > it doesn't have access controls (and there's only so much you can add)

      so there's a lot of drawbacks to using gitolite, but we were able to customise access controls down to allowing some users to change certain lines of checked-in config only to certain values
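
      Roughly like this in gitolite.conf, if memory serves (group names invented); the VREF/NAME rule rejects pushes that touch the listed path:

          repo monorepo
              RW+                      =   @admins
              -   VREF/NAME/secrets/   =   @developers
              RW                       =   @developers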

      • klodolph 7 years ago

        How do you prevent users from reading certain parts of the repository, though? This was what I meant by "there's only so much you can add"... you can reject pushes that change parts of the repo, but you can't prevent reads without breaking everything.

        • falsedan 7 years ago

          > can't prevent reads without breaking everything

          I don't understand, you can lie to git-upload-pack and send anything you want to the user?

          but when we used gitolite, we put sensitive stuff in a separate server and restricted reads to trusted users/deployment tools

          edit oh I see, you want to let some people clone the repo but with some stuff redacted and still be able to make changes to the non-redacted stuff. I'd use LFS and move the ACLs to the file server, if using a single repo was a hard requirement

          • klodolph 7 years ago

            > I'd use LFS and move the ACLs to the file server, if using a single repo was a hard requirement

            If you're putting a few large files in LFS, or maybe a couple sensitive files, I can understand and I'd say you're still using Git, but with some extensions.

            If you're putting an entire sensitive subtree in LFS, I don't think you're really using Git any more, in the sense that many of your standard Git workflows will have to be different.

    • kadendogthing 7 years ago

      You have completely missed the point of my post. The point was that my post had as many substantive points as the article in question, just with fewer words. Which should be obvious, but never let that get in the way of a good tech contrarian article stating that everything sucks, I suppose.

      • klodolph 7 years ago

        Ah, thank you for the clarification, I had interpreted your comment far too generously.

    • jessemillar 7 years ago

      Out of genuine curiosity, do you know of any resources you could point me to to explore specific situations where Git isn't fitting the bill technologically?

      • klodolph 7 years ago

        The big one is anything that requires locking. Git by nature doesn't support locking.

        The next one is repository size. Anything with extremely large history size or checkout size. Very hard to work with using Git. Microsoft has Git VFS. Facebook and Google have modified versions of Mercurial and Git.

        High repository velocity. If you are trying to push to a remote and you are always out of sync, it's going to slow you down.

        Checking out different commits for different parts of the tree. This one is a bit more rare, it's less common that you'd want this.

        Finally, setting ACLs to deny read access to parts of the tree.

        For all of these cases, there are some ways you can work around the problem. It's not like you're completely dead in the water with Git, or that these things are completely impossible to do in Git. It's just that Git isn't good at everything; it happens to be exceptionally good for most people who write code.

  • headcanon 7 years ago

    One thing about GitLab tooling is that it has features that apply only on a per-repo basis, for example GitLab CI.

    Suppose for example we have 2 distinct projects - a backend and a frontend - which each have their own testing and deployment strategy. GitLab CI only allows one CI pipeline config per repository. While we could take care of that with scripting, that can easily get out of hand as we increase the number of distinct "projects", if we wanted to maintain a monorepo. So the tooling encourages us to have separate repos.

    However, if we do that, we lose the convenient single commit hash that a monorepo gives us, so we don't have a good way to ensure that deployments between projects are synced up, and rollbacks are far more complicated.

    It's a contrived example (for instance we could switch to a different CI system and mitigate this issue), but it seems to me that whatever an organization chooses, mono- or poly-repo, they have to build complicated custom configurations and tooling to get over whatever tradeoffs their decision has. And as the number of logical projects (repos, submodules, etc.) and the commit rate increase, the tooling has to increase in complexity to handle issues of scale.

    So I guess the open question is, is there a way we can somehow have both without spending a bunch of engineering cycles writing custom configs and tools?

    • boleary-gl 7 years ago

      GitLab Product Manager for CI here

      Thank you for this feedback - it’s something we’re thinking about a lot too. We’ve made some improvements for monorepos (`changes:` keyword) and for micro-services/multi-repo (`trigger:` and `dependency:` keywords) but we’re not satisfied!

      We have two open Epics - one for making CI lovable for monorepos (https://gitlab.com/groups/gitlab-org/-/epics/812) and one for making CI lovable for microservices (https://gitlab.com/groups/gitlab-org/-/epics/813). Would love community feedback on the direction those will take us and how we can up level lovability even more.
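
      For anyone curious, the `changes:` keyword lets a monorepo run each project's jobs only when its directory changes, roughly like this in .gitlab-ci.yml (paths invented):

          backend-test:
            script: ./backend/run_tests.sh
            only:
              changes:
                - backend/**/*

          frontend-test:
            script: ./frontend/run_tests.sh
            only:
              changes:
                - frontend/**/*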

      • headcanon 7 years ago

        Thanks for listening! We're all big fans of GitLab overall. Good to know you guys are working on filling the feature gap there.

    • falsedan 7 years ago

      > single commit hash

      sounds like you have a deploy & release issue, not a developing or publishing one. Octopus Deploy was the first system I saw that made a distinction between them, and it eliminated a swathe of issues by simply saying "a release is a set of versioned packages"

      wow octopus deploy got expensive

  • tyrust 7 years ago

    >what's the point of these articles?

    Content marketing.
