Introducing Git protocol version 2

opensource.googleblog.com

547 points by robmaceachern 8 years ago · 166 comments

oneeyedpigeon 8 years ago

The current (and pretty much only, ever, despite Linus having been the creator) maintainer of git is a Google employee [1], in case anyone else was wondering.

[1] https://en.m.wikipedia.org/wiki/Junio_Hamano

  • chrisseldo 8 years ago

    >"Linus Torvalds said in 2012 that one of his own biggest successes was recognizing how good a developer Hamano was on Git, and trusting him to maintain it."

  • AdmiralAsshat 8 years ago

    Thanks, that really helps.

    As an open-source advocate, my first thought was, "Why the hell is Google releasing a version of a protocol that Linus Torvalds wrote?"

    Without that context, it would be like Google throwing up an announcement, "Introducing Google's Linux Kernel 5.0!"

    • skywhopper 8 years ago

      Yeah, that was my reaction, and it made me sad that Google has so eroded my trust over the decades that I was turned off at seeing an announcement implying they are deeply involved in core open source tools. I mean, who else but companies swimming in cash can truly deeply support this stuff, and for the most part, the people working on these tools really do care about the open source community. But Google's reputation is so tarnished that my gut reaction is at odds with my rational one, and that's a sad thing to realize.

      • vatueil 8 years ago

        I wasn't aware that the maintainer of Git works at Google, so I was a bit surprised by the announcement too. But it wasn't because of any drama like Google eroding my trust or whatnot, just that my information was incomplete so my gut reaction was irrational.

    • fiatjaf 8 years ago

      That may seem odd, but it could happen in an open-source world: multiple parties releasing different versions of the same piece of software and calling it by the same name.

      • loeg 8 years ago

        One fun example of this is the so-called RPM "5" fork[0], which is basically dead and almost entirely unused[1].

        The result is that the main RPM everyone uses will probably stay at version 4.x forever.

        [0]: http://rpm5.org/

        [1]: https://en.wikipedia.org/wiki/Rpm_(software)#RPM_v5

        • nkkollaw 8 years ago

          They could skip a version like PHP did. Among other reasons, since books and articles about PHP 6 had already been written long before PHP 5+1 came out, they went with 7.

          • loeg 8 years ago

            Sure, they could. I don't think they've felt the need to. RPM tends to change slowly and conservatively.

      • simplicio 8 years ago

        Both Git and Linux are trademarked, presumably to prevent such hijinx.

        • cesarb 8 years ago

          Linux is trademarked because of some "hijinx"...

          "Initially, nobody registered it, but on August 15, 1994, [...] filed for the trademark Linux, and then demanded royalties from Linux distributors. In 1996, Torvalds and some affected organizations sued him to have the trademark assigned to Torvalds, and, in 1997, the case was settled." https://en.wikipedia.org/wiki/Linux#Copyright,_trademark_and...

      • AdmiralAsshat 8 years ago

        Isn't it customary to at least rename the fork?

        • abjorn 8 years ago

          Customary, but unless the name is trademarked, not required.

          • fanf2 8 years ago

            Some licences require a change of name for substantial modifications, e.g. the Artistic Licence and Apache Licence v1. But those kinds of clauses are pretty rare nowadays.

    • Ceezy 8 years ago

      Same here. I was like, "If they are about to drop 3 versions at the same time like Angular, I need to use SVN ASAP".

  • Retroity 8 years ago

    Non-Mobile link for those on desktop: https://en.wikipedia.org/wiki/Junio_Hamano

    • Cynddl 8 years ago

      The mobile version of Wikipedia works perfectly fine in a browser. I personally prefer it for readability.

  • biohax2015 8 years ago

    Fun fact: if you google “git blame” it returns his Wikipedia entry.

  • ojosilva 8 years ago

    Alright, I was wondering why this was published on the Google Open Source website. I had no idea. Yet, the Git project itself has not been published under their umbrella.

    https://opensource.google.com/projects/list/featured

    • willnorris 8 years ago

      We currently only list projects that are or were primarily developed by Google. We decided to include projects that started at Google and were since donated to foundations, such as Kubernetes.

      But we aren't yet including projects where we are just heavy contributors, since they're not "Google projects". That includes Linux, git, LLVM, and a host of others. We do want to recognize them in our project directory, but want to make sure that they are distinguished from Google projects so that we're not implying something that isn't accurate.

simias 8 years ago

Let that be a reminder to all the coders out there: if you ever design a protocol or file format to communicate between machines always remember to add a version field or some other way to allow for updates and revisions later without breaking everything. Having a way to specify extensions in a backward-compatible way is nice too.
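
As a toy illustration (bash; nothing to do with git's actual wire format, and all names are made up), even a magic string plus one version byte up front lets an old reader refuse cleanly instead of misparsing:

    # writer: 4-byte magic, 1-byte version, then the payload
    printf 'PRTO\x02hello' > msg.bin

    # reader: check the version byte before touching the payload
    ver=$(dd if=msg.bin bs=1 skip=4 count=1 2>/dev/null | od -An -tu1 | tr -d ' ')
    [ "$ver" -le 2 ] || { echo "unknown protocol version $ver" >&2; exit 1; }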

  • cesarb 8 years ago

    > if you ever design a protocol or file format to communicate between machines always remember to add a version field or some other way to allow for updates and revisions later without breaking everything

    Also, somehow make sure no servers, clients, or third-party middleboxes break when the version field is incremented. The TLS protocol designers had to give up on the version field; it's now going to forever be stuck at "TLS 1.2", since too much would break otherwise.

    • icebraining 8 years ago

      It's a universal truth: if you want to keep something from jamming up, you need to exercise it. It's true for the human body, for machine parts, and for protocol features.

      • stingraycharles 8 years ago

        HTTP was at 1.1 for a very long time but it appears the upgrade to version 2 is going fine. What is the difference here? The protocol version exchange mechanism?

        • wmf 8 years ago

          The HTTP 1.1 to 2 upgrade is only going fine due to massive work (mostly by Google) over a period of years. HTTP/2 was also able to benefit from a lot of pain that SPDY and WebSockets went through earlier. Protocol ossification is still a hard problem.

        • hannob 8 years ago

          This is one of the major reasons HTTP 2 is only supported via TLS and only via a complex upgrade protocol.

          You can't just do "GET / HTTP/2.0" or something like that.

          The TLS part is interesting, as wrapping a protocol in an encrypted channel solves a lot of these issues (though it can break again if you have stupid man-in-the-middle boxes). It just doesn't solve the issue for TLS itself.

        • cesarb 8 years ago

          The main difference is that a "side channel" of the TLS connection (the NPN or ALPN extensions) is used to negotiate HTTP/2. The upgrade to version 2 without the TLS wrapper failed; so many servers and/or middleboxes had issues with it that all browser makers decided "HTTP/2 is going to be TLS only" (the current "encrypt all the things" push played a small part, but the main reason was the compatibility problems).
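
          You can watch that side channel from a shell; for example, openssl's s_client offers protocols via ALPN and reports which one the server picked (example.com stands in for any HTTP/2-capable host):

              # offer h2 and http/1.1 during the TLS handshake; the chosen
              # protocol shows up on an "ALPN protocol:" line
              openssl s_client -alpn h2,http/1.1 -connect example.com:443 </dev/null | grep ALPN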

        • geofft 8 years ago

          Clients can fall back, and falling back to HTTP/1.1 isn't a security problem the way falling back to, say, TLSv1.1 is.

        • sp332 8 years ago

          Because it's up to the client to request HTTP 2 if they support it. https://http2.github.io/http2-spec/#discover-http
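
          For cleartext HTTP that request is literally an Upgrade header; a quick way to see it (the server is free to ignore it and just answer over HTTP/1.1):

              # curl volunteers "Upgrade: h2c" on a plain http:// URL
              curl -sv --http2 http://example.com/ -o /dev/null 2>&1 | grep -i upgrade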

          • stingraycharles 8 years ago

            But why is this not supported by TLS? Is it set up in such a way that it could never be amended to have a fallback?

            • EvilTerran 8 years ago

              If the newest version of a secure communication protocol includes some way to negotiate down to an older version, that opens the door to downgrade attacks - you risk ending up with a protocol that, in practice, has all the vulnerabilities of both versions.

              • tialaramex 8 years ago

                You can work around this by having downgrade protection. TLS 1.3 has this out of the box, and it was also added belatedly to TLS 1.2 (though the problem there is that you can still downgrade whenever either the client or the server knows TLS 1.2 but doesn't have the protection yet).

                In TLS 1.3 the downgrade protection works like this:

                If I'm a TLS 1.3 server, and a connection arrives that says it can only handle TLS 1.2 or lower, I scribble the letters "DOWNGRD" (in ASCII) near the end of a field labelled Random that is normally entirely full of random bytes.

                If I'm a TLS 1.3 client, I ask the server for TLS 1.3 when I connect. If instead I get a TLS 1.2 or earlier reply, I check the Random field to see if it spells out "DOWNGRD" near the end. If it does, somebody is trying to downgrade my connection; I am being attacked and can't continue.

                This trick works because if bad guys tamper with the Random field then the connection mysteriously fails (client and server are relying on both knowing all these bytes to choose their encryption keys with ephemeral mode) while older clients won't see any meaning in the letters DOWNGRD near the end of these random bytes - so they won't freak out.

                You might worry: what if somebody just randomly picked "DOWNGRD" by accident for a TLS 1.3 connection? If every single person in the world makes one connection per second, this is likely to happen to one person, somewhere, only once every few years. So we don't worry about this.
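
                For the curious, the TLS 1.3 RFC spells the sentinel out: the last 8 bytes of ServerHello.random become "DOWNGRD" plus 0x01 when the server was willing to do TLS 1.2 (0x00 for TLS 1.1 or below). A bash one-liner to see the bytes:

                    # the eight sentinel bytes a TLS 1.3 server plants in Random
                    printf 'DOWNGRD\x01' | xxd
                    # -> 444f 574e 4752 4401  ("DOWNGRD" + 0x01)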

            • sp332 8 years ago

              Oh, that's a good question in the context of middleboxes. I don't know of any that force HTTP/1.1, but they might actually!

      • arvinds 8 years ago

        Amen!

    • nneonneo 8 years ago

      Per the TLS 1.3 RFC:

      > In previous versions of TLS, this field was used for version negotiation and represented the highest version number supported by the client. Experience has shown that many servers do not properly implement version negotiation, leading to "version intolerance" in which the server rejects an otherwise acceptable ClientHello with a version number higher than it supports. In TLS 1.3, the client indicates its version preferences in the "supported_versions" extension (Section 4.2.1) and the legacy_version field MUST be set to 0x0303, which is the version number for TLS 1.2. (See Appendix D for details about backward compatibility.)

      It's really too bad that the version field can't be used as a version field anymore, but thankfully the "extension" format is pretty flexible in that regard.

      • pilif 8 years ago

        >but thankfully the "extension" format is pretty flexible in that regard

        Just like the version field.

        I'm sure middlebox software is being updated as we speak to terminate connections with unknown versions in the "supported_versions" extension.

    • apple4ever 8 years ago

      In my opinion, they should break things that misuse the version field. Then maybe the makers will learn to develop properly.

      • tialaramex 8 years ago

        It's almost invariably end users who suffer, not the "makers". And because of a human cognitive bias it doesn't matter that the middleboxes are wrong, if you get a new Chrome and it doesn't work you blame Chrome, you don't blame the middlebox that had been getting this wrong for five years.

        Almost a year's work on TLS 1.3 was spent on working around problems with middleboxes. Because without that it would be impossible to deploy in practice. TLS 1.2 took years to deploy because so many middleboxes were incompatible and we had to wait for them to rust out.

    • CGamesPlay 8 years ago

      What would break? Are you saying a TLS 1.3 client would not be able to connect to a TLS 1.2 server because the version request would cause the server to reject the client?

      • cesarb 8 years ago

        Yes. Or worse: a completely unrelated box in the middle of the path could drop all the TLS 1.3 packets, so instead of a clean rejection, the connection gets stuck.

      • gsnedders 8 years ago

        Yes.

  • Too 8 years ago

    I'm mostly surprised they solved a critical server bug on the client side, and by introducing even more hacks into the protocol. I mean, who in their right mind would run a public git server with this super easy to exploit DoS bug:

        Unfortunately due to a bug introduced in 2006 we aren't
        able to place any extra arguments (separated by NULs) other
        than the host because otherwise the parsing of those
        arguments would enter an infinite loop.

    I'm not sure if entering an infinite loop means what I think it does in this context, but that's almost CVE worthy. They should release a fix, mark that version as obsolete, and never make clients cater to it any more.

    • bronson 8 years ago

      It's been fixed for almost a decade. You're asking for a retroactive CVE?

      You can read about their fix by clicking the next link in the article.

  • bluejekyll 8 years ago

    DNS has no version field. I'm torn as to the choice here. On the one hand, DNS is backwards compatible with everything.

    EDNS is the only way to extend the protocol now, which is basically just adding additional Records to the Message that are designated as Extended DNS records, and treated specially.
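
    You can see that mechanism with dig: EDNS shows up as an OPT pseudo-record in the reply rather than as any version bump (assuming your resolver supports it):

        # explicitly request EDNS version 0; the answer carries an
        # "OPT PSEUDOSECTION" advertising version and buffer size
        dig +edns=0 example.com SOA

        # the same query with EDNS disabled entirely
        dig +noedns example.com SOA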

    • jburgess777 8 years ago

      The IETF is working on a document which describes many reasons why DNS may stop working. EDNS-related issues are in section 3.2:

      https://tools.ietf.org/html/draft-ietf-dnsop-no-response-iss...

      • spc476 8 years ago

        My own code to decode DNS packets [1] fell afoul of section 3.1.3 of the draft document. I fixed the issue, but the reason I originally rejected DNS packets with unknown flags was the assumption that potential garbage might be used as an exploit.

        [1] https://github.com/spc476/SPCDNS

      • bluejekyll 8 years ago

        This is a great resource. Thank you for sharing.

        I don’t read that as DNS stopping working, but more as reasons why DNS is flaky in different scenarios.

        Some of the issues there are related to mitigations against reflection attacks, etc. I haven’t read the entire doc, but does it go into concerns around DDoS and other such things, and how DNS servers mitigate those attacks?

        Edit: right in the intro. So a server needs to “understand” when it is under “attack” and only then put in mitigations against the attack. In the worst case, the server doesn’t do this, fixes the issues in this RFC so as to always respond, and then amplifies the attack.

    • oripring 8 years ago

      The message header hasn't been fully exhausted yet. Beyond the spare bit[1] in the header there are unassigned OPCODE values which can be used to bend the format in new ways[2].

      [1] It was briefly used experimentally, if I recall.

      [2] https://tools.ietf.org/html/draft-ietf-dnsop-session-signal

avar 8 years ago

The specification of the v2 protocol is here: https://github.com/git/git/blob/master/Documentation/technic...

One of the more exciting things is that it can now be extended to arbitrary new over-the-wire commands. So e.g. "git grep" could be made to execute over the network if that's more efficient in some cases.

This will also allow for making things that now use side-transports part of the protocol itself, where that makes sense: e.g. the custom commands LFS and git-annex implement, or even more advanced things like shipping the new commit graph, already generated, from the server to the client.
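
If you want to poke at it, a client built from master (or 2.18 once released) can be forced onto the new protocol, and GIT_TRACE_PACKET will dump the pkt-lines; the URL here is just an example:

    # request protocol v2 explicitly and trace what goes over the wire
    GIT_TRACE_PACKET=1 git -c protocol.version=2 ls-remote https://github.com/git/git HEAD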

Someone1234 8 years ago

Too bad they didn't make Git LFS part of Version 2[0]. Most vendors[1] support LFS already, but because it isn't required, some still lack it and its support cannot be assumed.

[0] https://git-lfs.github.com/

[1] https://github.com/git-lfs/git-lfs/wiki/Implementations

  • icebraining 8 years ago

    I say that's LFS' fault. Why do you even need a custom server? It should just be able to use any ol' file server or S3-API compatible service, and do everything on the client side.

    I find git-annex a much better solution, it's a shame everyone went with LFS.

    • rspeer 8 years ago

      My experience with git-annex is that it seems heavily designed for individuals and not for projects. The places it looks for files are often just a computer you were once developing on, and it sometimes expects you to go find that computer. It never forgets about any crazy place your files have been.

      It was very hard to use in asymmetric cases where different people have different credentials, such as where one person has access to a computer and others don't, or where a couple of core developers have authenticated R/W access to a file server or an S3 bucket and everyone else just has HTTP.

      • icebraining 8 years ago

        Git-annex doesn't look for files in random places. It uses the regular git remotes, plus something it calls "special remotes" which are basically accounts on file servers/S3/etc that you can manually add.
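
        For example, pointing an annex at S3 is a couple of commands (the remote name, bucket, and options here are illustrative; see git-annex's special remotes documentation for the full list):

            # register an S3 bucket as a special remote, then send a big file there
            git annex initremote mys3 type=S3 encryption=none bucket=my-annex-bucket
            git annex copy --to mys3 bigfile.bin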

        If GitHub et al. thought this was confusing, they could have made a "beginner's mode" that auto-selected the storage server based on the git server, like LFS does. Which would still have been better, since it wouldn't have required a custom server API.

        > It was very hard to use in asymmetric cases where different people have different credentials

        Right, but LFS can't be used in asymmetric cases at all - it assumes anyone with access to the git repository has access to the LFS storage area.

        • rspeer 8 years ago

          > Right, but LFS can't be used in asymmetric cases at all - it assumes anyone with access to the git repository has access to the LFS storage area.

          Wait, really? I thought that Git LFS let people with push access push files to the LFS area, which can then be read by anyone. That's asymmetric in the way everyone expects from GitHub. But I didn't use Git LFS because it's too expensive.

          Yes, I probably encountered extra weirdness from git-annex, from the fact that the codebase was on GitHub, which doesn't support git-annex, so _everything_ in git-annex had to be on a different remote.

          If it was meant to be used with the upstream as the only remote, that would make a lot of sense, and explains why my attempt to use it felt a lot like early Git, where there was no good upstream service like GitHub.

  • xorcist 8 years ago

    What kind of changes to the wire protocol would help git-lfs? It seems to have no specific dependencies on protocol features.

    If standard git ever implements shallow blob fetching, it would preferably make git-lfs obsolete rather than help it.
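
    For what it's worth, the experimental partial-clone work in git is roughly that shallow blob fetching; a sketch, assuming a server that supports object filters:

      # fetch commits and trees up front, but blobs only on demand
      git clone --filter=blob:none https://example.com/big-repo.git

      # or skip only the large objects LFS would have tracked
      git clone --filter=blob:limit=1M https://example.com/big-repo.git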

  • eridius 8 years ago

    Requiring Git-LFS support would be rather problematic for anyone who self-hosts git repos over SSH.

    • Someone1234 8 years ago

      They'd just stick to Protocol 1

      • eridius 8 years ago

        Why should people who self-host repos not be able to benefit from the improvements in protocol 2? Especially if other future extensions to protocol 2 prove useful for self-hosters?

        Git is a decentralized version control system. Its core networking protocol must remain useful for people who self-host.

  • _ikke_ 8 years ago

    Git LFS is not part of core-git, but an extension built and maintained by github, and the code lives outside of the git tree, so it cannot be a required part of the protocol.

    • Someone1234 8 years ago

      > Git LFS is not part of core-git

      I know, that's what I am suggesting should change in version 2.0. It is a widely supported popular extension that solves a major pain point for Git, most vendors have adopted it.

      New things can absolutely be required as part of a new protocol version, in fact this blog post lists several new things that will be new in 2.0 and beyond.

      The analogy I'd use is HTTP/2 and SPDY. SPDY started out as a Google produced extension to HTTP, gained popularity, and was then standardized/merged into the HTTP/2 standard. All I am suggesting is Git LFS receive the same treatment.

      • slrz 8 years ago

        The way to make that happen is for some interested party to take the LFS code and submit it for merging into git proper. If there were prior attempts, study them carefully and learn from them. It probably won't be accepted the first time, so you need to be persistent, addressing reviewers' comments along the way.

      • falsedan 8 years ago

        git v2.0 came out 4 years ago; these release notes are regarding a new version of the wire protocol used to communicate with remote repos.

  • 49bc 8 years ago

    I wouldn't be so sure. They said one of the motivations was "unblocking the path to more wire protocol improvements in the future".

  • xenomachina 8 years ago

    > but because it is required

    Just to confirm, but you meant "because it is not required", right?

lsiebert 8 years ago

I'm a very (very) minor contributor to git.

If you are at all interested in hacking on Git, it's not that difficult. Knowing C and portable shell scripting for writing tests are the big things.

One sticking point: you need to submit patches to the mailing list; you can't just do a GitHub pull request.

See https://github.com/git/git/blob/master/Documentation/Submitt...

I still see GitHub pull requests rather frequently, even though they have never been allowed. All discussion AND patches go through the mailing list, much like the Linux kernel.

  • pm215 8 years ago

    It's unfortunate that GitHub doesn't let a project disable the on-website UI for pull request submission; as it is, it's easy for somebody to end up wasting their time trying to submit a change that way. (QEMU has that issue too.)

    • ImJasonH 8 years ago

      Totally agree! I made nopullrequests.com to help solve this.

      • lsiebert 8 years ago

        And that's nice, but I'd also love to see a bot that formats the pull request into a patch email for you.

newscracker 8 years ago

Sometime in the far future, someone will write an interesting story about how a double null byte came into existence in the git request protocol, and it will be amusing and interesting to look back. As the saying goes, hindsight is always 20/20. I'm glad that they found ways to maintain backward compatibility, at only a minor cost to understanding things.
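
For reference, the double null is already documented: in the native git:// transport the request line ends with NUL-separated attributes, and v2 hides its marker behind a second NUL that old servers stop parsing at. Roughly what goes on the wire (the pkt-line prefix is the total length in hex, i.e. 4 plus the payload size; the path and host are illustrative):

    # pre-v2 request: one NUL-terminated attribute (host)
    printf '0032git-upload-pack /project.git\0host=example.com\0'

    # v2 request: the version marker rides behind an extra NUL
    printf '003dgit-upload-pack /project.git\0host=example.com\0\0version=2\0'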

Confiks 8 years ago

It's quite a comedy that this feature went unimplemented for at least 6 years, solely because the raw git:// protocol's parameter handling was severely broken, and feature detection by disconnecting and retrying [1] was ultimately deemed far too dirty.

[1] https://public-inbox.org/git/CAJo=hJtZ_8H6+kXPpZcRCbJi3LPuuF...

cpburns2009 8 years ago

Wait, why was this posted by Google? I thought Git was made by Linus Torvalds.

rwmj 8 years ago

Is there a git protocol variant that allows the client to avoid downloading objects that it already has stored locally in another repository or cache?

For example: I have the Linux kernel already cloned in some directory. I clone a second repo which has the Linux kernel as a submodule. Can I clone the second repo straightforwardly without having to download Linux a second time? (Well yes, but only by manual intervention before doing the git submodule update - it'd be nice if objects could be shared in a cache across repos somehow.)

  • nothrabannosir 8 years ago

    You could literally link the two object directories?

    I just tried this and it seems to work:

      git clone git://github.com/git/git
      mkdir git2
      cd git2
      git init
      cd .git/
      rm -rf objects
      ln -s ../../git/.git/objects objects  # point git2's object store at the first clone
      cd ../
      git remote add origin git://github.com/git/git
      git fetch # returned without downloading anything
      git checkout master
      ls # etc.
    
    If you seriously want to use this, you'll probably want to hard link the contents instead. But IIRC git clone from a local disk path already does that for you?

    In short: clone your local copy and take it from there?

    • falsedan 8 years ago

      You can also use alternates:

        echo ../../../git/.git/objects >> git2/.git/objects/info/alternates
      
      or use the original as a reference:

        git clone --reference git git://github.com/git/git git2
      
      This sets up the alternates for you.

  • wereHamster 8 years ago

    There's a git command for that: https://git-scm.com/docs/git-worktree

  • Bjartr 8 years ago

    Maybe this project could work for you?

    https://github.com/jonasmalacofilho/git-cache-http-server

  • andrewaylett 8 years ago

    I'm assuming from your comment that you're already aware of --reference but it doesn't completely meet your needs? The only other thing I can think of would be to use the 'insteadOf' configuration to tell Git to use your local clone instead of the remote one. Search 'git help config' for 'url.<base>.insteadOf'.
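
    A sketch of the insteadOf approach (paths are illustrative): every fetch of the kernel's upstream URL then reads the local clone instead.

      # rewrite the submodule's upstream URL to a local mirror
      git config --global url./home/me/src/linux.insteadOf \
          https://github.com/torvalds/linux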

buckminster 8 years ago

AIUI, the git ssh protocol is just the git protocol tunnelled through ssh. So why do they need different mechanisms for signalling V2?

  • _wmd 8 years ago

    Deploying Git over SSH entails locking down the precise command line that each authenticating public key is allowed to execute. Locking SSH SendEnv down is mandatory too; otherwise thousands of people would have shell access to GitHub.com!
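
    Concretely that's a forced command in authorized_keys (a sketch; the key and options are abbreviated). The client's requested command line is handed to git-shell, which only permits the git plumbing commands:

      # server-side ~/.ssh/authorized_keys entry for one deploy key
      command="git-shell -c \"$SSH_ORIGINAL_COMMAND\"",no-port-forwarding,no-pty,no-X11-forwarding ssh-ed25519 AAAA... ci@example.com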

    This isn't even theoretical: there was an environment-related bug not 5 years ago involving Git. At least BitBucket was impacted; I think GitHub was patched before it was announced.

    • simias 8 years ago

      I don't think that answers the parent's question: if the update were in the git protocol itself (encapsulated in the SSH session), then you wouldn't have to change anything at the SSH level.

      As you point out selectively allowing a new environment variable could open a can of worms for shared hosts like github if they mess up their implementation.

  • xyzzyz 8 years ago

    Because if you tunnel through ssh, you can signal v2 using ssh's mechanism of setting environment variables. If you don't tunnel, you don't have this option. This is clearly described in the article.
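
    The variable in question is GIT_PROTOCOL, and both ends of the SSH connection have to agree to pass it along; a sketch of the relevant OpenSSH configuration:

      # client ~/.ssh/config: forward git's version marker
      SendEnv GIT_PROTOCOL

      # server sshd_config: accept it, so git-upload-pack sees
      # GIT_PROTOCOL=version=2 in its environment
      AcceptEnv GIT_PROTOCOL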

    • deathanatos 8 years ago

      I think what the person you're replying to is asking is why not, in the case of ssh, use the signaling in the git protocol, since it will be there anyways. That is, if you don't tunnel, you must signal w/ the git protocol. If you do tunnel, why use a different mechanism, since the signal in the git protocol must be there?

      I think that this is because the SSH protocol isn't just encapsulating the Git protocol directly (the initial assumption of ssh "just" encapsulating the git protocol is not fully correct), and one of the parts that differs is this particular part. (Since on the git protocol side, we need to select a "service":

      > a single packet-line which includes the requested service (git-upload-pack for fetches and git-receive-pack for pushes)

      which in SSH would be done not by transmitting that packet-line but by instructing SSH to run that particular executable.)

      > This is clearly described in the article.

      It really isn't, IMO; if you don't have precise knowledge of the protocols involved, I don't think anything in the article particularly spells this out.

    • buckminster 8 years ago

      Yes, but once you've updated the git protocol, ssh support comes for free. Having one mechanism is simpler than having two. And as your sibling notes, setting env vars from ssh has disadvantages. So why bother?

Boulth 8 years ago

> Server-side filtering of references

I wonder if this will be somehow exposed by git daemon. It could be used for easy per-ref access controls.

For example, Git Switch [0], which uses Macaroons, had to clone the repository to implement per-ref ACLs.

[0]: https://github.com/rescrv/gitswitch

ksec 8 years ago

I thought Google used hg; have they switched over to git as well?

  • seabrookmx 8 years ago

    For all the "big" Google projects they use a proprietary system called piper.

    I think all their open-source stuff (Angular, GoLang, Android) uses git (and sometimes Gerrit).

    Although given Google's scale, I'm sure there's some teams/projects that use Mercurial.

    • ngoldbaum 8 years ago

      In fact, developers are allowed to use whichever VCS tool they want on their local machine (or in CitC, the online coding-in-the-cloud environment). Some opt to use hg. The canonical repo is in piper though, so the hg or git commits get converted before they land.

    • kardianos 8 years ago

      Gerrit is a review server that uses git. In fact, Gerrit now stores the majority of the information it uses in git itself.

      So for Google external projects, they use git.

      > Although given Google's scale, I'm sure there's some teams/projects that use Mercurial.

      I doubt it. Their tooling is probably pretty specific, and now that code.google.com has shut down, they probably don't have any review servers that support it.

      • harveynick 8 years ago

        Yes and no. The answer is actually quite complicated... and I have no idea if I'm allowed to talk about it publicly or not.

        • harveynick 8 years ago

          The most recent public reference I can find to this is from 2016: https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

          Here's the money quote:

          "The team is also pursuing an experimental effort with Mercurial an open source DVCS similar to Git. The goal is to add scalability features to the Mercurial client so it can efficiently support a codebase the size of Google's. This would provide Google's developers with an alternative of using popular DVCS-style workflows in conjunction with the central repository. This effort is in collaboration with the open source Mercurial community, including contributors from other companies that value the monolithic source model."

          Project that forward logically by two years.

          • ksec 8 years ago

            Not sure why I got heavily downvoted. The above was the piece of information that got me to think they were all on hg. So judging from the comment, I stand corrected.

            • harveynick 8 years ago

              Your assumption was pretty reasonable based on the public information. Honestly I’d love to talk about how Google does source control/ code review etc. because it’s actually pretty interesting at this point. You know... for some values of interesting.

        • falsedan 8 years ago

          Then don't waste people's time with vacuous comments.

          • harveynick 8 years ago

            Ironic reply.

            Sincere apologies if you can't derive any information from my comment, but that doesn't mean there isn't any there.

      • joatmon-snoo 8 years ago

        Everything speaks Piper.

        Devs can use the mercurial/git clients mentioned in the paper linked by harveynick.

      • seabrookmx 8 years ago

        > Gerrit is a review server that uses git

        Yup! I use Gerrit at my company and share Administration duties with our Devops team.

        I know Android uses Gerrit; I just wasn't sure if Angular and co. did, which is why I worded it a bit more vaguely.

    • pjmlp 8 years ago

      Go started on Mercurial and then eventually moved into Git.

hartator 8 years ago

Is Git a Google project now?

  • jkaplowitz 8 years ago

    No, but many of the core contributors are employed by Google and spend time on it as part of their day job (with Google's knowledge and permission). This post straddles both the open source part of their jobs and the "Git deployment at Google" part.

  • s2g 8 years ago

    BRB switching to Mercurial

Ericson2314 8 years ago

This is disgusting. So little foresight in the past... At least the outcome is quite useful.

  • jonknee 8 years ago

    Ah yes, how disgusting that the developers of this free software that I've done nothing for except use for years made an unfortunate decision a decade ago.

  • needz 8 years ago

    IIRC, they were in a serious time crunch when they drafted/made git. I can't remember the whole story...

    • tytso 8 years ago

      At the time Linus was the sole author/contributor of git, and he needed a replacement for BitKeeper in a hurry. BitKeeper had been made unavailable for Linux kernel development because the proprietor of BitKeeper was really unhappy that Tridge had reverse engineered the protocol and created an open-source client[1] which could talk to the Bitkeeper server. Linus created the first version of git sufficient to do a kernel commit in ten days[2].

      [1] https://sourceforge.net/projects/sourcepuller/

      [2] https://www.linuxfoundation.org/blog/10-years-of-git-an-inte...

      (As Tridge tells the story[3], he telnet'ed to the bk port and typed "help" so it wasn't that much of a reverse engineering effort. :-)

      [3] https://lwn.net/Articles/132938/

    • jimmy1 8 years ago

      And really, like other famous software on which people love to heap shade over how awful it is, focusing only on its warts instead of the immense productivity realized as a result, git really does have some nice parts, and it was the best option for a while. The fundamental concepts of git are really not that hard to understand -- its fundamental architectural model is event sourcing, and its fundamental data structure is a DAG. Those are pretty good choices.

      I personally have stuck to kind of basic git usages (call it "Git: The Good Parts" if you will), and have never had the problems people claim to have with git. It just has always worked, and it has always been there for me.

    • mc42 8 years ago

      I thought Linus Torvalds was almost wholly responsible for the initial development of git? Even so, everything, especially software, is easier in hindsight...

    • ktsmith 8 years ago

      Using BitKeeper as the SCM for the Linux kernel always seemed like a bad idea, and when issues between the company and the community peaked, git was created.

      https://git-scm.com/book/en/v2/Getting-Started-A-Short-Histo...

    • wereHamster 8 years ago

      s/they/he/. It was Linus himself who alone created git, within a few weeks (two or three). At first it was just a handful of shell scripts, but it was self-hosting pretty early on.

      • jordigh 8 years ago

        Linus wasn't the only one in a hurry; all Linux devs were. This is also why Mercurial happened.

s2g 8 years ago

oh neat, and it's on a google blog.

That's great. Another subtle reminder that this ad company has way too much control.

GauntletWizard 8 years ago

Interesting that they took to the Google blog to announce this; is there a corresponding LKML post?

  • jkaplowitz 8 years ago

    Why LKML? Despite Git's origins from and use by the Linux project, it isn't especially tied to it now.

    LKML would presumably be the place for Linux to announce when they adopt this.

    The Google open source blog is among the several credible options for this post, since Google employs much of the core Git team, and this post discusses their experience deploying Git protocol v2 at Google.

    As noted in the blog text, it's not in a released version of Git yet, just Git master branch. So maybe it'll appear on a dedicated Git announcement list, if any, once that happens.

  • Analemma_ 8 years ago

    > support for v2 was recently merged to Git's master branch and is expected to be part of Git 2.18

    Not yet, but presumably there will be a post like this: https://lkml.org/lkml/2018/4/2/425 when it is released. It is strange that the Google blog is the first place to announce it, though.

  • tgummerer 8 years ago

    As mentioned in another comment, protocol v2 was implemented by a Google employee, and they decided to write a blog post about it. This is not an official git announcement.

  • u801e 8 years ago

    I found a mention of it in a "what's cooking" post on the git mailing list (Message-ID <xmqqvabm6csb.fsf@gitster-ct.c.googlers.com>). But I can't find a direct link on gmane.com right now.

gpvos 8 years ago

Git didn't have a proper version number or extensibility field in its protocol? That's quite a bit of hubris.

  • zeroxfe 8 years ago

    Or, more likely, an oversight.

    • gpvos 8 years ago

      Hmm, I haven't designed very many data formats or wire protocols, and I won't claim I got it right any of those times, but I included some kind of extension possibility every time.

      • gchpaco 8 years ago

        I damn near released a (private) message protocol without a version field a couple of months ago, and I know better. Fortunately I stopped myself and added it before any actual data got released.

      • shakna 8 years ago

        Git was a 10-day urgent project. Given the timeframe, it's done remarkably well.
