S3 as a Git remote and LFS server

github.com

197 points by kbumsik a year ago · 53 comments

mdaniel a year ago

All this mocking when moto exists is just :-( https://github.com/awslabs/git-remote-s3/blob/v0.1.19/test/r...

Actually, moto is just one band-aid for that problem: there are SO MANY S3 storage implementations, including the pre-license-switch Apache 2 version of MinIO (one need not use the bleeding edge for something as relatively stable as the S3 API).

  • notpushkin a year ago

    > there are SO MANY s3 storage implementations

    I suppose given this is under the AWS Labs org, they don’t really care about non-AWS S3 implementations.

    • mdaniel a year ago

      Well, I look forward to their `docker run awslabs/the-real-s3:latest` implementation then. Until such time, monkeypatching API calls to always give the exact answer the consumer is looking for is damn cheating.

      • notpushkin a year ago

        Agreed, haha. Well, I think it should work with Minio & co. just as well, but be prepared to have your issues closed as unsupported. (Personally, I might give it a go with Backblaze B2 just to play around, yeah.)

      • chrsig a year ago

        It wouldn't be unprecedented: dynamodb-local exists.

  • SahAssar a year ago

    Do you mean boto (the Python SDK for AWS)?

    EDIT: They probably do not; I'm guessing they mean https://docs.getmoto.org/en/latest/index.html ?

    • flakes a year ago

      moto server for testing S3 is pretty great. It’s about the same experience as using a minio container to run integration tests against.

      I use this and testing.postgresql for unit-testing my API servers, with barely any mocks at all.

    • mdaniel a year ago

      Happy 10,000th Day to you :-D Yes, moto and its friend LocalStack are just fantastic for playing with AWS without spending money, or for reproducing kabooms that only happen once a month with the real API.

      I believe moto has an "embedded" version, so one need not even have it listen on a network port, but I find it much, much less mental gymnastics to just override the "endpoint" address in the actual AWS SDKs to point at 127.0.0.1:4566 and be off to the races. The AWS SDKs are even friendly enough not to mandate TLS or keep allowlists of endpoint addresses, unlike their misguided Azure colleagues.

  • remram a year ago

    Unfortunately, there have been a few vulnerabilities since that old Minio release. For something you expose to users, that's a problem.

    • mdaniel a year ago

      I would hope my mentioning moto made it clear my comment was about having an S3 implementation for testing. Presumably one should not expose moto to users, either.
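The endpoint-override approach mdaniel describes above can be sketched with boto3 roughly as follows. This is a minimal sketch, assuming boto3 is installed and an S3 emulator (LocalStack, or moto in server mode) is listening on the address from that comment; the helper names `s3_client_kwargs` and `make_s3_client`, the region, and the dummy credentials are illustrative assumptions, not part of git-remote-s3, moto, or LocalStack.

```python
# Sketch: point boto3 at a local S3 emulator instead of monkeypatching API calls.
# Assumption: an emulator is listening on LOCAL_ENDPOINT; adjust the port for
# whichever emulator you actually run.
LOCAL_ENDPOINT = "http://127.0.0.1:4566"


def s3_client_kwargs(use_local: bool, endpoint: str = LOCAL_ENDPOINT) -> dict:
    """Build keyword arguments for boto3.client("s3", ...) (hypothetical helper)."""
    kwargs = {"region_name": "us-east-1"}
    if use_local:
        kwargs.update(
            endpoint_url=endpoint,  # plain HTTP endpoint, no TLS involved
            # Local emulators generally accept arbitrary credentials; these are dummies.
            aws_access_key_id="test",
            aws_secret_access_key="test",
        )
    return kwargs


def make_s3_client(use_local: bool = True):
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    return boto3.client("s3", **s3_client_kwargs(use_local))
```

With an emulator running, `make_s3_client()` returns an ordinary boto3 S3 client, so test code can call `create_bucket`, `put_object`, and `get_object` on it exactly as it would against AWS, with nothing monkeypatched. Some emulator setups may additionally need path-style addressing, which boto3 accepts via `botocore.config.Config(s3={"addressing_style": "path"})`.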

Scribbd a year ago

This is something I was trying to implement myself. I am surprised it can be done with just an S3 bucket. I was messing with API Gateway, Lambda functions, and DynamoDB tables to support the S3 bucket. It didn't occur to me to implement it client-side. I might have stuck a bit too closely to the LFS test server implementation: https://github.com/git-lfs/lfs-test-server

CGamesPlay a year ago

If you are interested in using S3 as a git remote but are concerned with privacy, I built a tool a while ago to use S3 as an untrusted git remote using Restic. https://github.com/CGamesPlay/git-remote-restic

zmmmmm a year ago

Just remember, in real AWS S3 the minimum billable object size is 128 KB for some storage classes (Standard-IA, One Zone-IA, and Glacier Instant Retrieval). So your Git repo may be a lot more expensive than you would think if you have a giant source tree full of small files stored that way.
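To see how much a per-object minimum can matter, here is a small worked calculation. It assumes, hypothetically, that every file becomes its own object in a storage class that bills a 128 KB minimum; a tool that packs many files into one object (a bundle or packfile, for instance) would not be affected the same way. The 10,000-file, 2 KB-per-file repo is made up for illustration.

```python
# Worked arithmetic for a per-object minimum billable size.
# Assumption: one S3 object per file, in a class with a 128 KB minimum.
from typing import Iterable

KB = 1024
MIN_BILLABLE_BYTES = 128 * KB


def billable_bytes(object_sizes: Iterable[int], minimum: int = MIN_BILLABLE_BYTES) -> int:
    """Sum of object sizes after raising each one to at least `minimum`."""
    return sum(max(size, minimum) for size in object_sizes)


# Hypothetical repo: 10,000 small source files of 2 KB each.
sizes = [2 * KB] * 10_000
actual = sum(sizes)             # 20,480,000 bytes (about 19.5 MiB)
billed = billable_bytes(sizes)  # 1,310,720,000 bytes (about 1.22 GiB)
print(f"stored {actual:,} bytes, billed for {billed:,} bytes ({billed / actual:.0f}x)")
```

Objects at or above the minimum are billed at their real size, so the overhead only bites when most objects are well under 128 KB.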

doctorpangloss a year ago

https://alanedwardes.com/blog/posts/serverless-git-lfs-for-g...

I’ve used this guy’s CloudFormation template since forever for LFS on S3.

GitHub has to lower its egregious LFS pricing.

x3n0ph3n3 a year ago

Wow, AWS really wants to get rid of CodeCommit.

Evidlo a year ago

For the LFS part, there is also dvc, which works better than git-lfs and natively supports S3.

  • matrss a year ago

    There is also git-annex, which supports S3 as well as a bunch of other storage backends (and it is very easy to implement your own; it just has to loosely resemble a key-value store). Git-annex can use any of its special remotes as git remotes, much like what the presented tool does for S3 alone.

  • kernelsanderz a year ago

    Also worth checking out https://github.com/jasonwhite/rudolfs

    Been using it to store datasets via LFS. Written in Rust and has been very reliable.

  • bagavi a year ago

    DVC is a great tool!

    • lenova a year ago

      I haven't heard of dvc, so I had to google it, which took me to: https://dvc.org/

      But I'm still confused as to what dvc is after a cursory glance at their homepage.

      • chatmasta a year ago

        It was on the front page contemporaneously with this comment that recommended it, so you know it was an unbiased recommendation. :)
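matrss's point above, that a git-annex storage backend only has to loosely resemble a key-value store, can be illustrated with a small interface. This sketch is hypothetical: `KeyValueRemote` and `DirectoryRemote` are made-up names, and git-annex's real external special remotes speak a line-based text protocol over stdin/stdout rather than implementing a Python class. It only shows how small the store / retrieve / check-present / remove surface is.

```python
# Hypothetical sketch of the operations a key-value storage backend needs.
# NOT git-annex's external special remote protocol; an illustration only.
from abc import ABC, abstractmethod
from pathlib import Path


class KeyValueRemote(ABC):
    @abstractmethod
    def store(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def retrieve(self, key: str) -> bytes: ...

    @abstractmethod
    def check_present(self, key: str) -> bool: ...

    @abstractmethod
    def remove(self, key: str) -> None: ...


class DirectoryRemote(KeyValueRemote):
    """Toy backend that keeps each key as one file in a local directory."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Assumes keys are safe file names (no path separators).
        return self.root / key

    def store(self, key: str, data: bytes) -> None:
        self._path(key).write_bytes(data)

    def retrieve(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def check_present(self, key: str) -> bool:
        return self._path(key).is_file()

    def remove(self, key: str) -> None:
        # Removing a key that is already absent is treated as success.
        self._path(key).unlink(missing_ok=True)
```

Swapping the directory for an S3 bucket, a WebDAV share, or anything else that can put, get, test for, and delete a blob by name only means reimplementing those four methods.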

milkey_mouse a year ago

You can also do this with Cloudflare Workers for fewer setup steps/moving parts:

https://github.com/milkey-mouse/git-lfs-s3-proxy

philsnow a year ago

I'm surprised they just punt on concurrent updates [0] instead of locking with something like DynamoDB, as Terraform does.

[0] https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...
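For comparison with philsnow's suggestion, here is a minimal sketch of lock-before-push. Roughly speaking, Terraform's S3 backend can guard state with a DynamoDB table by writing a lock item with a conditional put that fails if the item already exists. `LockTable`, `LockHeld`, and `repo_lock` are hypothetical names, the in-memory table is a stand-in for the real service, and none of this is part of git-remote-s3.

```python
# Sketch of lock-before-push built on a conditional "put if absent" write.
# LockTable is a hypothetical in-memory stand-in for a DynamoDB-style table.
# A real service performs the existence check and the write atomically on the
# server; this dict-based version is only safe within a single thread.
import contextlib
from typing import Dict, Iterator, Optional


class LockHeld(Exception):
    """Raised when another writer already holds the lock."""


class LockTable:
    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def put_if_absent(self, key: str, owner: str) -> bool:
        # Mirrors a conditional put that fails when the key already exists.
        if key in self._items:
            return False
        self._items[key] = owner
        return True

    def delete_if_owner(self, key: str, owner: str) -> bool:
        # Only the current holder may release the lock.
        if self._items.get(key) != owner:
            return False
        del self._items[key]
        return True

    def holder(self, key: str) -> Optional[str]:
        return self._items.get(key)


@contextlib.contextmanager
def repo_lock(table: LockTable, repo: str, owner: str) -> Iterator[None]:
    """Hold the lock for `repo` while the body of the with-block runs."""
    if not table.put_if_absent(repo, owner):
        raise LockHeld(f"{repo} is locked by {table.holder(repo)}")
    try:
        yield  # the push would happen here, while the lock is held
    finally:
        table.delete_if_owner(repo, owner)
```

Wrapping a push in `with repo_lock(table, "my-repo", "alice"):` means a second writer gets `LockHeld` instead of racing. A real implementation would also need a way to clear locks left behind by crashed pushes; Terraform provides `terraform force-unlock` for that.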

kernelsanderz a year ago

I’ve been using https://github.com/jasonwhite/rudolfs, which is written in Rust. It’s high-performance but doesn’t have all the features (auth) that you might need.

fortran77 a year ago

Amazon has deprecated AWS CodeCommit, so this may be an interesting alternative.

  • adobrawy a year ago

    In what use case would it be an interesting alternative?

    Access control is limited (e.g. no "CI pass required" rules), so it's not very useful for end users. For machine-to-machine use, it's an additional layer of abstraction when a regular tarball is fine.

tonymet a year ago

How does it handle incremental changes? If it’s re-uploading your entire repo every time, I could see why AWS would promote it.

WhyNotHugo a year ago

git-annex also has native support for s3.

  • matrss a year ago

    I think this is more about storing the entire repository on S3, not just large files as git-lfs and git-annex are usually concerned with. But coincidentally, git-annex somewhat recently got the feature to use any of its special remotes as normal git remotes (https://git-annex.branchable.com/git-remote-annex/), including S3, WebDAV, anything that rclone supports, and a few more.

xena a year ago

How do you install this? Homebrew broke global pip install. Is there a Homebrew package or something?

  • mdaniel a year ago

    FWIW, Homebrew's helpers make it pretty cheap to create a new formula yourself:

        $ brew create --python --set-license Apache-2 https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
        Formula name [git-remote-s3]:
        ==> Downloading https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz
        ==> Downloading from https://codeload.github.com/awslabs/git-remote-s3/tar.gz/refs/tags/v0.1.19
        Warning: Cannot verify integrity of '84b0a9a6936ebc07a39f123a3e85cd23d7458c876ac5f42e9f3ffb027dcb3a0f--git-remote-s3-0.1.19.tar.gz'.
        No checksum was provided.
        For your reference, the checksum is:
          sha256 "3faa1f9534c4ef2ec130fac2df61428d4f0a525efb88ebe074db712b8fd2063b"
        ==> Retrieving PyPI dependencies for "https://github.com/awslabs/git-remote-s3/archive/refs/tags/v0.1.19.tar.gz"...
        ==> Retrieving PyPI dependencies for excluded ""...
        ==> Getting PyPI info for "boto3==1.35.44"
        ==> Getting PyPI info for "botocore==1.35.44"
        ==> Excluding "git-remote-s3==0.1.19"
        ==> Getting PyPI info for "jmespath==1.0.1"
        ==> Getting PyPI info for "python-dateutil==2.9.0.post0"
        ==> Getting PyPI info for "s3transfer==0.10.3"
        ==> Getting PyPI info for "six==1.16.0"
        ==> Getting PyPI info for "urllib3==2.2.3"
        ==> Updating resource blocks
        Please run the following command before submitting:
          HOMEBREW_NO_INSTALL_FROM_API=1 brew audit --new git-remote-s3
        Editing /usr/local/Homebrew/Library/Taps/homebrew/homebrew-core/Formula/g/git-remote-s3.rb
    
    They also support building from git directly, if you want to track non-tagged releases (see the "--HEAD" option to brew create).

mattxxx a year ago

This seems wrong, since you can't push transactionally and consistently to S3.

They address this directly in their section on concurrent writes: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...

And in their design: https://github.com/awslabs/git-remote-s3?tab=readme-ov-file#...

But it seems like this is just the wrong tool for the job (hosting git repos).
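mattxxx's objection can be made concrete with a toy model of two concurrent pushes to one branch. The sketch assumes, as a simplification, that a push just overwrites a single ref object; it does not model how git-remote-s3 actually lays out its bucket (the README sections linked above cover that). `RefStore` and `race` are hypothetical names, and the integer version stands in for something like an ETag.

```python
# Sketch: an unconditional "last writer wins" store loses pushes, while a
# compare-and-swap on a version number turns the loss into a rejection.
# RefStore is a hypothetical in-memory model of one branch ref, not S3's API.
from typing import List, Optional, Tuple


class RefStore:
    def __init__(self) -> None:
        self.commit: Optional[str] = None
        self.version = 0

    def read(self) -> Tuple[Optional[str], int]:
        return self.commit, self.version

    def put(self, commit: str) -> None:
        # Unconditional overwrite: whoever writes last wins.
        self.commit = commit
        self.version += 1

    def put_if_version(self, commit: str, expected_version: int) -> bool:
        # Conditional overwrite: refuse if anyone wrote since we read.
        if self.version != expected_version:
            return False
        self.put(commit)
        return True


def race(conditional: bool) -> Tuple[Optional[str], List[str]]:
    """Alice and Bob both read the ref at c0, then push c1 and c2 respectively."""
    store = RefStore()
    store.put("c0")
    _, seen_by_alice = store.read()
    _, seen_by_bob = store.read()
    rejected: List[str] = []
    pushes = [("alice", "c1", seen_by_alice), ("bob", "c2", seen_by_bob)]
    for writer, commit, seen in pushes:
        if conditional:
            if not store.put_if_version(commit, seen):
                rejected.append(writer)  # must fetch, merge or rebase, and retry
        else:
            store.put(commit)
    return store.commit, rejected


print(race(conditional=False))  # ('c2', []): c1 is silently lost
print(race(conditional=True))   # ('c1', ['bob']): Bob's push is rejected instead
```

With an unconditional write, Alice's commit c1 disappears without anyone being told; with a compare-and-swap write, the same race surfaces as a rejected push that Bob can resolve by fetching and retrying, much like a Git server rejecting a non-fast-forward push.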

Havoc a year ago

Does this work with other S3 implementations like MinIO?
