AWS CodeArtifact: A fully managed software artifact repository service

aws.amazon.com

167 points by rawrenstein 6 years ago · 91 comments

WatchDog 6 years ago

This has been a fairly obvious service that has been missing for a while; nice to see them provide a solution.

Most dependency management tools have some kind of hacky support for using S3 directly.

Full-fledged artifact management tools like Artifactory and Nexus support S3-backed storage.

Interesting to see that the pricing is approximately double that of S3, for what I imagine is not much more than a thin layer on top of it.

  • ludjer 6 years ago

    Considering the price of Nexus and Artifactory, this is way cheaper for a SaaS offering with SLAs. I imagine Artifactory is really going to have to up their product offering or at least lower their entry prices.

    • hn_throwaway_99 6 years ago

      GitHub already released their package repo last year (and have since purchased NPM). If anything, I imagine that had Artifactory pretty scared vs. this. If your company already uses GitHub, it's a hard sell to say why you'd need something like Artifactory over the GitHub package repo.

    • owenmarshall 6 years ago

      Meta: I’ve vouched for your killed comment. I suspect you may be shadowbanned.

      • ludjer 6 years ago

        Thanks for letting me know, I will reach out to the HN email and ask them why. I suspect it is because of some comments where I got -Karma.

        • SeanDav 6 years ago

          Looks like a kind mod un-shadowbanned you. Welcome to the land of the living!

          • swyx 6 years ago

            despite appearances i'm a very casual HN reader and all this talk of shadowbanning makes me kinda nervous tbh. hope i haven't done anything to displease the powers that be.

            • SeanDav 6 years ago

              You are fine. New people with low karma are most at risk. Once you are a little established, you have to do something very upsetting to get shadowbanned, or be consistently unpleasant. Once established, a few controversial posts with negative karma should not be a problem.

              Avoid criticizing HN staff or related companies. Gentle / kind disagreement is fine, but err on the side of keeping it private.

  • mcrute 6 years ago

    > Interesting to see that the pricing is approximately double that of S3, for what I imagine is not much more than a thin layer on top of it.

    There's a lot of necessary complexity in the backing platform. Encrypted package blobs are stored in S3 but there are a bunch of other distributed systems for doing things like package metadata tracking and indexing, upstream repository management, encryption, auditing, access control, package manager front-ends, etc... that are not immediately obvious and add cost. The platform that backs CodeArtifact is far from what I'd call a thin layer on top of S3. There is also a team of humans that operate and expand the platform.

    Source: I led the technical design for the product as well as a chunk of the implementation but left the team around mid-2018.

  • djhaskin987 6 years ago

    To add to your list of Artifactory and Nexus, Pulp[1] is also a cool project in this space, and is fully open source.

    Honestly, the fact that they only support JavaScript, Python, and Java is pretty bare-bones compared to what the others on the above list support, and again, as you say, for a fairly high price.

    1: https://pulpproject.org/

  • StreamBright 6 years ago

    We have used S3 successfully several times. You can create a Maven repository with it, use it as an RPM repo, and cover many other artifact-hosting use cases. I am not sure what functionality is missing that cannot be implemented on top of S3 and requires CodeArtifact.

    • cremp 6 years ago

      For Maven, pushing artifacts via the correct mvn deploy:deploy-file requires an S3 wagon (transport layer) to actually make the S3 calls. For bigger orgs, having everyone use a wagon is a non-starter.

      All I'm seeing this does is provide proper HTTP endpoints so you don't need the wagon. Is it worth ~2x the price? No, but it's better than the other enterprise-y solutions.
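
      (For illustration, a deploy-file invocation against an HTTPS endpoint of this shape, with made-up domain/account/repo values and the repository credentials assumed to already be configured in settings.xml:)

        # plain HTTPS, no wagon required
        mvn deploy:deploy-file \
          -Dfile=my-lib-1.0.0.jar \
          -DgroupId=com.example -DartifactId=my-lib -Dversion=1.0.0 \
          -DrepositoryId=codeartifact \
          -Durl=https://my-domain-111122223333.d.codeartifact.us-east-1.amazonaws.com/maven/my-repo/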

  • entee 6 years ago

    > Interesting to see that the pricing is approximately double that of S3, for what I imagine is not much more than a thin layer on top of it.

    Haven’t looked carefully, but is there a difference in the guarantees it provides? Might be a performance or SLA difference.

antoncohen 6 years ago

The login credentials expire after 12 hours (or less)[1], just like with their Docker registry (ECR). That makes it pretty annoying to use, especially on developer laptops.
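
(For concreteness, the re-auth step looks something like this; the domain and repository names here are hypothetical:)

  # refreshes the CodeArtifact token pip uses; valid for at most 12 hours
  aws codeartifact login --tool pip --domain my-domain --repository my-repo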

GCP has a similar offering[2]. And GitHub[3].

[1] https://docs.aws.amazon.com/codeartifact/latest/ug/python-co...

[2] https://cloud.google.com/artifact-registry

[3] https://github.com/features/packages

  • blaisio 6 years ago

    I could not disagree more re. the expiring credentials. It is a bad practice to have credentials that never expire, especially on developer laptops, especially credentials of this nature. Developers frequently store this stuff in plain text in their home directory or as environment variables. That's a huge security risk! This service manages the process of generating and expiring credentials automatically, which is awesome.

    • antoncohen 6 years ago

      This service is for code artifacts. What credentials do the developers use to access source code? Do they expire?

      It is common for developers to use Git to store source code, in a hosted service like GitHub. It is common to use SSH keys to access Git. Frequently those SSH keys are generated without passphrases. Those are non-expiring credentials stored on disk. If HTTPS is used to access Git, it will likely be with non-expiring credentials.

      I'm not saying short-lived credentials are bad, not at all. I'm pointing out how this service differs from similar services, requiring a change in workflow, which might be annoying to some people. Not everyone is operating under the same threat model.

      • tiew9Vii 6 years ago

        Your source code may reference a shared library at a specific version from a trusted source to build. This trusted source is CodeArtifact.

        The short-lived passwords are a non-issue and a good thing. Your dependency resolver should handle fetching the new password and most orgs I’ve worked at had scripts dealing with short-lived passwords/IAM.

        • antoncohen 6 years ago

          > Your dependency resolver should handle fetching the new password

          According to AWS's documentation, none of the supported dependency resolvers will fetch the new password[1][2][3].

          If they were capable of automatically fetching the new password without human intervention, it would mean they have credentials for generating credentials. If this isn't on an EC2 instance (where an IAM role can be used), that means there are long-lived credentials (probably written to disk) used to generate short-lived credentials.

          This would be the case if you are using a hosted CI service that doesn't run on your own EC2 instances. You would probably be providing an AWS key and secret, which would then be used to generate the short-lived credentials. But the key and secret won't be short-lived, and will have at least the same access as the short-lived credentials (probably more access). A rough sketch of that flow is below.

          > Your source code may reference a shared library at a specific version from a trusted source to build. This trusted source is CodeArtifact.

          HTTPS is what forms the trust between you and the artifact repository. Short-lived passwords don't do anything to ensure you are talking to the real trusted source. They may make it so the artifact repository can better trust you are who you say you are, but I don't see what that has to do with safely getting a specific version of a library.

          [1] https://docs.aws.amazon.com/codeartifact/latest/ug/python-co...

          [2] https://docs.aws.amazon.com/codeartifact/latest/ug/npm-auth....

          [3] https://docs.aws.amazon.com/codeartifact/latest/ug/env-var.h...
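
          (A rough sketch of that CI flow, with a hypothetical domain and a placeholder account ID; the env var name follows the pattern described in [3]:)

            # the long-lived key/secret in the runner's environment authenticate this call
            export CODEARTIFACT_AUTH_TOKEN=$(aws codeartifact get-authorization-token \
              --domain my-domain --domain-owner 111122223333 \
              --query authorizationToken --output text)
            # the resulting short-lived token is then fed to pip/npm config for the build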

          • shortj 6 years ago

            > that means there are long-lived credentials (probably written to disk) used to generate short-lived credentials.

            In terms of local development experience, most mature organizations will still require an MFA at a minimum of once per day for these "long lived" credentials, and lock them down to particular IP addresses, before they are allowed to get the temporary credentials.[1]

            > This would be the case if you are using a hosted CI service that doesn't run on your own EC2 instances.

            Typically you'd want to see third-party platforms leveraging IAM cross-account roles these days to fix the problem of them having static credentials. Granted, many of them are still using AWS access key and secret.

            This is still not a "solved" area though, and a point of concern I wish would get more aggressively addressed by AWS.

            [1] https://github.com/trek10inc/awsume, https://github.com/99designs/aws-vault, and a few other tools make this much easier to deal with locally.
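
            (For example, with aws-vault and a hypothetical profile, the long-lived key stays in the OS keychain and only MFA-gated temporary credentials ever reach the child process:)

              # prompts for MFA, then runs the command with short-lived credentials
              aws-vault exec my-profile -- \
                aws codeartifact get-authorization-token --domain my-domain \
                --query authorizationToken --output text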

    • nickjj 6 years ago

      > I could not disagree more re. the expiring credentials. It is a bad practice to have credentials that never expire, especially on developer laptops, especially credentials of this nature.

      For the specific use case of the developer box and the Docker registry, resetting the credentials every 12 hours doesn't, on its own, offer any more security than not resetting them.

      The reason for that is that after you try to log in to ECR past the expiry time, the way you authenticate again is to run a specific aws CLI command to generate a docker login command (sketched below). After you run that, you're authenticated for 12 hours.

      If your box were compromised, all the attacker would have to do is run that aws command and now they are authenticated.

      Also, due to how the aws CLI works, you end up storing your aws credentials in plain text in ~/.aws/credentials and they are not re-rolled unless the developer requests to do so. Ultimately they are the real means for Docker registry access.
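
      (With newer CLI versions that step is roughly the pipe below; note that the aws call itself is authenticated by the long-lived key in ~/.aws/credentials, which is the point above. The registry/account values are placeholders:)

        aws ecr get-login-password --region us-east-1 \
          | docker login --username AWS --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com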

      • nerdjon 6 years ago

        Those credentials sitting in ~/.aws/credentials should also expire after 12 hours. There are plenty of tools out there to automate this process, so you just log in with Okta or a similar tool in your CLI and you're done (bonus: they also make switching between accounts a lot easier); one such flow is sketched below.

        There is absolutely no reason, with the tools we have available, that we should be creating long-lived AWS keys. It's a major security risk if those keys ever got out.
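
        (One such flow, assuming AWS CLI v2 with SSO/Okta federation already configured and hypothetical profile names:)

          aws sso login --profile dev   # browser/MFA prompt; caches short-lived credentials
          aws codeartifact login --tool pip --domain my-domain --repository my-repo --profile dev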

    • thayne 6 years ago

      > Developers frequently store this stuff in plain text in their home directory or as environment variables

      If you care about the security of these artifacts, why is their home directory (or their full disk) not encrypted? If they have access to the repository, they probably have artifacts downloaded on their laptops, so if the laptop is compromised, the artifacts are compromised anyway.

      Edit:

      Not saying temporary credentials are bad. But the reasons you gave seem a little suspect to me. A better reason is that you don't have to worry about invalidating the credentials when an employee stops working for you.

      • sudhirj 6 years ago

        The problem isn't encryption, let's assume everyone has full disk encryption turned on, so someone who steals your laptop can't access your data.

        The problem is that your home directory is accessible to a ton of apps on your computer, and you have no idea what each of them is doing with that access. You also have no idea if any of them can be / are being exploited. The most recent case being Zoom – if that server they had running on localhost and responding to anyone on the laptop had file system access APIs (which is reasonable if Zoom had offered file sharing on calls) an attacker would have been able to read all your credentials.

        • Bnshsysjab 6 years ago

          If you have an app on your computer that is controlled remotely you have _massive_ issues. Creds are stored for SSH, browser, probably heaps of other things too. If this is a serious security concern within your threat model you should be auditing every single package or isolating (docker, vms, Bare metal if you’re super tin foiled), anything short of that is fake security.

          • txcwpalpha 6 years ago

            >Creds are stored for SSH, browser, probably heaps of other things too.

            And ideally these credentials should have similar controls applied around them as well (only temporary, using passwords to unlock the SSH keys, etc). If you don't have that, that's your choice, but just because some of your credentials lack security controls is not a reason for other credentials to lack security controls, too.

            > you should be auditing every single package or isolating (docker, vms, Bare metal if you’re super tin foiled), anything short of that is fake security.

            Which is exactly the reason that many orgs do specifically audit every package and disallow unapproved software. But again, even if some of your desktop apps are allowed unaudited, that is not reason to lessen your security elsewhere.

            • Bnshsysjab 6 years ago

              There’s a very limited set of scenarios where local file read isn’t accompanied by enough write/exec privilege to inject a keylogger. Sure, there might be some cases where the control would prevent abuse, but they’re limited. IMO time/money should be invested in other security measures unless you’re literally nearing an absolutely secure environment. In most cases I’ve seen, there are gaping holes while crazy amounts of time and money are spent securing something that doesn’t actually improve overall security much or at all.

        • thayne 6 years ago

          In that case, the rogue app would have access to your temporary credentials anyway...

          • txcwpalpha 6 years ago

            Yes, and of course that is bad, but it is not as bad as a rogue app having access to infinite life credentials.

  • toomuchtodo 6 years ago

    You should have a shell alias to rapidly top up your auth token, just like with Docker and ECR. Short-lived tokens are best practice, and a 12-hour TTL is reasonable. That’s no more than two auths in a day as a dev.
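
    (Something along these lines, with hypothetical names, covers both cases:)

      # re-run whenever the ~12h token lapses
      alias ca-login='aws codeartifact login --tool pip --domain my-domain --repository my-repo'
      alias ecr-login='aws ecr get-login-password | docker login --username AWS --password-stdin 111122223333.dkr.ecr.us-east-1.amazonaws.com'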

    • antoncohen 6 years ago

      And every developer needs to have that alias. And all automation needs to be changed to call that command before trying to use pip, or mvn, or whatever. It sucks. No other hosted artifact repository does this.

      • toomuchtodo 6 years ago

        It’s roughly a dozen lines of bash (error handling and all) that can be checked into your project’s repo, speaking as someone who has had to maintain dev tooling for an org where Docker ECR was used. It’s not onerous at all, either on devs or on your build and deployment pipelines/runners.

      • cle 6 years ago

        If only devs had a way to share code with one another...

  • code4tee 6 years ago

    Can’t imagine any serious tech environment still allowing non-temporary creds. If they do, good luck when the security audit happens.

  • kccqzy 6 years ago

    Why so? You just log in once a day at the beginning of your work day. I don't think you'll work a 12-hour day so that should be good for the entire day.

pskinner 6 years ago

Is it just me or is this missing plain artifacts - those that are not packaged for a specific tool? I'm thinking of plain binaries and resources required for things like db build tools and automated testing tools - just files really. How do I publish a tarball up to this, for example?

Also, the lack of NuGet support is a major issue.

  • greyskull 6 years ago

    I think CodeArtifact loses value when you aren't using a package manager; the benefit is an API-compatible service with various controls and audits built on top.

    Out of curiosity, what would you want from this service for the "plain binary" use-case when S3 already exists?

    • pskinner 6 years ago

      I think mainly the ease of having security handled around who can access what, really. Of course you can just upload files and serve them over HTTP, but I'd like something that's as easy to set up and use as Nexus for these files - and something that forces a structure for how they are organised. It stops arguments and people doing whatever they want.

      • StreamBright 6 years ago

        >> I think mainly the ease of having security handled around who can access what, really. Of course you can just upload files and serve them over HTTP,

        This is where S3 really shines. You can give developers access through group membership, while servers use instance profiles. We have implemented fine-grained access control for the S3 repos that works really well. Of course you access the content via HTTPS.

        • pskinner 6 years ago

          Fair enough. I dislike the idea of having disparate systems, where one type of the same thing is stored on a different system from a second type of the same thing.

          IAM applies to the AWS repo as well, doesn't it? I guess it wouldn't be so bad then.

    • ec109685 6 years ago

      It’s nice having the metadata around the push available versus raw blobs in S3.

tkinz27 6 years ago

It’s frustrating to not see more system package management (deb, rpm) from these new services (GitHub and GitLab, for instance).

Are others not packaging their code in intermediate packages before packing them into containers?

  • manigandham 6 years ago

    What's the purpose of intermediate packages if you're already using containers?

    • tkinz27 6 years ago

      Very large c++/python/cuda application that is packed into various different images (squashfs images, but functionally the same).

      We end up having a lot of libraries that are shared across multiple images.

      • manigandham 6 years ago

        Would it not be easier to just pack into different base images? Docker is very efficient with reusing these layers.

    • Jtsummers 6 years ago

      Intermediate packages permit you to choose different deployment situations later, with minimal additional cost now. Tying everything to Docker images ties you to Docker and removes your ability to transition to other systems. It may not be worth the cost now, but as soon as you want to deploy on more than one platform it can become critical to maintaining momentum (versus having to hand-tailor deployment for each new environment).

  • asguy 6 years ago

    We've been going that direction. Packages integrate better into multiple use cases (e.g. VM images, containers). Running a properly signed apt repo is easy these days, so why not?

    For people that disagree with this model: where do you think the software comes from when you apt/apk install things inside your Dockerfile?
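
    (As one illustration of how little it takes: with aptly, made-up names, and a GPG key already on the box, a signed repo is a handful of commands; the exact flags are worth checking against the aptly docs:)

      aptly repo create my-repo
      aptly repo add my-repo build/*.deb
      aptly publish repo -distribution=stable my-repo   # signs the published repo with the local GPG key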

  • blaisio 6 years ago

    Most people don't need to do that. You can build the things you need as part of the image build. No need to set up a deb or rpm package unless you're also installing it that way somewhere else.

  • secondcoming 6 years ago

    We use JFrog. One Jenkins job builds our code into a .deb and pushes it there. Another job builds the VM image, which is then deployed once testing passes.

  • wmf 6 years ago

    That sounds like double work.

FrenchTouch42 6 years ago

I'd really like to see more support added (Ruby, etc.). It could be a great alternative to Artifactory.

scarface74 6 years ago

No C#/Nuget support? Really?

  • tkahnoski 6 years ago

    AWS products always take an MVP approach. The rest is driven by customer feedback on the roadmap. CodeGuru/CodeProfiler/X-Ray similarly launched with limited language support that they've built out over time.

    Whenever I see a product announcement like this missing something I need to use it, I immediately ping our Technical Account Manager to get the vote up for a particular enhancement.

    • swyx 6 years ago

      sounds surprisingly manual. has AWS not tried to formalize some sort of feature voting system?

      • tkahnoski 6 years ago

        Some products have started doing public GitHub “roadmaps”, using GitHub issues to get more accessible public feedback, but who knows how that gets processed internally.

  • mcrute 6 years ago

    The back-end is largely package type agnostic and the package manager front-ends are pluggable. I'd look for AWS to expand package manager support in the near future. Nuget was on the list along with a few other popular package managers. There's a whole lot of functionality in the platform they didn't yet expose or have finished for the launch, I'd keep an eye on this as they move forward.

    Source: I led the technical design for the product as well as a chunk of the implementation but left the team mid-2018. I don't have any specific insight into their plans, not that I could really share them even if I did.

  • politelemon 6 years ago

    That is strange; I wonder if that's coming later, but I didn't see anything to that effect. I'd also have liked to see Docker image support (despite ECR) and raw binaries too.

    • jen20 6 years ago

      My guess (purely a guess though) is that this is a good proportion of the platforms AWS use internally, and that this service will expand to other ecosystems less used internally in response to customer demand.

  • StreamBright 6 years ago

    Weird when you can just do it using S3 for 50% of the price.

    https://github.com/emgarten/sleet

    • scarface74 6 years ago

      Static feeds are much slower than ones that use a real server.

      • StreamBright 6 years ago

        Any reason why? Could we not make it faster?

        • scarface74 6 years ago

          It’s been a while since I tried a static feed. But basically, the NuGet client had to read the directory structure to find all of the NuGet packages and versions, instead of using an API where the server had everything indexed already.

lflux 6 years ago

You know it's an AWS service when you look at it and go "Huh, it's only 2x the price of S3, what a bargain!"

  • setheron 6 years ago

    2x the price of S3 is very cheap.

  • weehack 6 years ago

    It dedupes artifacts (according to the Twitch demo today), so the actual cost would likely be much less than S3 unless you're doing a solo project.

andycowley 6 years ago

No deb, RPM, or NuGet. Half a product, really. As annoying and expensive as Nexus and Artifactory are, at least they're more fully featured.

soygul 6 years ago

Seems like a direct competitor for Artifactory and Nexus. I wonder if it is profitable for them to create an inferior alternative to fully fledged artifact managers, or if they are doing this for the product-completeness of AWS.

doliveira 6 years ago

I'd wait a few years for it to be ready; AWS developer tools are really crude. Last year I had to build a Lambda just to be able to spit out multiple output artifacts in CodePipeline.

saxonww 6 years ago

Appears to support ivy/gradle/maven, npm/yarn, and pip/twine only.

StreamBright 6 years ago

What is wrong with S3?

dahfizz 6 years ago

I don't get it.

The git server you use supports artifacts already. You could also just put all of your artifacts on an S3 bucket if you needed somewhere to put them, which is exactly what this is but more expensive. I don't understand when this would save you money or simplify devops.

  • cle 6 years ago

    It’s not “exactly what this is”. Every time AWS or Azure or GCP releases a service, there are droves of people on HN decrying it as “just <something I’m familiar with>”, without bothering to understand whether that’s actually true. It’s not.

    Skim the docs and you will see it is not “just S3”.

    • thayne 6 years ago

      yep. I've worked on a solution that "just uses s3". It is not trivial.

  • code4tee 6 years ago

    Access can occur entirely within a VPC, without direct internet access. For the average developer this isn’t usually an issue, but in highly secure corporate environments this helps a lot. You can’t just do pip install X in such situations. Even the S3 proxy solutions often require many hoops from the security Jedi council before you can use any packages there.

    A lot of people won’t find this useful but for some it’s a big blessing.
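
    (Concretely, CodeArtifact exposes interface VPC endpoints, so a locked-down subnet can reach it without an internet gateway. The IDs below are placeholders, and the service names should be verified against the current docs:)

      aws ec2 create-vpc-endpoint --vpc-endpoint-type Interface \
        --vpc-id vpc-0123456789abcdef0 --subnet-ids subnet-0123456789abcdef0 \
        --security-group-ids sg-0123456789abcdef0 \
        --service-name com.amazonaws.us-east-1.codeartifact.api
      # repeat with --service-name com.amazonaws.us-east-1.codeartifact.repositories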

  • blaisio 6 years ago

    For Python at least, fetching something from git is far slower than fetching it from PyPI.

  • dmlittle 6 years ago

    The benefit is being able to keep your existing maven/npm/pip workflows as well as use the same workflow for both internal and public dependencies.

    • dahfizz 6 years ago

      I still don't see what's different. I can configure pip to look at my git server, so that all I have to do is `pip install my_thing` and it will automatically download all public and private deps. I don't know what you mean by "workflow" in this context but this is just about as simple as can be.
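
      (That setup usually looks something like the line below, with hypothetical host/repo/package names; transitive public dependencies still resolve from PyPI:)

        pip install "git+ssh://git@git.example.com/team/my_thing.git@1.2.0#egg=my_thing"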

      • code4tee 6 years ago

        You’re not the target user here. In highly secure environments you can’t just “pip install your-thing”.

      • baq 6 years ago

        Looks like you’re assuming you have some kind of access to any part of the internet you please. I envy you because most tools just work in this case.

        Not so on enterprise networks.

  • rantwasp 6 years ago

    what git server is that?
