Repository? That's not a repository ...

Oct 10, 2011 • Julian Simpson

... this is a repository. I've been experimenting with Amazon S3 as a Maven repository. I threw the results up on GitHub [link].

Why? I've been working with some nice people who use binary dependencies. In order to scale their CI system past one node, they need a repository manager, to temporarily store built artifacts on.

They actually have a couple of repositories, but not geographically close to where I'm building out their new CI system (on EC2). It's important to keep the feedback loop fast; so deploying a repository close to(or indeed on) on the CI server is desirable.

The plan was to deploy Archiva onto an EC2 instance via Puppet. Then I had a better idea: use an S3 bucket as a highly scalable, low latency and (mostly) highly reliable repository.

Turns out that there's support for S3 in their build tool (Maven). So it turned out to be easy. In the github repo, I publish and artifact using an S3 client, and then retrieve it using plain HTTP.

As long as your EC2-hosted Continuous Integration server is in the same availability zone as the S3 bucket, you're not going to be liable for a high traffic bill.

They may want to use a different configuration for the real repository, where they serve artifacts to the other consumers. This approach is all about speeding up and scaling out their build.

Update: Oliver Lamy asks: 'what is wrong with Archiva'?

Nothing. I'd cheerfully use it. It has a permissive license and cheerful absence of public spats with other repo managers.

In this case I wanted some very fast and robust, which led me to S3. The alternative was to deploy Archiva on an EC2 instance: I could have done that, but then I would have had a dependency on a single host.

IMO, this approach is more suitable for the context (caching intermediate build artifacts, on AWS).