Should you move to Maven 2?

(A guest post kindly written by EJ Ciramella)

I've seen many a company try to migrate from Ant to Maven with varied success.  There is a change in mindset that has to come about when making the transition.  Here are some of the highlights.

Standard Findings

Monolithic build structure

This is the first big shift in thinking.  Typical (obviously, not ALL) Ant projects work by syncing some massive amount of code from source control, CD-ing into some top-level directory and then telling Ant to build just the module you plan on testing.  What's so bad about this, you may ask?  Well, for one, you're likely syncing large amounts of code that you'll never run locally.  You're probably also building (and unit testing, right?) packages that never change, or change very rarely.  Ideally, these packages and libraries would already be built for the user.  Now look at this through, say, a webdev's eyes.  If your webdev group is responsible for things like CSS, HTML or JSP changes, why should they be concerned about building your oodles-of-utils package?  Or, if a unit test starts failing on them (you're unit testing, right?), why should they have to dive in and figure out what's missing or broken?  In a perfect world, any tier of development could be substituted for another (how great would it be if everyone knew everything?).  In the real world, the one with interruptions and families and deadlines, that's unrealistic (especially in larger companies).

So once you've decided to go the Maven route, decommissioning the monolith a few modules at a time is the best thing to do.  There are two ways of doing this work.  You can take the atomic, all-or-nothing approach, going directly from a monolith to a more modular code base in one fell swoop.  If you can get sign-off on this, it's a wonderful thing, but I've had to restrain both myself and others from biting off too much.  What I like to do instead is pull a few things out at a time, maybe three or four modules.  Of those four, let three be low-cycling libraries (ones that rarely change) and one a high-cycling library.  This way, people gradually learn the new location of the parts that make up the libraries that are combined into your deployable unit.  Think of it as an evolutionary process rather than a revolutionary one.
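As a rough sketch of what one of those extracted low-cycling libraries might look like once it stands on its own, here is a minimal pom.xml.  The com.example groupId, the oodles-of-utils artifactId (borrowed from the example earlier) and the version are placeholders for illustration, not coordinates from a real project.

```xml
<!-- pom.xml for a library pulled out of the monolith (all coordinates are hypothetical) -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.example</groupId>
  <artifactId>oodles-of-utils</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
</project>
```

Once a library like this is built and published on its own, the webdev group never has to compile it; they just consume the jar.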

Having smaller bite-sized chunks is also a better way to get to know Maven.  If you introduce people to a massive monolith with customizations all over the place and a dozen attached assemblies, they are going to poke at it with a stick and quickly hate it.  Clearly seeing how a web application goes together and how the resulting artifact is created is much more digestible, and you'll get fewer complaints about your Maven implementation.

Need everything to build always (as it is all always changing)

Another fear that emerges as people start considering modularization: with multiple deployable units, how does anyone know what is compatible with other internal code when the process isn't building the same thing all the time?  Well, that can be answered a few ways, but the simplest answer is that once a library is released (or otherwise frozen) for a deployable unit, that deployable unit need not upgrade its version of the library.  If shared functionality in the library changes, then you will have to retest, but that raises the question - is your application code in the right module?  Shouldn't a shared module stay somewhat generic, with each deployable unit extending/implementing those features, instead of baking that logic in at such a low level?  I've found that over time, if you make the library a separate module from the larger deployable unit builds, the code starts migrating in the correct direction rather than to wherever it's easiest to add it (no more massive search/replaces in the code base via an IDE).
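To make the "frozen library" idea concrete, here is a minimal sketch of how a deployable unit might pin a released version of a shared library in its own pom.xml.  The groupId, artifactId and version are hypothetical.

```xml
<!-- In the deployable unit's pom.xml (coordinates are hypothetical) -->
<dependencies>
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>oodles-of-utils</artifactId>
    <!-- A released, fixed version: it only changes when this team chooses to upgrade -->
    <version>1.0</version>
  </dependency>
</dependencies>
```

Because the version is a release rather than a SNAPSHOT, new work on the library doesn't reach this deployable unit until someone bumps the number and retests.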

Confusion with regard to building artifacts

Once everything is pulled out, there may be confusion on the developer's part with regard to which modules should be built in which order.  This, to me, is a matter of education.  At any point, the developer can run "mvn dependency:tree" and see:

- What dependencies make up their project
- Where those dependencies were resolved from
- What order they need to be built in

When moving from a world where people operate at a very high-level directory and build everything to a world where every module is very lightweight and each move is a tactical one, people often don't know how to get that app server up or that daemon running locally.  With every application as its own standalone build, people just need to sync what they want to run and rely on a repository manager for the rest (app server bits, database bits, etc.).

Shortcuts to repository management

A repository manager is part of the Maven 2 process, end of story.  Trying to use a corporate file share or keeping everyone working in offline mode is just not the Maven way.  Using a repository manager also helps to minimize the configuration people have to manage locally in their settings.xml, and helps enforce the Maven way of life (banning redeploys, pushing releases to one repository and snapshots to another, not deleting artifacts, etc.).  With one of the big three (Nexus, Archiva, Artifactory), you simply have a grouped repository everyone points at.  That "grouped" repository is a representation of all the other repositories your company will use.  This way, you can have something like this:

 
     
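    <settings>
      <mirrors>
        <mirror>
          <id>nexus-test</id>
          <mirrorOf>*</mirrorOf>
          <url>http://server/nexus/url</url>
        </mirror>
      </mirrors>
      <profiles>
        <profile>
          <id>nexus-test</id>
          <repositories>
            <repository>
              <id>central</id>
              <url>http://central</url>
              <releases><enabled>true</enabled></releases>
              <snapshots><enabled>true</enabled></snapshots>
            </repository>
          </repositories>
          <pluginRepositories>
            <pluginRepository>
              <id>central</id>
              <url>http://central</url>
              <releases><enabled>true</enabled></releases>
              <snapshots><enabled>true</enabled></snapshots>
            </pluginRepository>
          </pluginRepositories>
        </profile>
      </profiles>
    </settings>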

And that's it.  This one setting covers every remote repository we use - from Codehaus to Repo1.  If you're really ambitious, although Sonatype doesn't recommend it, you can even tidy up the URL so that if you switch repository managers, devs don't need to touch their settings.xml file.  While this configuration can be rolled into the MAVEN_HOME/conf/settings.xml file, I personally like to keep my configuration independent of the Maven version by putting it in the settings.xml in my user home directory (~/.m2/settings.xml).
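The "releases to one repository and snapshots to another" rule mentioned earlier lives on the publishing side, in the POM rather than in settings.xml.  A minimal sketch, assuming the repository manager exposes two hosted repositories; the ids and URLs below are placeholders, not real endpoints:

```xml
<!-- In a project's (or shared parent's) pom.xml; ids and URLs are placeholders -->
<distributionManagement>
  <repository>
    <id>releases</id>
    <url>http://server/nexus/url/for/releases</url>
  </repository>
  <snapshotRepository>
    <id>snapshots</id>
    <url>http://server/nexus/url/for/snapshots</url>
  </snapshotRepository>
</distributionManagement>
```

With this in place, "mvn deploy" sends SNAPSHOT versions to the snapshot repository and everything else to the release repository.  If the repository manager requires credentials, each id needs a matching server entry in settings.xml.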

Custom things done in Ant that (it is thought) can't be done in M2

Everyone has one little dark corner of their build world.  Usually it was some quick hack to make things work through Ant: possibly a custom task, maybe some shell-out, or something crazier.  These little dark corners should have light shed on them; in fact, flood them with light.  Instead of letting this be a choking point, start by looking at the common repositories for plugins that do what you're looking for.  There are very few problems someone else hasn't already solved, and even if you searched a while back, a solution may exist now that didn't then.  In the past, I've done exhaustive searches and found no plugin that suited my needs, only to find a few months later that some plugin had changed to do exactly what I was looking for, or that someone had written one and contributed it back to Google/Codehaus/Repo1.  If that route fails for you, just build a Maven 2 plugin and deploy it to your local repository.  You can even have a transition period where Maven calls Ant to do just this little bit, then move the Ant tasks inside of Maven 2, then finally migrate to a Maven 2 plugin.  Don't use the argument that "you should be writing code, not a Maven 2 plugin".  Do you want your system to be robust and clear when there are successes and failures?  Then write the plugin.  You can start quickly by typing "mvn archetype:generate", then selecting "maven-archetype-mojo" (option 12 as of this writing).
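For the first step of that transition period, where Maven calls Ant to do just this little bit, one possible sketch uses the maven-antrun-plugin to invoke a target in the existing build.xml.  The execution id, the phase and the "legacy-step" target name are assumptions for illustration; bind it to whatever phase your hack actually belongs in.

```xml
<!-- In the build/plugins section of the module's pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <id>legacy-ant-step</id>
      <!-- Hypothetical choice of phase -->
      <phase>generate-resources</phase>
      <goals>
        <goal>run</goal>
      </goals>
      <configuration>
        <tasks>
          <!-- Calls the hypothetical "legacy-step" target in the old Ant build file -->
          <ant antfile="${basedir}/build.xml" target="legacy-step"/>
        </tasks>
      </configuration>
    </execution>
  </executions>
</plugin>
```

More recent releases of the plugin use a target element in place of tasks, so check the documentation for the version you end up pinning.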

Other things to consider when moving to Maven

CI compatibility - some are designed around Maven

The original CruiseControl's Maven integration was very poor (I'd say the open source version is still pretty bad).  It doesn't understand the different lifecycles or the output from each.  Hudson understands the lifecycles, and will inherently do things depending on what it sees in the build output.  So far in my travels and exploration of various CI servers/tools, Hudson is head and shoulders above the rest with regard to Maven integration.  Have site output you'd like to share?  Maven can publish that quickly with a link off of your project's page.  Have artifacts you'd like made available to another downstream job (or later process)?  Hudson picks up on those artifacts and tucks them away (maybe/maybe not to your liking).  All other products need to have these various things called out: you need to tell them "look here for this tar.gz file", rather than having them know based upon what Maven has logged.

Lack of understanding of how to upgrade (from one version of Maven to another)

Here's another big disconnect: you can't just fling a new version of Maven down like you could with Ant.  With Ant, you could generally look at the release notes, install, add your custom tasks and then build.  With Maven 2, for the most part, you're protected from a lot of things, but you also need to watch for plugin versions, core changes to dependency resolution, etc.  I personally sleep better installing the new version locally and building, then diffing the results against the artifacts generated by the build server.  Some changes to Maven (2.0.5 to 2.0.6, for example) required users to review their dependencies.
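One way to take some of the surprise out of an upgrade is to pin plugin versions explicitly, so that a new Maven install doesn't quietly pick different ones.  A minimal sketch, with the plugin and version number chosen purely as an example:

```xml
<!-- In a shared parent pom.xml; the version shown is only an example -->
<build>
  <pluginManagement>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.0.2</version>
      </plugin>
    </plugins>
  </pluginManagement>
</build>
```

With versions locked like this, any difference that shows up when diffing artifacts after an upgrade is more likely to come from Maven's core than from a plugin that changed underneath you.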

Should you move to Maven 2?

Well, that question is best answered by you, dear reader.  If you can modularize your codebase, then you'll see the biggest improvement both in development time (better throughput) and in stability - no more broken unit tests that, when fixed, reveal more broken unit tests, ultimately convincing all developers to turn them off.  If you can't (or don't see any benefit), I'd submit your development team isn't mature enough to realize the many benefits of a highly modular codebase.  In the end, you have to choose what gets product out the door.

(image via ePublicist)
