CI vs Zombies

(Thanks to Jason Sankey of Pulse for this Guest Post)

On behalf of the Zutubi team, I'm excited to announce the latest release of our continuous integration server, Pulse 2.4! Before I go into details, I'd like to touch briefly on one of the areas we've been working on for this release that doesn't get a lot of attention because, frankly, it's not very "sexy". It is, however, an important reality for anyone who has to maintain a build server. I'm referring to the termination of runaway builds.

A runaway build occurs when not all processes created by the build exit cleanly. Catching and killing all processes created by a complex build can be difficult. Processes that outlive their intended lifetime (zombies) may hang the build, or simply linger in the background waiting to wreak havoc. Zombies are a source of multiple headaches:

  1. They interfere with test isolation. If processes can hang around from an earlier build (or an earlier test within the same build) they may affect unrelated tests. Lack of isolation leads to difficult-to-diagnose failures.
  2. Even if they don't directly affect other tests, zombies can build up into hordes that drain resources, eventually leading to exhaustion.
  3. Manual intervention is required to kill them and clean up. Any manual process is the natural enemy of continuous integration, particularly at scale.
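Part of the difficulty is that Java's `Process.destroy()` only targets the process you started directly; anything that process spawned can survive it. This post predates Java 9, which added `ProcessHandle` and its `descendants()` stream. A minimal sketch of killing a whole tree with that API follows. The class and method names are hypothetical (this is not Pulse's code), and any process that has already detached from the tree will not be found.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; requires Java 9+ for ProcessHandle.
public class ProcessTreeKiller {

    /** Forcibly kills a process and every descendant visible at the time of the call. */
    public static void killTree(ProcessHandle root) {
        // Snapshot the tree first: once the root dies, its children are
        // re-parented and no longer appear in root.descendants().
        List<ProcessHandle> descendants = root.descendants().collect(Collectors.toList());

        // Kill the root first so it cannot start new children, then sweep
        // up everything that was below it in the snapshot.
        root.destroyForcibly();
        for (ProcessHandle child : descendants) {
            child.destroyForcibly();
        }
    }

    /** Convenience overload for a Process started via ProcessBuilder. */
    public static void killTree(Process process) {
        killTree(process.toHandle());
    }
}
```

A process that double-forks to daemonize is no longer a descendant by the time this runs, which is one reason last-resort options such as the kill build action described below still matter.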

The pain of zombies is something we do our best to eliminate with Pulse. Pulse has always had the ability to kill builds after a timeout, but not all builds cooperate. So in this latest release we've added some new process termination logic, leveraging platform-specific code and tools where we can.
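A generic timeout-then-kill pattern using only the standard `Process` API (Java 8+) might look like the sketch below; the class name and grace-period parameter are hypothetical, not Pulse's implementation. In OpenJDK on Unix-like systems `destroy()` sends SIGTERM and `destroyForcibly()` sends SIGKILL. Either way only the direct child is affected, which is exactly where uncooperative builds slip through.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating a build timeout with a grace period.
public class BuildTimeout {

    /**
     * Waits up to timeoutSeconds for the process to finish. If it is still
     * running, asks it to stop, allows graceSeconds for it to exit, then
     * kills it forcibly. Returns true only if it finished on its own.
     */
    public static boolean waitOrKill(Process process, long timeoutSeconds, long graceSeconds)
            throws InterruptedException {
        if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return true; // exited within the timeout
        }
        process.destroy(); // polite request first
        if (!process.waitFor(graceSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // no more politeness
            process.waitFor();         // block until it is actually gone
        }
        return false;
    }
}
```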

First, I'd like to give a plug to a little library called javasysmon, from Jez Humble (our friendly Thoughtworks competition). Although incomplete and dormant for some time, this library provides a starting point for richer process discovery and manipulation through Java APIs. We've employed it as one prong in our attack against zombie processes. Given the Java platform's anemic support for process control, we're hoping to build on (and contribute to) this effort over time.

The combination of Windows and Java provides extra challenges for process control. Windows doesn't have true process trees; rather, process groups are the preferred way to manipulate related processes. However, Java APIs provide no way to create or manipulate process groups. On the upside, Microsoft ship a handy utility by the name of taskkill with all recent Windows versions. Pulse has no aversion to employing external platform tools where they can help, so our zombie-razing toolbox includes taskkill when it is available.
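Shelling out to taskkill from Java can be sketched as follows. `/PID` selects the process, `/T` extends the kill to the child processes it started, and `/F` forces termination; these are standard taskkill options, but the post does not say exactly how Pulse invokes the tool, so the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the Windows taskkill utility.
public class TaskKill {

    /** Builds a taskkill command line that forcibly (/F) kills a process and its tree (/T). */
    public static List<String> command(long pid) {
        return Arrays.asList("taskkill", "/PID", Long.toString(pid), "/T", "/F");
    }

    /** Runs taskkill on Windows and returns its exit code; returns -1 elsewhere. */
    public static int killTree(long pid) throws IOException, InterruptedException {
        if (!System.getProperty("os.name").startsWith("Windows")) {
            return -1; // taskkill only exists on Windows
        }
        Process p = new ProcessBuilder(command(pid)).inheritIO().start();
        return p.waitFor();
    }
}
```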

If a good flailing via Java APIs, amputation with javasysmon and evisceration with taskkill all fail to stop the zombie hordes, Pulse 2.4 also offers a shotgun-to-the-head fallback: the kill build action. This new action is the kill -9 of the Pulse world: it cuts the process loose and wraps up the build state in Pulse immediately. We'd prefer it never came to this, and will continue enhancing our automated weaponry to avoid it, but a last-ditch way to restore order is better than taking your build server down.

Zero zombie tolerance is just one of the improvements in Pulse 2.4. Other significant updates include:

  • Mercurial support: in the form of a new plugin.
  • Maven 3 support: including a command, post-processor and resource discovery.
  • Agents page updates: with graphical status and more convenient navigation.
  • Reworked agent status tab: with more build links and efficient live updates.
  • New agent history tab: quickly browse all builds that involved an agent.
  • Reworked server activity tab: showing build stages nested under active builds.
  • Pause server: admins can pause the build queue, so all triggers are ignored.
  • New server history tab: showing build history across all projects.
  • Restyled info and messages tabs: for both the agents and server sections.
  • Improved changelist views: these views have been reworked in the new style.
  • Pinned builds: mark builds that should never be deleted or cleaned.
  • Templated field actions: easily find or revert to an inherited value.
  • Introduce parent refactoring: adjust your template hierarchy over time.
  • Pluggable resource discovery: automatically locate build tools and libraries.
  • Subversion changelist support: easily submit a changelist as a personal build.
  • ... and more: extra UI touches, improved performance, additional plugin support implementations and other refinements.

If you'd like to learn more about Pulse 2.4, check out the "new in 2.4" page on our website for details and a few screenshots. Or you can join the war on zombies by downloading and trying Pulse 2.4 for free today.

Disclosure: Jason's competitor sponsors this blog.

DevOps New Zealand