Continuous Interviews: Jez Humble, Part II


This is part two of the interview with Jez Humble from ThoughtWorks Studios (part one here). I'm also getting it transcribed; watch this space.

JULIAN: Can you tell me about the XML?

JEZ: Yeah, that’s…

JULIAN: I was looking at Cruise the other day, because I was wanting to change what we are using at my day job, and, yeah, I was quite surprised to see that after you get through the wizard for creating a pipeline, it sort of reverts to XML; what’s all that about?

JEZ: Yeah, so that has been a frequent source of complaint. There are some advantages to XML config: it allows for simple, external configuration of the tool. So you could, for example, send the configuration out via Puppet and manage it centrally in version control, and it becomes much easier to audit it and see exactly what the configuration was at a particular point in time. Obviously, it is extremely un-user-friendly. We have put some little things in there to help you with the XML config; for example, Cruise will not let you save the config if it is not valid. So there are some bits that make it slightly nicer, but it is still horrible, obviously, and we are committed to getting rid of it. What we are planning to do is to put wizards in over the course of the next few releases. So hopefully you will start to see at least some incremental move with the 2.0 general release, and we will be moving everything over to wizards. What we decided to do was to focus on delivering other functionality before delivering wizards. I mean, it is what has enabled us…

JULIAN: Understood, fair enough…

JEZ: Yeah, but I know that it is horrible.

JULIAN: Yeah, I mean, you absolutely need some kind of easily causable config file, and being able to push that out with some kind of management system is brilliant. I do that on my NCS server, as all the config is managed in Subversion, using Puppet. Fair enough; it is good to see that you will be moving on from that in time. I would like to talk about the build pipeline, if you can. Obviously, you and I have discussed this over the course of 7 years, but for anyone who has just come along and has seen the **2:22**, can you give a brief overview of that, and what it really does for you on a project?

JEZ: Sure. So, obviously, every project has this cycle of building, deploying, testing, and releasing, and all your teams are involved in that. And really, the problem is, as we have all seen, it is very hard to get visibility into that process and to manage it well. It is very typical for developers to throw things over the wall, and then for build dudes to have to come and try to do manual deployments, and that takes ages. Then the developers don’t get feedback on the deployment process for ages, and when they do get bug reports, the reports are 2 versions behind development and the bugs are not relevant anymore, and then the operations people are up until 4 a.m. trying to deploy it. This is very typical, and this is the problem that we are trying to solve with Cruise: to give everybody who is working on a project, from operations to program managers to developers, visibility into the status of every single check-in (which automated and manual tests it has passed, whether it has passed performance tests and other kinds of nonfunctional tests) and to give them control over that process, so that you can do push-button deploys. You just press a button, and bang, it is in your manual testing environment; bang, it is in your staging environment; and, ultimately, push-button deployments into production. One of the interesting things that has been developing over the last few months is continuous deployment, and I think Cruise is really a tool that is absolutely designed with that in mind: the idea that for any check-in, you can see the moment its tests have passed, and you can just press a button and have it deployed into production straight away.

JULIAN: Okay, great. Um, the last question I was going to ask was about one other side-effect I have seen of being able to have all these different stages of build, which is projects ending up with some kind of horrible automated functional test suite that might run for a very long time. There was a major retail project done at ThoughtWorks that had a functional test suite that ended up running for hours, and I think it was nearly approaching days at one point. Do you have any comment on what you should try and do to avoid that? I mean, it is quite easy to generate yourself an enormous volume of functional tests, of any kind at all, really, and trying to make that work with continuous integration then becomes difficult, because you cannot really have builds that are, you know, 4 hours long. Any advice for stopping that from just **4:58** build pipeline from perpetuating the **5:05**, allowing developers to push things further down the pipeline?

JEZ: Yeah, I mean, that’s right; I remember you posted a blog entry about this a couple of years ago, and it is an excellent point. Because there is a tension, right? On one hand, you want a check-in suite that runs in under 10 minutes; on the other hand, you want really comprehensive automated acceptance tests to prove that the latest change really does work and that you have not broken anything. So it is tricky, and as you point out, there is a tendency for developers to pay attention to the check-in suite and to ignore the acceptance test suite. Then you end up with acceptance tests that are red all the time until you come to release, and suddenly there is a flurry of activity to try to get them green, half of them are thrown away, and you are back to the initial problem that continuous integration is supposed to address: spending ages integrating your code right at the end. So, yeah, it is a big problem. With Cruise, one of our design constraints was to try and make that easier, and one of the things we make very simple to do is to throw resources at the problem. Taking a long-running automated functional test suite and making it run across a grid of computers is very simple in Cruise. Cruise has the concept of stages, which are like the stages in your build pipeline, and you can define many, many jobs in each stage, assign part of your automated functional tests to each of those jobs, and run them in parallel on the build grid. So what we do is make it very simple to make those test suites run very, very fast by parallelizing them across the build grid. That is the number 1 solution. For example, Mingle has an automated acceptance test suite that takes 13 hours to run end-to-end. We have a grid of 60 computers sitting in Beijing, and when the functional tests run on those simultaneously, you get your results back in 45 minutes.
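To illustrate the idea of splitting one long suite into parallel jobs, here is a minimal Python sketch. It is not how Cruise itself assigns tests (the interview does not say); the function names, the greedy longest-test-first strategy, and the test durations are all hypothetical, assuming you have historical run times for each test.

```python
import heapq
from typing import Dict, List, Tuple


def partition_tests(durations: Dict[str, float], agents: int) -> List[List[str]]:
    """Hypothetical helper: greedily split tests across parallel jobs.

    Tests are handed out longest first, each to whichever job currently has
    the least total work (the "longest processing time first" heuristic).
    """
    if agents < 1:
        raise ValueError("need at least one agent")
    # Min-heap of (minutes assigned so far, job index).
    heap: List[Tuple[float, int]] = [(0.0, i) for i in range(agents)]
    heapq.heapify(heap)
    jobs: List[List[str]] = [[] for _ in range(agents)]
    for name, minutes in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)
        jobs[idx].append(name)
        heapq.heappush(heap, (load + minutes, idx))
    return jobs


def wall_clock(durations: Dict[str, float], jobs: List[List[str]]) -> float:
    """A stage only finishes when its slowest job finishes."""
    return max(sum(durations[t] for t in job) for job in jobs)


if __name__ == "__main__":
    # Hypothetical suite: 780 one-minute tests (13 hours end-to-end).
    suite = {f"test_{n:03d}": 1.0 for n in range(780)}
    jobs = partition_tests(suite, agents=60)
    print(wall_clock(suite, jobs))  # 13.0
```

With a perfectly even split, 13 hours across 60 machines would take 13 minutes; the 45 minutes quoted for Mingle is well above that, which is plausibly down to per-machine setup time and tests of uneven length, neither of which this sketch models.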

JULIAN: Nice.

JEZ: So that is the kind of thing that is very simple to do within Cruise. I mean, it is still the case that developers can ignore acceptance tests. In a way, I don’t think there is a technological solution to that. To some extent, product managers just have to give developers a bit of a kick and say, actually, it is not done until the acceptance tests are done; you cannot sign off a story until the acceptance tests pass. That is hard to do in mid-flight; it is easier if you enforce it from the beginning. So, I mean, tools are not going to fix all your process problems, absolutely, but I think Cruise does make it easier to get fast feedback.

JULIAN: Sure, at the end of the day, it is about discipline, and this is an industry not always well-known for having it. So, that is all the questions that I have. Thank you for taking the time out to answer them, and is there anything else you wanted to say before we stop recording?

JEZ: Um, no; I think that, well, I guess the only thing that I would say is that 2.0 is in the works. One of the sneak previews I can exclusively give you is that we are redesigning the **7:58**. We did not give a lot of love to the **8:03** 1.0, and we are totally re-doing that for 2.0. So that is quite exciting, and hopefully people will like it and it will make Cruise a lot easier to use. 2.0 is due out, and hopefully there is going to be an early access release for people to check out around the May time-frame. So I would encourage people to check that out, and obviously we will blog about it when it happens. And yeah, otherwise, it has been a pleasure talking to you. Thank you very much.

JULIAN: All right, thanks, Jez.


DevOps New Zealand