Tuesday, December 9, 2014

Some thoughts on Continuous Delivery

Continuous Delivery is something a lot of companies strive for. It is a software development practice that allows quick (continuous) delivery of software, and quick delivery means the software can provide business value sooner. Why is it difficult to achieve, and what are the challenges that need to be faced? Inspired by a Continuous Delivery conference in the Netherlands and by my own experience, here are some personal thoughts on the subject. The bottom line is that Continuous Delivery requires a cultural change in a company, and it takes a joint effort of several departments/disciplines to make it work. The image below is taken from here. The Continuous Delivery maturity model is an interesting read to understand what Continuous Delivery is, and it provides a way to measure where you stand as a company.

What has changed?

Today, software development has changed a lot compared to, let's say, 20 years ago. First I'll describe some of the current issues. Then I'll provide the often obvious (but curiously often not implemented) solutions, mostly in the context of Continuous Delivery.

Changes for the business

Speed gives a competitive advantage. This is more so than in the past, since the internet makes it easier for customers to find, compare and switch to competitors. It is no longer normal to simply go to the local store by default, especially since customers are starting to realize they can save money by switching regularly (of course this is less so with governments).


Changes for developers

More frameworks
There used to be only the choice of vendor, and integration/portability were not really issues because people tended to work in isolated silos. Today, however, application landscapes are made up of multiple technologically diverse systems integrated using open standards. Choosing the software to implement a solution in is not as straightforward as it used to be. It used to be: 'We use Microsoft, Microsoft has one product for that, so we'll use that!' Today it's more like: we have a problem, what is the best software available to fix it?

This change requires a different type of architect and developer: people who are quick learners, flexible, and able to make objective choices.

In my experience as an Oracle SOA developer, I should also have knowledge of Linux, application servers, Java and Python (WLST), and my customers appreciate it if I can also do ADF. Of course I should be able to design my own services (using BPEL, BPMN, JAX-WS or whatever other framework) and write my own database code.

More integration
Since systems are becoming more and more distributed and technologically diverse, the integration effort increases. For example, where in the past Oracle and Microsoft each lived in their own distinct silo, it is now not so strange anymore (because of open standards) to have an Oracle backend working together with a Microsoft frontend.

Integration also translates into integration suites becoming more popular, such as Service Bus products and BPEL and BPMN engines, which help automate business processes across applications/departments. The screenshot below is from Oracle's BPM Suite.


More security
Security is becoming increasingly important. Security at the network/firewall level is the responsibility of the operations department, but application and integration security is part of the developer's job. This becomes especially important when the application is exposed externally.

Continuous Delivery is becoming a topic
Because the complexity of environments has increased, so has the complexity of the installation and release process. More and more companies are starting to realize that this is often a bottleneck. Classic delivery patterns are not suited to such complexity and do not provide the required delivery speed and quality. For developers, tools like Jenkins, Hudson, Bamboo, TFS, SVN, Git, Maven and Ant (and of course long lists of test frameworks) are becoming part of daily life.

Changes for operations

Systems are becoming more diverse
In 'the old days' an Oracle database administrator would 'only' have to know the Oracle database. Today, he is also expected to know application servers and find his way around Linux. He may even get requests to be the database administrator for a Microsoft database.


He is confronted with all kinds of new tools to roll out changes, such as Jenkins, Hudson, Bamboo, XL-Deploy, SVN, Git, Ant and Maven. Specific knowledge alone will not be enough to keep his job until retirement, so he needs to learn new things.


Distributed systems require new monitoring mechanisms
Systems have become more and more distributed, which makes monitoring more of a challenge. For example, the database can be up and running and the application server can be up and running, yet the application cannot access the database. What could be wrong? Well, the database might have been down (some companies still do offline backups...) and the datasource configured in the application server has not recovered (yet?). To detect such issues, you need to monitor functionality and connectivity instead of individual environments.
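To make the difference concrete, here is a minimal Python sketch, not a production monitoring tool, contrasting a component-level check (is the database port open?) with a functional check that exercises the same path the application uses. The landscape is simulated under stated assumptions: a local listening socket stands in for a database that is "up", and a closed `sqlite3` connection stands in for an application-server datasource that did not recover. The names `port_check` and `functional_check` are hypothetical.

```python
import socket
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""


def port_check(host: str, port: int, timeout: float = 2.0) -> CheckResult:
    """Component-level check: is something listening on host:port?"""
    name = f"port {host}:{port}"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return CheckResult(name, True)
    except OSError as exc:
        return CheckResult(name, False, str(exc))


def functional_check(name: str, probe: Callable[[], object]) -> CheckResult:
    """Functional check: exercise the real path the application uses."""
    try:
        probe()
        return CheckResult(name, True)
    except Exception as exc:  # any failure along the chain counts as down
        return CheckResult(name, False, repr(exc))


# --- Simulated landscape (hypothetical, for illustration only) ---
# A listening socket stands in for a database server that is "up".
db_listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
db_listener.bind(("127.0.0.1", 0))
db_listener.listen()
db_port = db_listener.getsockname()[1]

# The application's datasource holds a connection that did not recover
# after the database was restarted (simulated by closing it).
app_connection = sqlite3.connect(":memory:")
app_connection.close()

results = [
    port_check("127.0.0.1", db_port),
    functional_check("app -> datasource -> SELECT 1",
                     lambda: app_connection.execute("SELECT 1").fetchone()),
]
for r in results:
    print("OK  " if r.ok else "FAIL", r.name, r.detail)

db_listener.close()
```

In this simulation the port check reports OK while the functional check fails, which is exactly the situation described above. In a real landscape the probe would be something like a query through the application's own datasource or a test transaction through the service layer.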




Security is becoming more important
Companies are slowly starting to realize that security is also a major concern. To be secure, it is important to be thorough and quick with security updates and patches. It also requires more advanced monitoring and intrusion detection. A plain old firewall alone will not suffice, since visitors need to access resources inside the company (for example in the case of self-service portals). They have to be let in to use certain functionality, and once they are in, they can do all kinds of interesting things.

Downtime is expensive and hurts your reputation
Reacting when something goes wrong is not good enough anymore; it will cost you customers. Also, by the time a problem is found, you are usually already too late. Proactive monitoring is required. If a problem can be prevented, that is usually less expensive than waiting for a disaster and trying to fix it when it happens.

 (I borrowed this image from an inspiring presentation by Mark Burgess)

Test

To make Continuous Delivery work, test automation is a must. The role of the technical tester becomes more important, since manual work is error-prone and likely to give low coverage. Tests must be environment independent, rerunnable and independent of the dataset. If the acceptance criteria are well automated as tests and those tests pass on the acceptance test environment, it is safe to go to production; manual testing is no longer required. This is also a matter of trust.
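As an illustration of what "environment independent, rerunnable and independent of the dataset" can mean in practice, here is a minimal Python `unittest` sketch. Assumptions: the `APP_DB` environment variable and the `register_customer` function are hypothetical, and SQLite stands in for the real database. The test reads its database location from the environment, creates its own uniquely named data and removes it afterwards, so it does not depend on what is already in the database and can be rerun safely.

```python
import os
import sqlite3
import unittest
import uuid

# Hypothetical setting: the database comes from the environment, so the same
# test runs unchanged on a laptop, the test or the acceptance environment.
DB_PATH = os.environ.get("APP_DB", ":memory:")


def register_customer(conn: sqlite3.Connection, name: str) -> int:
    """Hypothetical function under test: registers a customer, returns count."""
    conn.execute("INSERT INTO customer(name) VALUES (?)", (name,))
    row = conn.execute(
        "SELECT COUNT(*) FROM customer WHERE name = ?", (name,)).fetchone()
    return row[0]


class RegisterCustomerTest(unittest.TestCase):
    def setUp(self) -> None:
        self.conn = sqlite3.connect(DB_PATH)
        self.conn.execute("CREATE TABLE IF NOT EXISTS customer(name TEXT)")
        # Unique data per run: the test does not rely on existing records
        # and cannot collide with data left behind by an earlier run.
        self.name = f"test-{uuid.uuid4()}"

    def tearDown(self) -> None:
        # Remove what this test created, so the dataset is left as it was.
        self.conn.execute("DELETE FROM customer WHERE name = ?", (self.name,))
        self.conn.commit()
        self.conn.close()

    def test_register_creates_exactly_one_customer(self) -> None:
        self.assertEqual(register_customer(self.conn, self.name), 1)
```

Run it with `python -m unittest <module>`; pointing `APP_DB` at another database file runs the same test against that environment without changing the code.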

How to make things better

Of course the suggestions below are obvious, but surprisingly, a lot of companies have not implemented them yet.

Optimize the cycle time!

Dave Farley gave a nice presentation in which he mentioned a measure of performance for a software project: the cycle time. This measure nicely illustrates how thinking about the software delivery process should change and where the bottlenecks are.

The cycle time is the time it takes for an idea to provide actual business value. For example, marketing has thought up a new product. The most profit can be gained from it if it is implemented quickly. If the cycle time is too long, competitors might be first or the idea might no longer be relevant.

The cycle time can consist of steps like:
- business: new idea
- business: marketing research, will this work? business case
- business: decision making, are we going to do this?
- architecture: stakeholder analysis, non-functional and functional constraints
- design: how should the system work?
- operations: which and how many servers should be installed? which software versions?
- development: creating the system/application/feature
- test: is the quality good enough? if not, iterative loop with development
- development: providing operations with an installable package
- operations: running production

It becomes clear that optimizing the cycle time requires an efficient process which usually spans multiple departments and involves a lot of people.

Companies that suffer from long cycle times are usually companies that have implemented a strict separation of concerns. For example, development is split into frontend, middleware and backend (of course all with their own budgets and managers), and operations is split into hardware (physical and virtual), operating system (Windows, Linux), database (Oracle, Microsoft), application server, etc. If such a company implements quality gates (entry and exit criteria), the problem becomes worse. Quality and control are not gained by such a structure/process.

It is easy to understand why cycle times in such companies are usually very long. In such a structure, there is little shared responsibility for getting a new feature to production. Everyone just responds to his or her specific orders. Communication is expensive and requires a lot of managing and reporting.

Organisation structure

Don't separate development and operations.
Separating development and operations is not a good idea. To get a new piece of software running in production, they need each other. Developers need environments and prefer to spend minimal effort maintaining them. Operations prefers installations without many problems. If installations are not automated, operations can help the development team write a thorough installation manual or automate steps. It also helps if operations people have a say in the requirements, since that allows, for example, monitoring to be set up more specifically. Being physically close to each other reduces the communication gap.

Work together in cross functional teams
This reduces communication time and the time required to manage across departments, makes the question of who is responsible a lot easier to answer (the team is) and reduces the Tower of Babel effect. Cross-functional means business, development, test and operations together in single small teams, responsible for specific features including running them in production (BizDevOps). Take into account that feature teams tend to overlap in the code they edit, so some coordination is necessary.

Use stable teams
The feeling of responsibility increases when the team that built something is also responsible for keeping it running in production. The team that built it knows best how it works, so problems can be solved easily. Be careful, though, not to become too dependent on a specific team. Also, the people in the teams will still need to talk about methods and standards with people from other teams to make company-wide policies possible.

Test

At the Continuous Delivery conference, there was a nice presentation on approval testing. If, for example, I put a funny image on the site, will automated tests detect it? Most likely not, because no tester will have written a specific assertion to detect this error. The approval testing methodology converts the produced output to text and compares it with the expected (approved) output. PDF documents and images can both be converted to text, and it is easy to compare text with tools like TextTest. Thus, if a site contains a funny image and the image is converted to text, a text comparison will detect it. This method also requires less work, since not every assertion has to be written. A drawback is that maintaining the tests takes some additional effort, since every change to the output needs to be approved. Coverage is a lot better, though.
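A minimal sketch of the approval-testing idea in Python, assuming the output has already been converted to text (converting PDFs or images to text is out of scope here). It does not reproduce TextTest or any approval-testing library; the file naming (`.approved.txt` / `.received.txt`) and the functions `verify` and `render_page_as_text` are hypothetical. The received output is compared with the approved file; on a mismatch, the received version is written next to it and a diff is returned, and a person approves the change by renaming the file.

```python
import difflib
import tempfile
from pathlib import Path
from typing import List


def verify(name: str, received: str, approved_dir: Path) -> List[str]:
    """Compare received output with the approved output stored on disk.

    Returns an empty list when they match. Otherwise writes
    <name>.received.txt and returns a unified diff, so a person can review
    the change and approve it by renaming the file to <name>.approved.txt.
    """
    approved_path = approved_dir / f"{name}.approved.txt"
    received_path = approved_dir / f"{name}.received.txt"
    approved = approved_path.read_text() if approved_path.exists() else ""
    if received == approved:
        received_path.unlink(missing_ok=True)  # clean up a stale mismatch, if any
        return []
    received_path.write_text(received)
    return list(difflib.unified_diff(
        approved.splitlines(), received.splitlines(),
        fromfile=approved_path.name, tofile=received_path.name, lineterm=""))


def render_page_as_text(images: List[str]) -> str:
    """Hypothetical stand-in for rendering a page and converting it to text."""
    return "\n".join(["<h1>Welcome</h1>"] + [f"<img src='{img}'>" for img in images])


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # First run: nothing approved yet, so the output is reported as a change.
    first = verify("home", render_page_as_text(["logo.png"]), folder)
    # A person reviews the received file and approves it.
    (folder / "home.received.txt").rename(folder / "home.approved.txt")
    # Same output again: matches the approved version.
    second = verify("home", render_page_as_text(["logo.png"]), folder)
    # Someone adds a funny image; no specific assertion was written for it.
    third = verify("home", render_page_as_text(["logo.png", "funny-cat.gif"]), folder)

print("\n".join(third))
```

The funny image is caught without anyone having written an assertion for it: any change to the output shows up in the diff and has to be approved, which is also where the extra maintenance effort comes from.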

Most applications are outward-facing, which makes security a major concern that should be tested well. Ethical hackers are quite suitable for the job. Don't let them look only at the production code, since by then it is too late! If they are involved early in the development process, they provide useful insight into what should be refactored before going live (see the flood picture earlier).


Development

Create modular software which allows updating without downtime
Very obvious, and of course a best practice. However, a J2EE web application deployed as a single EAR, for example, is not modular, since it is not possible to replace a component without redeploying the entire thing. The idea of microservices offers some nice suggestions for making services independent and modular. See for example: http://martinfowler.com/articles/microservices.html


Use open standards
When a new component or application is added to the landscape, it is easier to link it to the existing software. Vendor lock-in is also reduced this way. For websites, it has the added benefit that the site looks the same in every standards-based browser.


Operations

Keep the number of environments to a minimum
This is especially true if they are hard to create and expensive to maintain. If a certain environment is difficult to keep stable, try to identify the cause and work on that instead of fighting symptoms; introducing more environments only makes the problem bigger. If you are proficient at creating environments (you can create one in a matter of minutes) and maintaining them, this is of course not a problem and you can use as many environments as you see fit.

Automate everything

This is a joint effort of test, operations and development. Sometimes companies do not acknowledge it as a separate task that needs investment, since it does not seem to provide direct business value.

Environments, configuration, releases, deployments, patches
Environments and configuration in particular should be automated, but so should releases, deployments and patches. People make errors and get bored of repetitive actions (at least I do). Fixing errors takes a lot more time than making them.

Automated provisioning makes it easier to deploy patches to multiple environments, which allows faster installation when security patches become available. If you want to manage security updates, do it thoroughly. If a Windows security update gets installed half a year after its release... well, you get the drift.
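Automated patch rollout works best when it is idempotent: you describe the desired patch level once, and the tooling only applies what is missing on each environment, so reruns are harmless. A minimal Python sketch of that idea follows; the `Environment` class, `apply_patch` and the patch identifiers are hypothetical placeholders for whatever provisioning tool and installer you actually use.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Environment:
    name: str
    installed: Set[str] = field(default_factory=set)


def apply_patch(env: Environment, patch_id: str) -> None:
    """Hypothetical placeholder: a real version would call your installer."""
    env.installed.add(patch_id)


def roll_out(envs: Iterable[Environment],
             required: Iterable[str]) -> Dict[str, List[str]]:
    """Bring every environment to the required patch level.

    Idempotent: patches that are already installed are skipped, so running
    it again is harmless. Returns what was applied per environment.
    """
    required_set = set(required)
    applied: Dict[str, List[str]] = {}
    for env in envs:
        missing = sorted(required_set - env.installed)
        for patch_id in missing:
            apply_patch(env, patch_id)
        applied[env.name] = missing
    return applied


required = ["SEC-001", "SEC-002"]  # hypothetical patch identifiers
landscape = [
    Environment("development", {"SEC-001", "SEC-002"}),
    Environment("test", {"SEC-001"}),
    Environment("acceptance"),
    Environment("production", {"SEC-001"}),
]
first_run = roll_out(landscape, required)
second_run = roll_out(landscape, required)
print(first_run)
print(second_run)
```

Running `roll_out` a second time applies nothing, which makes it safe to schedule for every environment as soon as a new security patch is added to the required list.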

Results in reduction of cycle time, increase in quality and more security
Automation reduces the time operations needs to create a new environment, the time developers need to fix errors in the environment configuration and the time testers require to check whether the environment is set up correctly (if it is correctly automated, you don't need to check). Last but certainly not least, it reduces the frustration the business experiences when it takes long to create new environments and their quality is poor.

Conclusion

Many of the things in this post are obvious. I've mentioned several challenges and some solutions to help eventually reach the goal of Continuous Delivery. It does require a change in culture, though, to make it happen. As Amir Arooni, CIO of the Dutch ING bank, put it so well:

(in Dutch)
From "Kan niet", "Mag niet" and "We doen het altijd zo" to "Kan", "Mag" en "Het gaat echt nu anders"

It translates to something like:
From "Not possible", "Not allowed" and "We always do it like this" to "You can", "It is allowed" and "We really are going to do it differently now."