Monday, August 25, 2014

Database releases. How can we make things better?

Releases of Java (technology related) code often are easy. Releases of database code often are not. Why is that? An important difference and cause can be found in that database code tends to be incremental. Being incremental means that a database release is build on top of previous releases. A database release has to be incremental because often a production environment holds data which needs to be preserved. With Java code, the previous releases (EAR, WAR, JAR files and the like) are less important for the present release since usually they are overwritten. Of course Java releases have their own complexities, however often they do not hold data; that is the task of the database. In this blog I will describe some of the common issues and several suggestions to improve (database) releases. Even though I am a developer (with quite some experience in release automation), I tried to also describe the issues system operators are facing and provide suggestions to improve the process.



Why are database releases often difficult?

Incremental releases have to deal with (part of) the history of an environment. This puts several requirements on how to deal with releases and environments. These requirements are often not met, causing difficulties during the installation of a release.

Administration

Incremental releases require administration
Suppose you have several environments. Every 2 weeks a scrum team releases a new incremental release. This release needs to be installed on every environment eventually in order to provide preconditions for the next release (the team will probably not be sitting still the next sprint). If the installation is delayed and there are many environments, it is easy to lose track on how you can get an environment up to date again.

Separation of concerns

In contrast to the current DevOps movement, a lot of organizations implement a separation of concerns and have split development and operations. This can cause issues.

Developers

Developers don't know production topology well enough
Usually due to separation of concerns, developers have limited access to the production environment. Often they are not fully informed of production topology, directory structures, and other related things required to write a foolproof installation manual.

Developers don't know production data
Production data might be very different then data in a development environment. Certain data constructs users or operators might have managed to put in the data model in a production environment might be incompatible with the next release. Developers will say things like 'of course we haven't taken into account that specific situation since it shouldn't have been possible in the first place!'. Not only can the content of the data sometimes surprise developers but also the size. A certain query might perform very well in a development environment with two records, but the production environment might have 2 million records. If known beforehand, the developer might have made different choices concerning the implementation of a feature.

System operators

Manual changes and lack of communication
Often a release is installed for example on a production environment and changes are made to the environment afterwards. Such changes can be quick fixes to get the installation finished or manual solving of production issues.

These changes are often not communicated to the people creating the next release due to separation of concerns. The production system operator is often only concerned with keeping the production environment running and not with providing the right preconditions for the next release to be installed. If the development team is not informed about manual changes to the environment, they cannot take them into account for the next release, so probably the next release will cause problems since it is not known to them what the exact situation was before installation of the release.

A database administrator is often not educated in application server administration
Some people tend to think that a database administrator can just as well administer an application server. Even though sometimes this is true, it is not something which can be expected. It is a very different line of work and different knowledge is required. If a DBA is ordered to maintain an application server, usually the following happens
  • he requires a very thorough, complete manual illustrated with many screenshots of every action to be performed (requiring a lot of work to create)
  • he or she will do what he thinks best and a lot of rework/debugging is required afterwards to fix the issues this has caused
Certain actions are not documented by developers such as server installations, clones, setting up clusters and the like. If this is done by someone who has little understanding of what he is doing, this will of course also cause issues.

Different people
People installing releases are required to make choices since an installation is never perfectly documented or automated (in my experience). Not every operator installing releases makes the same choices and thus the end result of installing the same release on the same environment done by two different people will cause two different end results. Often the people responsible for installations on production are different people then the people responsible for the installations on a test environment. This will cause differences between those environments.

A single person
Letting a single person do the installation of a release on every environment reduces this risk, but causes a problem if that certain person is not available. Also, that person will have a lot of knowledge in his head that is not on paper, which can be a liability.

Script quality
Even when the communication about and administration of releases is perfect, sometimes there are still unexpected errors such as a disk full error. If a script fails halfway due to such an unexpected error, usually it is difficult to restart/continue the installation.

Script granularity and rerunnability
If a script fails halfway, someone might decide to rerun the script after the problem is solved, but is the script build to be rerunnable? The actions which have already successfully executed, what should be done with those? Different people might make different decisions on this causing different results when the installation of the release is finished. This can cause issues for the next release.

Environment independence
If for every environment a separate script is provided, what is the worth of your testing efforts? You are not testing the code that is going to be installed on the next environment.

Oracle specific

Grants and synonyms
  • Complexity is increased when users/schema's are woven together by means of grants/synonyms. It requires effort to maintain the grant/synonym code and keep it consistent among environments. Often this requires additional scripting.
  • If an object is dropped and recreated, grants are lost and need to be recreated.
  • An action in schema A can create invalid objects in schema B. These might not be noticed when only looking at schema A for an installation.
  • Synonyms are not understood by some (uses of) JDBC drivers.
  • Synonyms can become invalid.
Use of tooling
Database administrators tend to use SQLPlus to execute scripts. SQLPlus, called UFI (User Friendly Interface) in the past (see http://en.wikipedia.org/wiki/SQL*Plus) is a commandline tool to execute SQL against a database. It has some special SQLPlus specific commands to allow for example spooling output to a log file. Developers tend to use GUI interfaces like Quest TOAD, Oracle SQLDeveloper, Allround Automations PL/SQL Developer to do their database development. Because of this difference in tooling, sometimes it can occur that a script executes fine in the developers tooling and not in the system operators tooling. Different tools can interpret (and thus execute) the same script differently.

Suggestions to improve database releases

What can management do?
Management can provide a structure in which it is difficult to make some of the above mentioned mistakes and people are responsible for the process and the release quality. Pay special attention on how you want to separate concerns and what this can cause (see for examples above).

Configuration management
In order for a configuration manager to improve (among other things) database releases, he should keep an administration of the following;
- what is the status of a release? is it safe to install it on a certain environment?
- on which environment are which releases installed?
- have any manual actions taken place on a certain environment and have they been applied to other environments or have the release scripts for the next release taken them into account (conditional execution, checking preconditions)

If a customer wants deployment of a release, he should first ask approval of the configuration manager since he knows the status of the releases and the environments. This configuration manager should be able to facilitate the release in such a way that the issues encountered are minimized. This means the configuration manager needs to have a complete overview of all environments from development to production. It helps is the configuration manager is supported by a team responsible for the release automation software since then he and his team are able to change the automation process such that it fits the needs of the business.

Less environments and less environment complexity
Everyone wants their own environment on which they can make their own decisions concerning release management, data, software deployed, when to roll-out fixes, etc. However, the more environments, the more effort is required in maintaining them, keeping them up to date and making sure the software and installation manual contain all the relevant environment dependent parameters. It becomes even more complicated (for the creation of a next release) if environments switch roles or if some parties want quick fixes to their environment before the official release. If environments are linked to other environments, the situation becomes exponentially more complicated. The solution is; keep it simple and do not grant every request for a new environment.

Database administration and application server administration are different roles
Database administration and application server administration are separate roles and should be treated as such. Let people do what they are good at and if there are no people who are good at it, hire them or train them.

Communication
Create and maintain mailinglists on who needs to be informed of what. For example, in case of a change on the production environment, the configuration manager needs to be informed so he can inform the developers so they can take the change into account for their next release. The configuration manager can also determine on which other environments the change needs to be deployed. Also he can check when the next release is created that the developers actually have taken the change into account and can delay the release until it has in order to prevent installation issues.

Give developers access to data which is similar to production data. In order to avoid developers having access to sensitive data you can use (for Oracle databases) solutions like Oracle Data Masking (see http://docs.oracle.com/cd/E11882_01/server.112/e41481/tdm_data_masking.htm#RATUG04000).
What can developers do?

Create good quality scripts which can deal with different situations. If scripts contain logic to make choices based on the situation which is encountered, the choices will not have to be made manually which reduces the risk of differences.

Environment independent
Every released script should be able to run on every environment on which the previous release was successfully installed. This allows the script to be tested on an environment before being deployed to the next environment.

Rerunnable scripts
Implementing rerunnable scripts require that scripts can make their own choices about their execution. Having rerunnable scripts provides several benefits.
  • If a script fails halfway because of an external problem, the problem can be fixed and the script can be rerun. This requires minimal manual effort. Because the manual effort is reduced, so is the chance for manual error and differences between installations done by different people.
  • Rerunnable scripts also allow easy building of an environment from scratch. Just execute all scripts in order and you're done with the database.
  • If you do not know the state of an environment, it is less of a problem if scripts are rerunnable. Just start at the release of which you are certain it is installed and work your way up to the latest release. If you are done, the environment will be up to date.
Make scripts atomic
Atomic scripts have several benefits.
  • If they fail, they can be rerun (see above). 
  • They cannot fail halfway requiring manual intervention to allow execution of the remaining part. The requirement of manual intervention often breaks release automation. Atomic scripts can be considered an enabler for automation.
  • If a certain script fails it is more easy to see what causes the error / what exactly has failed. Debugging becomes more easy.
Automate release creation and installation
Every manual action required, introduces the chance of manual error. By automating the creation and deployment of a release, the releases are more alike and the installation will be more reproducible. Also it is less likely installation of the same release done by different system operators will cause different results. There are several tools available to help with the complexity of releasing incremental database code such as Liquibase (see for example http://javaoraclesoa.blogspot.nl/2012/10/database-change-management.html).

Test the release
Often there is no separate environment to test the release on. For example, developers are responsible for installation on the test environment (manual, not using releases) and a system operator installs a release on other environments. On the test environment, because testers usually want changes as early as possible, it is usual to not wait for the next release. It would be a good idea to use the test environment to test the release on. This way, testers will have an environment more like the other environments and more stable. Also this way, the number of environments can be kept as low as possible and the quality of the release is improved before it is given to system operators.

Also it helps to have an environment very similar to production in order to test the production release on before the actual installation. You don't want it to wrong on production!

Use as few users/schema's as possible
This makes release automation easier. Also when using a single user, you do not need a network of grants and synonyms (which can break) to keep everything together. Using a single user also reduces the amount of configuration required for datasources (which create database connections from application server to database) and installation tooling. System operators will have to look up less passwords so the release will be installed faster.

What can system operators do?

Create a backup before installation
Create a backup before starting an installation. If it fails halfway due to whatever reason, it is then possible to restore the situation prior to the installation and redo the entire installation cleanly. Trying to resume an installation which has failed halfway will often cause problems.

Use the same tooling
For Java installations, often tools like Hudson, Jenkins, Bamboo or the like are employed to standardize the way a release is installed. If such a tool is used in development to achieve installations, why not use it in acceptance test / production? This will provide more consistency among environments and installations, reducing the chance of manual error. If there are good arguments against using such a tool such as security and visibility, talk to the development team about this. Maybe they can fix it for you.

Conclusions

There are a lot of reasons why (database) releases can be difficult. I've described several in this blog. One of the reasons I haven't described is cultural differences between departments. Generally speaking, developers tend towards change and innovation. They are creating new things. System operators tend towards stability and continuity. They have to keep running what has been created and will be created. Those two different worlds cross paths when a release needs to be created and installed. It is very challenging to allow for change and innovation while maintaining stability and guarantee continuity. Often this causes frustration on both sides. This becomes visible in poor quality of environments, caused by a poor release process.

A lot can be done to improve this. Most of the problems are organisational. Lack of communication due to separation of concerns/responsibilities, is an important one. Also benefits of having a good configuration manager often are underestimated. Besides the organisational issues, system operators and developers can do a lot to improve the chance a release will be installed without problems. Since a lot of the suggestions mentioned to improve the situation are straightforward, I suggest to start and make things better!