Suppose a synchronous call is made and the system takes a while to process the information. In the meantime the end-user has to wait for the request to complete, even though the end-user might not (immediately) be interested in the response. Why not make the process asynchronous?
Making a process asynchronous has some drawbacks. The result of processing the request is not immediately available in the front-end and back-end, so you cannot use this information yet, and often you do not know when (or whether) it will become available. If something goes wrong during processing, who is informed so that measures can be taken? And how, if at all, does the back-end inform the front-end when it is done? Server push mechanisms are one option.
The claim-check pattern is a well-known pattern. It is often used for large objects, such as large binary files, which you do not want to pull through your entire middleware layer. The data is labelled and saved somewhere, and the middleware only gets a reference to it. This reference can be sent to the place where the data is needed, and the data can be fetched and processed there.
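The flow can be sketched as follows, assuming a simple in-memory store. BlobStore, check_in, check_out and send_via_middleware are hypothetical names; in practice the store could be a file share, a database or an object store.

```python
import uuid


class BlobStore:
    """Hypothetical store for large payloads (in-memory for this sketch)."""

    def __init__(self):
        self._blobs = {}

    def check_in(self, data: bytes) -> str:
        """Save the data and return a reference (the claim check)."""
        claim = str(uuid.uuid4())
        self._blobs[claim] = data
        return claim

    def check_out(self, claim: str) -> bytes:
        """Fetch the data that belongs to a claim check."""
        return self._blobs[claim]


def send_via_middleware(message: dict) -> dict:
    """Stand-in for the middleware layer: it only transports the small message."""
    return dict(message)


store = BlobStore()
large_file = b"\x00" * 1_000_000

# The sender stores the payload and sends only the reference.
message = {"claim": store.check_in(large_file), "content_type": "application/octet-stream"}
received = send_via_middleware(message)

# The receiver uses the reference to fetch the payload where it is needed.
payload = store.check_out(received["claim"])
```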
Service calls are expensive since they often traverse several layers of hardware and software. For example, suppose I need to fetch data on a lot of persons and I have a service that fetches person information. I can call this service for every individual person. Each call can mean a Service Bus instance, a SOA composite instance, a SOA component instance, a database adapter instance, a database connection and the fetching of a single item, and then all the way back (not even counting hardware and software load-balancers). Every instance and connection (e.g. HTTP, database) takes some time. If you can minimize the number of instances and connections, you can gain a lot of performance. Doing this is easier than it might seem: just fetch more than one person in a single request.
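To illustrate the difference, here is a small sketch with a hypothetical PersonService that counts how often it is called. Every call stands for one full round trip through all the layers mentioned above; the method names get_person and get_persons are assumptions for this example.

```python
class PersonService:
    """Hypothetical person service that counts its round trips."""

    def __init__(self, people):
        self._people = people
        self.calls = 0

    def get_person(self, person_id):
        """Fetch a single person: one round trip per person."""
        self.calls += 1
        return self._people[person_id]

    def get_persons(self, person_ids):
        """Fetch a set of persons: one round trip for the whole set."""
        self.calls += 1
        return [self._people[person_id] for person_id in person_ids]


people = {i: {"id": i, "name": f"person-{i}"} for i in range(100)}
ids = list(people)

one_by_one = PersonService(people)
single_results = [one_by_one.get_person(i) for i in ids]

batched = PersonService(people)
batch_results = batched.get_persons(ids)
```

Both approaches return the same data, but the first needs 100 service calls and the second only one.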
If fetching certain pieces of information takes a lot of time, it might be worthwhile not to fetch it from the source every time you need it, but to use a cache. Of course you need to think about (among other things) how up to date the data in the cache needs to be, how and when you are going to update it, and cache consistency. You also only want to put data in the cache that is very likely to be retrieved again in the near future. This might require predictive analysis. You can preload a cache or add data the moment it is fetched for the first time.
Caching can be done at different layers: in the front-end, at the service layer or at the database layer. You can even cache service requests in a proxy server.
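As a minimal sketch of such a cache, assuming a fixed time-to-live is an acceptable answer to the "how up to date" question: the class TtlCache and its loader argument are hypothetical, and the clock is a parameter only so that expiry can be tested without waiting.

```python
import time


class TtlCache:
    """Cache that fills on first fetch and expires entries after ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # function that fetches from the slow source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no call to the source
        value = self._loader(key)  # missing or expired: fetch and store
        self._entries[key] = (now + self._ttl, value)
        return value

    def preload(self, keys):
        """Warm the cache for keys that are expected to be requested soon."""
        for key in keys:
            self.get(key)
```

Calling preload corresponds to preloading the cache, while a plain get adds data the moment it is fetched for the first time.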
Often data from different sources needs to be integrated. This can be done efficiently in a database. In such a case you can consider implementing an Operational Data Store. This is also a nice place to do some caching.
If you are serially processing data and you have cores/threads/connections to spare, you might consider running certain processing steps in parallel. There is often an optimum when increasing the number of parallel threads: at first processing time decreases, but once the optimum has been passed, processing time increases again. You should take measurements to determine this optimum.
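Such a measurement could look like the sketch below. It assumes the processing step is mostly waiting on I/O, which is simulated here with a short sleep; process, timed_run and find_best_worker_count are hypothetical names, and the timings depend entirely on your own environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process(item):
    """Stand-in for one processing step that waits on I/O."""
    time.sleep(0.01)
    return item * 2


def timed_run(items, workers):
    """Process all items with the given number of threads and time it."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process, items))  # map keeps the input order
    return results, time.perf_counter() - start


def find_best_worker_count(items, candidates):
    """Measure each candidate thread count and return the fastest one."""
    timings = {workers: timed_run(items, workers)[1] for workers in candidates}
    return min(timings, key=timings.get), timings
```

For example, find_best_worker_count(list(range(200)), [1, 2, 4, 8, 16, 32]) returns the fastest thread count together with all measured timings, so you can see where the curve turns.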
If a system has to interface with another system which does not support concurrency, you can use throttling mechanisms (e.g. pick up a message from the queue every few seconds) to spread the load and avoid parallel requests. Do not forget the effects of a clustered environment here: if every node runs its own consumer, the requests can still arrive in parallel.
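A minimal sketch of a throttled consumer, assuming a single consumer on a single node. The function drain_throttled is a hypothetical name; clock and sleep are parameters only so the behaviour can be tested without real waiting.

```python
import queue
import time


def drain_throttled(messages, handler, min_interval,
                    clock=time.monotonic, sleep=time.sleep):
    """Handle queued messages one at a time, at most one per min_interval seconds."""
    next_allowed = clock()
    handled = 0
    while True:
        try:
            message = messages.get_nowait()
        except queue.Empty:
            return handled
        wait = next_allowed - clock()
        if wait > 0:
            sleep(wait)        # spread the load over time
        handler(message)       # only one request is in flight at any moment
        handled += 1
        next_allowed = clock() + min_interval
```

Because the messages are handled serially by one consumer, the target system never receives parallel requests from this node.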
If a service is sometimes unavailable, a retry might solve the issue. If, however, the system is unstable due to high load, a retry might increase the load further and make the system more unstable, causing more errors. This can cause a snowball effect, since all these errors might lead to retries, which increase the load on the system even further. You can avoid this by throttling a queue near the back-end system that causes the issue and putting both the normal requests and the retry requests on that same queue. This probably requires custom fault-handling though.
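The fault-handling part can be sketched like this, assuming a single consumer that works through one queue. Failed requests go back to the tail of that same queue instead of being retried immediately, so retries never add load on top of the normal requests. The name process_queue and the max_attempts limit are assumptions for this example, and the throttling of the consumer itself is left out to keep the sketch short.

```python
from collections import deque


def process_queue(requests, call_backend, max_attempts=3):
    """Process requests serially; failed requests rejoin the same queue."""
    pending = deque((request, 1) for request in requests)
    succeeded, dead_letters = [], []
    while pending:
        request, attempt = pending.popleft()
        try:
            succeeded.append(call_backend(request))
        except Exception:
            if attempt < max_attempts:
                # Retry later, behind the requests that are already waiting.
                pending.append((request, attempt + 1))
            else:
                # Give up and set the request aside for manual follow-up.
                dead_letters.append(request)
    return succeeded, dead_letters
```

Limiting the number of attempts keeps a permanently failing request from circulating forever.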
Quality of service
You might want to prevent your background processing (e.g. a batch) from interfering with your front-end requests. There are several ways to help achieve this. First, you can use queues with message priorities and give the batch requests a lower priority than the front-end requests. You can also split the hardware: have separate servers for batch processing and other servers for serving the front-end. Whether this is a viable option depends on the effort required to create new servers to run the batch on.
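The priority queue option can be sketched with Python's standard queue.PriorityQueue. The class PriorityDispatcher and the two priority constants are hypothetical; a messaging product would typically offer message priorities as a built-in feature.

```python
import itertools
import queue

FRONT_END, BATCH = 0, 1  # a lower number means a higher priority


class PriorityDispatcher:
    """Hand out front-end requests before batch requests."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        # The sequence number keeps first-in, first-out order within one priority.
        self._sequence = itertools.count()

    def submit(self, priority, request):
        self._queue.put((priority, next(self._sequence), request))

    def next_request(self):
        _, _, request = self._queue.get_nowait()
        return request


dispatcher = PriorityDispatcher()
dispatcher.submit(BATCH, "batch-1")
dispatcher.submit(BATCH, "batch-2")
dispatcher.submit(FRONT_END, "front-1")

order = [dispatcher.next_request() for _ in range(3)]
```

Although the front-end request was submitted last, it is handed out first; the batch requests follow in their original order.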
Service granularity and layering
Usually services in a SOA landscape are layered (e.g. data services, integration services, business services, presentation services, utility services, ...). A service call usually has overhead (e.g. different hardware layers are crossed, cost of instance creation). Layering increases this overhead because it increases the number of service calls which are made. If, however, you have only a few services which provide a lot of functionality, you suffer in terms of re-use and flexibility. You should take the number of service calls required for a specific piece of functionality into account when thinking about your service granularity and layering. When performance is important, it might help to create a single service which provides the functionality of a couple of services in order to reduce the service instance creation and communication overhead. Also think about where you join your data. A database is quite efficient at that kind of thing. You might consider the previously mentioned Operational Data Store and have your services fetch data from there.
If you want to keep performance in mind when creating services, there are several patterns which can help you. Most are based on:
- Reduce superfluous data fetching. Think about efficient caching (and cache maintenance). Think about implementing a claim-check pattern.
- Reduce the number of service calls. Think about service layering, granularity and fetching of data sets instead of single pieces of data.
- When will the data be needed? Think about asynchronous processing (and how you know when it's done).
- Optimally use your resources. Think about parallel processing and efficient load-balancing.
- Keep the user experience in mind. Think about priority of message-processing (batches often have lower priority).
Of course this article does not give you a complete list of patterns. They are, however, the patterns I suggest you at least consider when performance becomes an issue, next to technical optimizations at the infrastructure, configuration and service level.