Thursday, January 27, 2022

Java: Validating JSON against an AVRO schema

AVRO schemas are often used to serialize JSON data into a compact binary format, for example to transport it efficiently over Kafka. When you want to validate your JSON against an AVRO schema in Java, you will encounter a challenge: the JSON which the Apache AVRO libraries require for validation against a schema is not standard JSON, since it requires explicit typing of fields. Also, when the validation fails, you will get errors like "Expected start-union. Got VALUE_STRING" or "Expected start-union. Got VALUE_NUMBER_INT" without a specific object, line number or indication of what is expected. Especially during development, this is insufficient.

In this blog post I'll describe a method (inspired by this) to check your JSON against an AVRO schema and get usable validation results. First you generate Java classes from your AVRO schema using the Apache AVRO Maven plugin (which is configured differently than documented). Next you deserialize a JSON object into these classes using libraries from the Jackson project. During deserialization, you will get clear exceptions. See my sample code here.
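A minimal sketch of the validation step (MyRecord is a hypothetical class generated from the schema by the AVRO Maven plugin; the exact Jackson setup may differ from my sample code):

    import com.fasterxml.jackson.databind.DeserializationFeature;
    import com.fasterxml.jackson.databind.JsonMappingException;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class JsonSchemaCheck {
        public static void main(String[] args) throws Exception {
            // Fail on fields which are not part of the generated class
            ObjectMapper mapper = new ObjectMapper()
                    .enable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES);
            String json = "{\"name\": \"Jane\", \"age\": \"not a number\"}";
            try {
                // Binding the JSON to the generated class triggers per-field type checks
                mapper.readValue(json, MyRecord.class);
                System.out.println("JSON matches the schema-generated class");
            } catch (JsonMappingException e) {
                // Unlike the generic "Expected start-union" AVRO errors, Jackson
                // reports the offending field and its location in the document
                System.err.println("Validation failed at " + e.getPathReference()
                        + ": " + e.getOriginalMessage());
            }
        }
    }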

Monday, January 24, 2022

Generating random JSON data from an AVRO schema in Java

Recently I was designing an AVRO schema and wanted to see what data conforming to this schema would look like. I developed some Java code to generate sample data. This also has uses in more elaborate tests which require the generation of random events. Because AVRO is not that specific, this is mainly useful to get an idea of the structure of a JSON document which conforms to the definition. Here I'll describe a simple Java-based solution (Java 17, but it will also work on 11) to do this.
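To give an idea of the approach, a minimal sketch assuming a schema file example.avsc on the classpath. Note that RandomData ships in the AVRO test artifact (the avro dependency with classifier "tests"), so this is for test and development use:

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.io.EncoderFactory;
    import org.apache.avro.io.JsonEncoder;
    import org.apache.avro.util.RandomData;

    public class GenerateSamples {
        public static void main(String[] args) throws Exception {
            // example.avsc is a placeholder for your own schema file
            Schema schema = new Schema.Parser().parse(
                    GenerateSamples.class.getResourceAsStream("/example.avsc"));
            GenericDatumWriter<Object> writer = new GenericDatumWriter<>(schema);
            // RandomData yields records which conform to the schema; here 3 of them
            for (Object record : new RandomData(schema, 3)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                // The JSON encoder renders each record as (pretty-printed) JSON
                JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out, true);
                writer.write(record, encoder);
                encoder.flush();
                System.out.println(out);
            }
        }
    }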

Monday, January 10, 2022

Apache NiFi: Forwarding HTTP headers

Apache NiFi can be used to expose various flavors of webservices. Using NiFi in such a way provides benefits like quick development using a GUI and of course data provenance: you know who called you with which data and where the data went. NiFi is very scalable, delivery can be guaranteed, and NiFi can help with features like back-pressure if a backend system cannot handle requests as quickly as they are offered. Exposing webservices using NiFi can have additional benefits such as service virtualization (decoupling). When exposing HTTP(S) webservices, a regular requirement is to pass through HTTP headers. This blog post is about how you can do that using the NiFi processors ListenHTTP, InvokeHTTP, HandleHttpRequest and HandleHttpResponse. I've used the environment which is described here.
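To give an idea of the mechanism (the property name comes from the NiFi documentation; the regular expression is illustrative): HandleHttpRequest turns incoming HTTP headers into flowfile attributes, and InvokeHTTP can send selected attributes back out as headers.

    # HandleHttpRequest: an incoming header "X-Correlation-Id" becomes the
    # flowfile attribute "http.headers.X-Correlation-Id"

    # InvokeHTTP: forward matching attributes as outgoing HTTP headers
    Attributes to Send = http\.headers\..*

InvokeHTTP uses the attribute name as the header name, so you may want to rename the attributes (for example with UpdateAttribute) before forwarding them.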

Wednesday, December 29, 2021

Apache NiFi: Reading COVID data from a REST API and producing it to a Kafka topic

Apache NiFi can be used to accelerate big data projects by allowing easy integration between various data sources. Using Apache NiFi it is easy to track what happened to your data (data provenance) and to provide features like guaranteed ordered delivery and error handling. In this example I'm going to configure NiFi to read COVID data from a REST API, split the data into individual records per country and publish the result to a Kafka topic. I've used the environment described here.
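In NiFi the publishing is of course done with a Kafka processor rather than code, but as an illustration, the final step of the flow roughly corresponds to the following plain Java using the Kafka client (broker address, topic name and payload are assumptions):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class CovidToKafka {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // One message per country, analogous to the split step in the flow
                String record = "{\"country\":\"NL\",\"cases\":123}"; // illustrative payload
                producer.send(new ProducerRecord<>("covid", "NL", record));
            }
        }
    }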


Friday, December 24, 2021

Vagrant + Docker Compose: A quick and easy Apache NiFi development environment

Vagrant can be used to quickly create development environments in, for example, VirtualBox, VMware or Hyper-V. I decided to use Vagrant to create a quick Apache NiFi development environment. For Apache NiFi development, you often also require input/output, for which Kafka can be used, the NiFi Registry to manage shared resources, and of course NiFi itself. Setting this up yourself can be cumbersome. That's why I've created some scripts to help you do this quickly. You can find them here.

Since manually scripting the installation of all these products can be a challenge, I decided to use Docker images, which often already provide an automated installation (so I don't have to do that myself), and used Docker Compose to allow the containers to easily find each other and to keep my environment variables in a docker-compose.yml so I wouldn't have to supply them on the command line.
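A heavily trimmed sketch of what such a docker-compose.yml can look like (image choices, ports and variables are illustrative; see my scripts for the actual file):

    version: "3"
    services:
      zookeeper:
        image: bitnami/zookeeper        # illustrative image choice
        environment:
          - ALLOW_ANONYMOUS_LOGIN=yes
      kafka:
        image: bitnami/kafka
        depends_on:
          - zookeeper
        environment:
          # Containers find each other by service name on the Compose network
          - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
          - ALLOW_PLAINTEXT_LISTENER=yes
      nifi:
        image: apache/nifi
        ports:
          - "8443:8443"                 # recent NiFi versions default to HTTPS on 8443
      nifi-registry:
        image: apache/nifi-registry
        ports:
          - "18080:18080"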

Tuesday, October 5, 2021

Product selection done quickly

Your customer has a specific challenge and wants to have it solved by means of software. They ask you, as a software/solution architect, to advise them on this topic, and they want to be able to choose a solution based on your advice by the end of next week (we're not talking about something as large as a government tender). How can you approach this challenge? A formalized approach, although usually thorough, will probably not be finished by next week, so you are forced to take some shortcuts.

It boils down to establishing a set of prioritized requirements and evaluating potential solutions against them. In this blog post I'll provide a general outline of a 'quick and dirty' (not so formal) product selection process which can be done relatively quickly. I'll start with the identification and classification of stakeholders. Next I'll suggest several topics to discuss with the stakeholders in order to determine and prioritize requirements. I'll end with several suggestions on how to obtain candidate solutions and compare them.


Disclaimer

There is probably overlap with existing approaches which accomplish the same thing. Please inform me so I can cross-check, learn and add references. I've used CMMI-DEV, among others, as an inspiration. The approach below is not a company standard; it is an approach I've personally tried and have had good experiences with.

Saturday, July 17, 2021

Measure the Quality of your Source Code!

Quality is something which is hard to define. Different areas of expertise use their own definitions of what quality is. Without an objective standard which carries weight, anyone can claim to provide a quality product or service according to some standard. This makes it difficult to compare products and to formalize which characteristics a product or service needs to have. In this blog post I'll provide an introduction to ISO/IEC 5055, a standard which allows us to measure the quality of source code objectively.