Sunday, February 27, 2022

Vagrant and Hyper-V: Don't do it!

I've used Vagrant since 2015 in combination with VirtualBox for creating development machines. Recently, however, I've been running into more issues with VirtualBox, for example CPUs getting stuck when multiple CPUs are assigned to a VM, and problems with automatic adjustment of the guest resolution when I resize the VM window. These annoyances drove me to try out Vagrant with Hyper-V (running an Ubuntu 21.04 guest on a Windows 11 host). In this blog post I'll describe my experiences. In summary, it did not make me happy. A lot of things which work out of the box with Vagrant and VirtualBox require effort to get working in Hyper-V. Not only that, but several workarounds outside of Hyper-V are required because of missing features. I think I should try VMware next to see if it provides a better experience. You can download my Vagrantfile and provisioning script here.

Sunday, February 6, 2022

Merge AVRO schema and generate random data or Java classes

Previously I wrote about generating random data which conforms to an AVRO schema (here). In a recent use case, I encountered a situation where several separate schema files contained different AVRO types, and the message used types from those different files. To generate random data, I first needed to merge the different files into a single schema. In addition, I wanted to generate Java classes for the complete message, which required importing the dependent types in the pom.xml. In this blog post I'll describe how I did that.

Thursday, January 27, 2022

Java: Validating JSON against an AVRO schema

AVRO schemas are often used to serialize JSON data into a compact binary format, for example to transport it efficiently over Kafka. When you want to validate your JSON against an AVRO schema in Java, you will encounter a challenge. The JSON that the Apache AVRO libraries require for validation against an AVRO schema is not standard JSON: it requires explicit typing of fields. Also, when the validation fails, you will get errors like "Expected start-union. Got VALUE_STRING" or "Expected start-union. Got VALUE_NUMBER_INT", without a specific object, line number or indication of what is expected. Especially during development, this is insufficient.

In this blog post I'll describe a method (inspired by this) to check your JSON against an AVRO schema and get usable validation results. First you generate Java classes from your AVRO schema using the Apache AVRO Maven plugin (which is configured differently than documented). Next you serialize a JSON object against these classes using libraries from the Jackson project. During serialization, you will get clear exceptions. See my sample code here.
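To make the "explicit typing" point concrete, here is a minimal sketch of how the AVRO JSON encoding writes union fields, using only plain Java strings and no AVRO library. The record with the nullable fields `name` (`["null","string"]`) and `age` (`["null","int"]`) and the helper `wrapUnionBranch` are assumptions made up for this illustration. A non-null union value is written as a single-key object naming the branch type, which is why standard JSON triggers the "Expected start-union" errors.

```java
public class AvroUnionJsonExample {

    // Standard JSON as most producers would write it (hypothetical record).
    public static final String PLAIN_JSON = "{\"name\":\"Alice\",\"age\":42}";

    /**
     * Wraps an already serialized JSON value in the single-key object that
     * the AVRO JSON encoding uses for a non-null union branch.
     * The null branch is written as a bare JSON null.
     */
    public static String wrapUnionBranch(String branchType, String jsonValue) {
        if (jsonValue == null || jsonValue.equals("null")) {
            return "null";
        }
        return "{\"" + branchType + "\":" + jsonValue + "}";
    }

    public static void main(String[] args) {
        // Same data, but with every union field explicitly typed.
        String avroJson = "{\"name\":" + wrapUnionBranch("string", "\"Alice\"")
                + ",\"age\":" + wrapUnionBranch("int", "42") + "}";

        System.out.println("Plain JSON: " + PLAIN_JSON);
        System.out.println("AVRO JSON:  " + avroJson);
    }
}
```

Feeding the first form to a decoder that expects the second is what produces messages such as "Expected start-union. Got VALUE_STRING".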

Monday, January 24, 2022

Generating random JSON data from an AVRO schema in Java

Recently I was designing an AVRO schema and wanted to see what data conforming to this schema would look like. I developed some Java code to generate sample data. This of course also has uses in more elaborate tests which require generation of random events. Because AVRO is not that specific, this is mainly useful to get an idea of the structure of a JSON document which conforms to the definition. Here I'll describe a simple Java-based solution to do this (Java 17, but it will also work on 11).
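As a rough idea of what such a generator does, the sketch below produces a random JSON object from a field list. It is not the code from the post and does not use the Apache AVRO API: the schema is reduced to a hypothetical map of field name to primitive type name, and the class and method names are invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;
import java.util.StringJoiner;

public class RandomRecordSketch {

    /**
     * Builds one JSON object with a random value for every field.
     * Only a few primitive type names are supported in this sketch.
     */
    public static String randomJson(Map<String, String> fields, Random random) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String value;
            switch (field.getValue()) {
                case "int":
                    value = Integer.toString(random.nextInt());
                    break;
                case "long":
                    value = Long.toString(random.nextLong());
                    break;
                case "boolean":
                    value = Boolean.toString(random.nextBoolean());
                    break;
                case "string":
                    value = "\"" + randomLetters(random, 8) + "\"";
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported type: " + field.getValue());
            }
            json.add("\"" + field.getKey() + "\":" + value);
        }
        return json.toString();
    }

    // Random lowercase letters, so the string never needs JSON escaping.
    static String randomLetters(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical record definition: field name -> primitive type.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "long");
        fields.put("country", "string");
        fields.put("active", "boolean");

        System.out.println(randomJson(fields, new Random()));
    }
}
```

Passing a seeded `Random` makes the output repeatable, which is convenient in tests.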

Monday, January 10, 2022

Apache NiFi: Forwarding HTTP headers

Apache NiFi can be used to expose various flavors of webservices. Using NiFi in such a way provides benefits like quick development using a GUI and of course data provenance: you know who called you with which data and where the data went. NiFi is very scalable, delivery can be guaranteed, and NiFi can help with features like back-pressure if a backend system cannot handle requests as quickly as they are offered. Exposing webservices through NiFi can have additional benefits such as service virtualization (decoupling). When exposing HTTP(S) webservices, a regular requirement is to pass through HTTP headers. This blog post is about how you can do that using the NiFi processors ListenHTTP, InvokeHTTP, HandleHttpRequest and HandleHttpResponse. I've used the environment which is described here.
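Outside of NiFi, the idea of passing headers through can be sketched in a few lines of plain Java. This is only an illustration of the concept, not NiFi configuration or NiFi code: the class and method names are made up, and the choice to drop hop-by-hop headers (which describe a single connection rather than the request) is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class HeaderForwardingSketch {

    // Headers that apply to one connection only and are normally not forwarded.
    private static final Set<String> HOP_BY_HOP = Set.of(
            "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
            "te", "trailer", "transfer-encoding", "upgrade");

    /**
     * Returns the incoming headers that can be copied onto the outgoing
     * backend request, preserving their original order and casing.
     */
    public static Map<String, String> forwardable(Map<String, String> incoming) {
        Map<String, String> outgoing = new LinkedHashMap<>();
        for (Map.Entry<String, String> header : incoming.entrySet()) {
            // Header names are case-insensitive, so compare in lower case.
            String name = header.getKey().toLowerCase(Locale.ROOT);
            if (!HOP_BY_HOP.contains(name)) {
                outgoing.put(header.getKey(), header.getValue());
            }
        }
        return outgoing;
    }

    public static void main(String[] args) {
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("Authorization", "Bearer example-token");
        incoming.put("Connection", "keep-alive");
        incoming.put("X-Request-Id", "42");

        // Authorization and X-Request-Id are passed through, Connection is not.
        System.out.println(forwardable(incoming));
    }
}
```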

Wednesday, December 29, 2021

Apache NiFi: Reading COVID data from a REST API and producing it to a Kafka topic

Apache NiFi can be used to accelerate big data projects by allowing easy integration between various data sources. Using Apache NiFi it is easy to track what happened to your data (data provenance) and to provide features like guaranteed ordered delivery and error handling. In this example I'm going to configure NiFi to read COVID data from a REST API, split the data into individual records per country and publish the result to a Kafka topic. I've used the environment described here.


Friday, December 24, 2021

Vagrant + Docker Compose: A quick and easy Apache NiFi development environment

Vagrant can be used to quickly create development environments in, for example, VirtualBox, VMware or Hyper-V. I decided to use Vagrant to create a quick Apache NiFi development environment. For Apache NiFi development you often also need something for input/output, for which Kafka can be used, the NiFi Registry to manage shared resources, and of course NiFi itself. Setting this up yourself can be cumbersome. That's why I've created some scripts to help you do this quickly. You can find them here.

Since manually scripting the installation of all these products can be a lot of work, I decided to use Docker images, which often already provide an automated installation so I don't have to do that myself. I used Docker Compose so the containers can easily find each other, and a docker-compose.yml to hold my environment variables so I wouldn't have to supply them on the command line.