Wednesday, December 29, 2021

Apache NiFi: Reading COVID data from a REST API and producing it to a Kafka topic

Apache NiFi can accelerate big data projects by making it easy to integrate various data sources. With Apache NiFi it is easy to track what happened to your data (data provenance) and to use features such as guaranteed ordered delivery and error handling. In this example, I'm going to configure NiFi to read COVID data from a REST API, split the data into individual records per country and publish the result to a Kafka topic. I've used the environment described here.
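To make the "split into individual records per country" step concrete, here is a minimal Python sketch of the same idea outside NiFi. It assumes a hypothetical payload shape, {"Countries": [{"CountryCode": "NL", ...}, ...]}, which may differ from the API actually used. It turns one response into one (key, value) pair per country, which is the kind of message you would then publish to Kafka. Inside NiFi, a processor such as SplitJson could perform this split; the function name split_per_country is purely illustrative.

```python
import json
from typing import Iterator, Tuple


def split_per_country(payload: str) -> Iterator[Tuple[str, str]]:
    """Yield one (key, value) pair per country record.

    Assumes a hypothetical payload of the form
    {"Countries": [{"CountryCode": "NL", ...}, ...]}. The key is the
    country code and the value is the record re-serialized as JSON.
    """
    document = json.loads(payload)
    for record in document.get("Countries", []):
        # Each country record becomes its own message, similar to the
        # separate FlowFiles a split step in a NiFi flow would emit.
        yield record["CountryCode"], json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    # Small hand-made sample in the assumed shape (not real COVID data).
    sample = json.dumps({
        "Countries": [
            {"Country": "Netherlands", "CountryCode": "NL", "NewConfirmed": 10},
            {"Country": "Belgium", "CountryCode": "BE", "NewConfirmed": 5},
        ]
    })
    for key, value in split_per_country(sample):
        print(key, value)
```

Using the country code as the message key is an assumed design choice here. Kafka's default partitioner sends messages with the same key to the same partition, so the records for a single country stay in order.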


Friday, December 24, 2021

Vagrant + Docker Compose: A quick and easy Apache NiFi development environment

Vagrant can be used to quickly create development environments on, for example, VirtualBox, VMware or Hyper-V. I decided to use Vagrant to create a quick Apache NiFi development environment. For Apache NiFi development you often need more than NiFi itself: a source and destination for your data, for which Kafka can be used, and the NiFi Registry to manage shared resources. Setting this up yourself can be cumbersome, so I've created some scripts to help you do it quickly. You can find them here.

Since manually scripting the installation of all these products is a lot of work, I decided to use Docker images, which often already provide an automated installation, so I didn't have to do that myself. I used Docker Compose so the containers could easily find each other, with a docker-compose.yml containing my environment variables so I wouldn't have to supply them on the command line.