Monday, January 24, 2022

Generating random JSON data from an AVRO schema in Java

Recently I was designing an AVRO schema and wanted to see what data conforming to this schema would look like. I developed some Java code to generate sample data. This of course also has uses in more elaborate tests which require the generation of random events. Because AVRO schemas are not that strict, this is mainly useful to get an idea of the structure of a JSON message which conforms to the definition. Here I'll describe a simple Java based solution to do this (written on Java 17 but it will also work on 11).

Dependency

You only need a single dependency outside the regular JDK. Below is this dependency as a Maven pom.xml snippet:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.11.0</version>
</dependency>

Schema

In the resources folder of my project I've put a file file.avsc which contains my AVRO schema:

{
  "type" : "record",
  "namespace" : "nl.amis.smeetsm.schema",
  "name" : "Employee",
  "fields" : [
    { "name" : "name", "type" : "string" },
    { "name" : "age", "type" : "int" }
  ]
}

To develop the schema I've used the Apache Avro IDL Schema Support plugin in IntelliJ IDEA. This makes it especially easy to find errors in the schema during development.
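As an aside, roughly the same record could also be expressed in Avro IDL (an .avdl file), a format this plugin supports as well. Below is a minimal, untested sketch; the protocol name EmployeeProtocol is my own choice and not part of the original schema:

@namespace("nl.amis.smeetsm.schema")
protocol EmployeeProtocol {
  record Employee {
    string name;
    int age;
  }
}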

Java

My (minimal) Java class to read the schema and generate random JSON which conforms to it:

import org.apache.avro.Schema;
import org.apache.avro.util.RandomData;

import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

public class AvroTest {
    public static void main(String[] args) throws IOException {
        // Load the schema definition from the classpath
        try (InputStream is = AvroTest.class.getClassLoader().getResourceAsStream("file.avsc")) {
            Schema schema = new Schema.Parser().parse(is);
            // Ask Avro to generate a single random record which conforms to the schema
            Iterator<Object> it = new RandomData(schema, 1).iterator();
            System.out.println(it.next());
        }
    }
}

The code is fairly self-explanatory: it reads the schema from the classpath and hands it to Avro's RandomData class, which generates records conforming to it. It is easy to generate more random data this way for use in tests.
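If you want a batch of records instead of a single one, RandomData implements Iterable, so a for-each loop works. A small sketch which could replace the last two lines of main above; the count of 10 is arbitrary:

// Generate 10 random records; each iteration yields one generated object
for (Object record : new RandomData(schema, 10)) {
    System.out.println(record);
}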

Output

When I run my Java class, it generates output like:

{"name": "cenmfi", "age": -746903563}

Finally

AVRO schemas are limited in how strict they can be; they are not like JSON Schema. It is for example not easy (or even possible?) in AVRO to limit an int type to a certain min and max value or to constrain a string field with a regular expression. AVRO schemas are mostly used to help encode JSON messages going over Kafka streams (mostly from Java) and to allow some minimal validation. Because AVRO is not that strict, it is relatively easy to generate random data which conforms to a schema. It is however not easy to generate only messages which make sense (notice the "age" field in my example). If that is a requirement, you might be better off using JSON Schema or Protobuf for JSON serialization/deserialization on Kafka, since those allow for more specific validation and code generation. The Confluent platform supports all three options (here) and there are serializers/deserializers available for at least Java, .NET and Python (see here).
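To illustrate the encoding use case, below is a sketch of how a generated record could be serialized to JSON using Avro's GenericDatumWriter and JSON encoder. The class name AvroJsonExample is my own; the rest uses the same file.avsc as above:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.util.RandomData;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class AvroJsonExample {
    public static void main(String[] args) throws IOException {
        try (InputStream is = AvroJsonExample.class.getClassLoader().getResourceAsStream("file.avsc")) {
            Schema schema = new Schema.Parser().parse(is);
            // Take one randomly generated record
            GenericRecord record = (GenericRecord) new RandomData(schema, 1).iterator().next();
            // Serialize the record to JSON using Avro's own JSON encoder
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
            Encoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
            writer.write(record, encoder);
            encoder.flush();
            System.out.println(out.toString(StandardCharsets.UTF_8));
        }
    }
}

For this simple schema the output looks the same as the toString() of the record, but the encoder follows Avro's JSON encoding rules, which starts to matter once you use unions and other more complex types.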
