Wednesday, May 18, 2022

Apache NiFi: Importing and exporting parameters

When you import a new process group or upgrade an existing one, missing parameter contexts and parameters are added automatically. The new parameters are filled with values from the environment where the process group was committed to the Registry (except for sensitive parameter values), which is usually a development environment. NiFi 1.15 adds parameter context inheritance. If, however, you are on a lower version and have many similar process groups, you can end up with many copies of parameter contexts. If you add a large number of flows, and with them a large number of parameter contexts, it is bothersome to manually update all the parameter contexts used by the new process groups.

In most cases when you use deployment tooling, the environment configuration is kept separate from the application and applied upon deployment. This is also one of the principles of the 12-factor application. For example, in Azure DevOps you can use variable groups, in XL Deploy you can use environments, Spring Boot applications usually use property files, and K8s resources can use Kustomize templates. For this blog post I created a script that allows you to use a similar method. You can export parameters from an environment (except for sensitive parameter values) to a CSV file, which can then be imported into a different environment. This also allows you to keep a CSV parameter file per environment that can be applied on deployment. Updating parameters in a CSV is a lot easier than doing the same manually, and you can easily check whether everything is correct.

You can check out my code here. Specifically look at the export_parameters and import_parameters methods.

Exporting parameters to a CSV

Parameter names cannot contain characters such as newlines or quotes. Parameter values and descriptions, however, can contain those characters, and writing them to the CSV as-is could break its structure. I fixed this by Base64 encoding the values and descriptions. Since I don't want to Base64 encode everything (updating Base64 encoded values/descriptions is a bit cumbersome), I only encode a value or description when needed and record in the CSV whether it has to be decoded. This way I can still easily update almost everything in the CSV file as text and only have to do the Base64 encode/decode step in a couple of cases.
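A minimal sketch of this "encode only when needed" idea is shown below. It is not the code from the script itself: the function names and the set of characters treated as unsafe are assumptions made for illustration. Only the type markers Text and Base64 are taken from the CSV layout described further down.

```python
import base64

# Assumption: these are the characters considered risky inside a CSV field.
UNSAFE_CHARS = ('\n', '\r', '"')


def encode_field(text):
    """Return (stored_text, type_marker) for one value or description."""
    if any(ch in text for ch in UNSAFE_CHARS):
        # Base64 output only contains A-Z, a-z, 0-9, '+', '/' and '=',
        # so it can never break the CSV structure.
        encoded = base64.b64encode(text.encode('utf-8')).decode('ascii')
        return encoded, 'Base64'
    # Plain values stay readable and editable in the CSV.
    return text, 'Text'


def decode_field(stored_text, type_marker):
    """Reverse encode_field using the type marker stored next to the field."""
    if type_marker == 'Base64':
        return base64.b64decode(stored_text).decode('utf-8')
    return stored_text
```

With this approach a simple value such as dummyvalue is stored unchanged with the marker Text, and only a value containing a newline or quote ends up Base64 encoded.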

Sensitive values

As far as I know, sensitive values cannot be exported through the API. This is not a problem though, because sensitive values usually differ per environment. In the export I set their values to an empty string.

Importing parameters from a CSV

Clean and rebuild

I did not want to remove all parameters and parameter contexts and rebuild them from scratch, for several reasons:

  • It would be challenging in a running environment, since it would cause many processors to become (unnecessarily) invalid.
  • First removing and then recreating every parameter would cause many API calls. Deployment can be much quicker when we only update when necessary.

I did add the option to overwrite existing values, but only when they have actually changed, in order to avoid unnecessary API calls. You can also choose to leave existing parameters alone.
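The decision logic can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual code: plan_update and the dictionary layout are hypothetical, and only the overwrite_existing_params name is borrowed from the import_parameters call shown later.

```python
def plan_update(existing, desired, overwrite_existing_params):
    """Decide what, if anything, has to be sent to the API.

    existing: dict describing the parameter as it is now, or None if absent.
    desired:  dict describing the parameter as listed in the CSV.
    Returns the fields to send, or None when no API call is needed.
    """
    if existing is None:
        # The parameter does not exist yet, so it is always added.
        return dict(desired)
    if not overwrite_existing_params:
        # Existing parameters are left alone.
        return None
    # Only keep the fields whose value actually differs.
    changed = {key: value for key, value in desired.items()
               if existing.get(key) != value}
    return changed or None
```

Returning None for an unchanged parameter is what makes a repeated import cheap: nothing is sent when nothing differs.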

Sensitive values

Sensitive values are a special case. They are not exported, and we cannot (easily) check whether a sensitive value in the CSV differs from the value present in the NiFi environment. Overwriting them could cause issues if we have not specified every sensitive value in the file we would like to import. To avoid such issues, I decided to apply a sensitive value only if it is explicitly specified in the import CSV. It would be a shame to accidentally overwrite a password with an invalid one, since this can cause flows to break.
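As a small sketch of this rule, assuming that "explicitly specified" means "not an empty string" (the export writes an empty string for sensitive values), the check could look like this. The function name is hypothetical.

```python
def should_apply_value(sensitive, csv_value):
    """Return True when the value from the CSV should be written to NiFi."""
    if not sensitive:
        # Non-sensitive values can be compared and updated as usual.
        return True
    # Assumption: an empty string means the sensitive value was not
    # specified in the CSV, so the value already in NiFi is kept.
    return csv_value != ''
```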

Updating existing values

If a parameter already exists, I update it in place when it has changed; if it does not exist, I add it. Why update in place? A parameter might be referenced by components such as processors and controller services, and it might carry properties that are not present in the export, such as referencing components or permissions. The same goes for the parameter context, which might, for example, be bound to a process group. You do not want to lose this information or have to include it explicitly in an export/import. The same applies to inheritance properties. I tested this on a version prior to 1.15, in an environment without fine-grained access controls, so I cannot be 100% sure this will work as expected there. Should you try this, I'd love to hear your experiences!

Check the changes which would have been made

When importing parameters, it is always a good idea to first check what the script would do, in order to confirm it is what you expect. I've introduced a dummy run option that provides detailed log output about what would have been done, without actually changing anything.

A challenge here was that the parameter context might or might not exist. If the parameter context does not exist, we still want to check that the parameters are created in that context. We also want to check that the context is created for the first parameter and fetched for the second. For this purpose, a local dummy context is created that is not sent to the API, and the fact that it was created is saved, so that for the next parameter we can check whether a context has previously been created.
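The bookkeeping described above can be sketched as follows. This is a simplified illustration, not the script's code: the ContextResolver class is hypothetical, and a plain dictionary stands in for the parameter contexts that would normally be fetched from and created in NiFi.

```python
class ContextResolver:
    """Fetch or create parameter contexts, with optional dummy-run mode."""

    def __init__(self, remote_contexts, dummyrun):
        # Stand-in for NiFi: maps context name -> context dict.
        self.remote_contexts = remote_contexts
        self.dummyrun = dummyrun
        # Contexts that only exist locally during a dummy run.
        self.local_dummies = {}

    def get_or_create(self, name):
        """Return (context, action) for the given context name."""
        if name in self.remote_contexts:
            return self.remote_contexts[name], 'fetched'
        if name in self.local_dummies:
            # Created earlier in this dummy run, so "fetch" the local copy.
            return self.local_dummies[name], 'fetched-dummy'
        context = {'name': name, 'parameters': {}}
        if self.dummyrun:
            # Remember the dummy context, but never send it anywhere.
            self.local_dummies[name] = context
            return context, 'created-dummy'
        self.remote_contexts[name] = context
        return context, 'created'
```

In a dummy run, the first parameter of a missing context triggers a local creation and every following parameter finds that same local context, while the stand-in for NiFi stays untouched.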

Example

I used the following CSV, called export.csv:

"dummy","dummyparamname","dummyvalue","Text","False","dummydescription","Text"

The columns in order:

  • Parameter context name
  • Parameter name
  • Parameter value
  • Parameter value type (Text, Base64, None)
  • Parameter sensitivity (True or False)
  • Parameter description
  • Parameter description type (Text, Base64)
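To make the column order concrete, here is a small sketch that reads such rows with Python's standard csv module. The ParamRow class and the parse_rows function are hypothetical names for illustration and are not part of the script. Converting the sensitivity column by comparing it with the string True is also an assumption.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class ParamRow:
    context_name: str
    name: str
    value: str
    value_type: str        # Text, Base64 or None
    sensitive: bool
    description: str
    description_type: str  # Text or Base64


def parse_rows(csv_text):
    """Parse CSV text in the seven-column layout listed above."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text)):
        if not fields:
            continue  # skip empty lines
        (context_name, name, value, value_type,
         sensitive, description, description_type) = fields
        rows.append(ParamRow(context_name, name, value, value_type,
                             sensitive == 'True', description,
                             description_type))
    return rows
```

Feeding it the example line from export.csv yields a single ParamRow for the context dummy with the parameter dummyparamname.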

Updating a parameter and context

I call the import_parameters method with the following parameters:

import_parameters('export.csv', overwrite_existing_params=True, dummyrun=False)

When I first run it, the context does not exist. It creates the context and adds the parameter:

INFO:__main__:Logged in: True
INFO:__main__:Context not found. Creating
INFO:__main__:Created context dummy

INFO:__main__:Parameter entity not supplied to create_param_entity. Creating a new one
INFO:__main__:Updating sensitive property of dummyparamname to False
INFO:__main__:Updating value of dummyparamname
INFO:__main__:Updating description of dummyparamname
INFO:__main__:Parameter dummyparamname has been updated!
INFO:__main__:Adding parameter: dummyparamname to dummy
INFO:__main__:Parameter changed so calling the API to update it

When I execute the same again the context is fetched:

INFO:__main__:Logged in: True
INFO:__main__:Context found. Fetching
INFO:__main__:Parameter: dummyparamname will be updated in context dummy since it is not sensitive
INFO:__main__:Not updating sensitive property of dummyparamname since its value is unchanged
INFO:__main__:Not updating value of dummyparamname since its value is unchanged
INFO:__main__:Not updating description of dummyparamname since its value is unchanged
INFO:__main__:Parameter dummyparamname has not been updated!
INFO:__main__:Parameter not changed so not calling the API to update it

When I remove the parameter from the context using the web interface and execute the script again:

INFO:__main__:Logged in: True
INFO:__main__:Context found. Fetching
INFO:__main__:Parameter entity not supplied to create_param_entity. Creating a new one
INFO:__main__:Updating sensitive property of dummyparamname to False
INFO:__main__:Updating value of dummyparamname
INFO:__main__:Updating description of dummyparamname
INFO:__main__:Parameter dummyparamname has been updated!
INFO:__main__:Adding parameter: dummyparamname to dummy
INFO:__main__:Parameter changed so calling the API to update it

When I update the parameter value in the CSV and execute the script again:

INFO:__main__:Logged in: True
INFO:__main__:Context found. Fetching
INFO:__main__:Parameter: dummyparamname will be updated in context dummy since it is not sensitive
INFO:__main__:Not updating sensitive property of dummyparamname since its value is unchanged
INFO:__main__:Updating value of dummyparamname
INFO:__main__:Not updating description of dummyparamname since its value is unchanged
INFO:__main__:Parameter dummyparamname has been updated!
INFO:__main__:Parameter changed so calling the API to update it

Of course, the parameter context and the parameter are now visible in NiFi.

Test run functionality

When I do the same with the following parameters:

import_parameters('export.csv', overwrite_existing_params=True, dummyrun=True)

I can perform the same run multiple times since it makes no changes.

When I first run it, the context does not exist.

INFO:__main__:Logged in: True
INFO:__main__:Context not found. Creating
INFO:__main__:Created dummy context dummy
INFO:__main__:Parameter entity not supplied to create_param_entity. Creating a new one
INFO:__main__:Updating sensitive property of dummyparamname to False
INFO:__main__:Updating value of dummyparamname
INFO:__main__:Updating description of dummyparamname
INFO:__main__:Parameter dummyparamname has been updated!
INFO:__main__:Adding parameter: dummyparamname to dummy
INFO:__main__:Parameter changed so calling the API to update it
INFO:__main__:Dummy run. Not updating parameter context

When the context exists but it has no parameter:

INFO:__main__:Logged in: True
INFO:__main__:Context found. Fetching
INFO:__main__:Parameter entity not supplied to create_param_entity. Creating a new one
INFO:__main__:Updating sensitive property of dummyparamname to False
INFO:__main__:Updating value of dummyparamname
INFO:__main__:Updating description of dummyparamname
INFO:__main__:Parameter dummyparamname has been updated!
INFO:__main__:Adding parameter: dummyparamname to dummy
INFO:__main__:Parameter changed so calling the API to update it
INFO:__main__:Dummy run. Not updating parameter context

When the parameter is there but with a different value:

INFO:__main__:Logged in: True
INFO:__main__:Context found. Fetching
INFO:__main__:Parameter: dummyparamname will be updated in context dummy since it is not sensitive
INFO:__main__:Not updating sensitive property of dummyparamname since its value is unchanged
INFO:__main__:Updating value of dummyparamname
INFO:__main__:Updating description of dummyparamname
INFO:__main__:Parameter dummyparamname has been updated!
INFO:__main__:Parameter changed so calling the API to update it
INFO:__main__:Dummy run. Not updating parameter context

And when the parameter is there and has the same value:

INFO:__main__:Logged in: True
INFO:__main__:Context found. Fetching
INFO:__main__:Parameter: dummyparamname will be updated in context dummy since it is not sensitive
INFO:__main__:Not updating sensitive property of dummyparamname since its value is unchanged
INFO:__main__:Not updating value of dummyparamname since its value is unchanged
INFO:__main__:Not updating description of dummyparamname since its value is unchanged
INFO:__main__:Parameter dummyparamname has not been updated!
INFO:__main__:Parameter not changed so not calling the API to update it

Finally

This script has not been tested in environments with strict permissions. Parameter inheritance has not been tested either, and neither has the combination with process group variables. The script does not allow assigning a parameter context to a process group (although this can easily be added). If you want to use this script yourself, please test it carefully!
