NiFi

Module nifi

Certified

This plugin extracts the following:

  • The NiFi flow as a DataFlow entity
  • Ingress and egress processors, and remote input and output ports, as DataJob entities
  • Input and output ports receiving remote connections as Dataset entities
  • Lineage between external datasets and ingress/egress processors, derived by analyzing provenance events

Current limitations:

  • Only a limited set of ingress/egress processors is supported:
    • S3: ListS3, FetchS3Object, PutS3Object
    • SFTP: ListSFTP, FetchSFTP, GetSFTP, PutSFTP

CLI based Ingestion

Install the Plugin

pip install 'acryl-datahub[nifi]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
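
The starter recipe authenticates with SINGLE_USER. For certificate-based authentication, a recipe might look like the following sketch. It uses only field names listed under Config Details; the host name, file paths, and password value are placeholders, not values taken from this page.

```yaml
source:
  type: "nifi"
  config:
    # Coordinates (hypothetical host)
    site_url: "https://nifi.example.com:8443/nifi/"

    # Credentials: client_cert_file must be set for auth = CLIENT_CERT
    auth: CLIENT_CERT
    client_cert_file: "/path/to/client-cert.pem"   # placeholder path
    client_key_file: "/path/to/client-key.pem"     # placeholder path
    client_key_password: "<key-password>"          # placeholder; decrypts client_key_file
    ca_file: "/path/to/root-ca.pem"                # placeholder path

sink:
  # sink configs
```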

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

All configuration options:

| Field | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| env | | string | The environment that all assets produced by this connector belong to | PROD |
| site_url | | string | URI to connect to | None |
| auth | | enum(NifiAuthType) | NiFi authentication. Must be one of: NO_AUTH, SINGLE_USER, CLIENT_CERT | NO_AUTH |
| provenance_days | | integer | Time window, in days, to analyze provenance events for external datasets | 7 |
| site_name | | string | Site name to identify this site with, useful when using input and output ports receiving remote connections | default |
| site_url_to_site_name | | Dict[str,string] | Lookup to find site_name for site_url, required if using remote process groups in the NiFi flow | {} |
| username | | string | NiFi username, must be set for auth = "SINGLE_USER" | None |
| password | | string | NiFi password, must be set for auth = "SINGLE_USER" | None |
| client_cert_file | | string | Path to PEM file containing the public certificates for the user/client identity, must be set for auth = "CLIENT_CERT" | None |
| client_key_file | | string | Path to PEM file containing the client’s secret key | None |
| client_key_password | | string | The password to decrypt the client_key_file | None |
| ca_file | | string | Path to PEM file containing certs for the root CA(s) for NiFi | None |
| process_group_pattern | | AllowDenyPattern (see below for fields) | Regex patterns for filtering process groups | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| process_group_pattern.allow | | Array of string | List of regex patterns to include in ingestion | ['.*'] |
| process_group_pattern.deny | | Array of string | List of regex patterns to exclude from ingestion | [] |
| process_group_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching | True |
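
As an illustration of the dot notation, process_group_pattern.allow, process_group_pattern.deny, and process_group_pattern.ignoreCase become keys nested under process_group_pattern in the recipe. The sketch below also sets the provenance and remote-site options. All pattern strings, site names, and URLs are hypothetical, and it assumes the patterns are matched against process group names, which this page does not state explicitly.

```yaml
source:
  type: "nifi"
  config:
    site_url: "https://localhost:8443/nifi/"
    env: PROD
    # auth is omitted here, so the default (NO_AUTH) applies

    # Analyze the last 3 days of provenance events instead of the default 7
    provenance_days: 3

    # Name for this site, plus a lookup for sites referenced by remote process groups
    site_name: "primary"                                       # hypothetical name
    site_url_to_site_name:
      "https://remote-nifi.example.com:8443/nifi/": "remote"   # hypothetical URL and name

    # Nested form of process_group_pattern.allow / .deny / .ignoreCase
    process_group_pattern:
      allow:
        - "ingest_.*"      # hypothetical pattern
      deny:
        - ".*_sandbox"     # hypothetical pattern
      ignoreCase: true

sink:
  # sink configs
```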

Code Coordinates

  • Class Name: datahub.ingestion.source.nifi.NifiSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for NiFi, feel free to ping us on our Slack.