No-Code Data Ingestion Framework Using NiFi
Data Ingestion:
Data ingestion is the transportation of data from assorted sources to a storage medium where it can be accessed, used, and analyzed by an organization. The destination is typically a data warehouse, data mart, database, or document store. Sources can be relational databases (RDBMS) such as MySQL, Oracle, and PostgreSQL; file-based sources such as FTP and SFTP servers; REST APIs; or streaming sources. The data ingestion layer is the backbone of any data engineering architecture.
APACHE NIFI
What is Apache NiFi?
Apache NiFi is one of the key tools in Data Engineering and Big Data. It is used primarily for data ingestion and orchestration. It is a real-time data ingestion platform that moves and manages data between different source and destination systems. Its support for a wide variety of data sources and protocols has made it popular in many IT organizations.
What are the NiFi REST APIs?
One of NiFi's key features is its REST API, which lets developers interact with the platform programmatically. Its endpoints cover NiFi's configuration, data flows, and monitoring. The main benefit is the ability to automate common tasks such as creating data flows, starting and stopping processors, and retrieving data flow metrics. Developers can write custom scripts and applications that drive NiFi through the API and so automate their data integration processes.
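To make the "starting and stopping processors" part concrete, here is a minimal sketch that builds the two REST calls involved using only the Python standard library. It assumes an unsecured NiFi 1.x instance at `http://localhost:8080/nifi-api`; the endpoints `PUT /processors/{id}/run-status` and `PUT /flow/process-groups/{id}` are NiFi's, while the helper names and the example IDs are hypothetical. The `version` argument must be the processor's current revision (read it first with `GET /processors/{id}`), because NiFi rejects updates made against a stale revision.

```python
import json
import urllib.request

# Assumption: an unsecured NiFi 1.x instance at this base URL (secured instances also need auth).
NIFI_API = "http://localhost:8080/nifi-api"
VALID_STATES = ("RUNNING", "STOPPED")


def _json_put(url: str, body: dict) -> urllib.request.Request:
    """Wrap a JSON body in a PUT request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def processor_run_status(processor_id: str, version: int, state: str) -> urllib.request.Request:
    """PUT /processors/{id}/run-status: start or stop a single processor."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state}")
    # NiFi uses optimistic locking: the revision version must match the processor's current one.
    body = {"revision": {"version": version}, "state": state}
    return _json_put(f"{NIFI_API}/processors/{processor_id}/run-status", body)


def process_group_schedule(group_id: str, state: str) -> urllib.request.Request:
    """PUT /flow/process-groups/{id}: start or stop every component in a process group."""
    if state not in VALID_STATES:
        raise ValueError(f"unsupported state: {state}")
    return _json_put(f"{NIFI_API}/flow/process-groups/{group_id}", {"id": group_id, "state": state})


if __name__ == "__main__":
    req = processor_run_status("abc-123", version=3, state="RUNNING")
    print(req.get_method(), req.full_url)
    # Against a live instance you would send it with:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

Building `Request` objects separately from sending them keeps the payload logic easy to inspect and test without a running NiFi.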
How Are We Using NiFi?
We have automated the ingestion process by calling the NiFi REST APIs. Rather than calling the endpoints directly, we wrote a wrapper on top of NiPyApi, a Python client for the NiFi REST API. Through this wrapper we can create data flows, start and stop processors, retrieve data from the processors, and run end-to-end ingestion while monitoring every step of the process. All inputs are passed to the APIs through JSON files.
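Monitoring a running ingestion can be approximated by polling a process group's status until its queues are empty. The sketch below is an assumption about how such a loop might look, not Nimbus-NiFi's code. It reads `processGroupStatus.aggregateSnapshot.flowFilesQueued` from the JSON returned by `GET /nifi-api/flow/process-groups/{id}/status` (field names as in NiFi 1.x; verify them against your version). `fetch_status` is a hypothetical callable you supply, for example a function that performs that GET and parses the response.

```python
import time
from typing import Callable


def queued_flowfiles(status: dict) -> int:
    """Queued FlowFile count from a GET /nifi-api/flow/process-groups/{id}/status response."""
    # Field names as in NiFi 1.x's process group status entity; check them for your version.
    return int(status["processGroupStatus"]["aggregateSnapshot"]["flowFilesQueued"])


def wait_until_drained(
    fetch_status: Callable[[], dict],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> bool:
    """Poll until no FlowFiles are queued in the group; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout_s
    while True:
        if queued_flowfiles(fetch_status()) == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

An empty queue is only a heuristic for "done": a listing source such as ListSFTP may still pick up new files afterwards, so a real runner would probably also stop the group once it considers the run finished.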
Process Group Flow
Basic Features of Nimbus-NiFi:
- Nimbus-NiFi lets users ingest data from multiple sources into different destinations.
- No scripts or code need to be written.
- Users don't need to deal with NiFi configuration themselves.
- Users provide the source and destination details in easily configurable JSON files, and Nimbus-NiFi handles the rest.
- Both SSL and non-SSL connections are supported.
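Since every input arrives as JSON, checking a file before any NiFi component is created catches mistakes early. The sketch below assumes a hypothetical ingestion.json layout with a `source` and a `destination` object, each carrying a `type` field; the required keys per type are illustrative, so consult the project's own sample files for the real schema.

```python
import json

# Hypothetical required keys per connector type; the real Nimbus-NiFi schema may differ.
SOURCE_KEYS = {
    "mysql": {"host", "port", "database", "table", "username", "password"},
    "oracle": {"host", "port", "database", "table", "username", "password"},
    "postgresql": {"host", "port", "database", "table", "username", "password"},
    "sftp": {"host", "port", "remote_path", "username", "password"},
    "ftp": {"host", "port", "remote_path", "username", "password"},
}
DESTINATION_KEYS = {
    "s3": {"bucket", "prefix", "region"},
    "hdfs": {"directory"},
}


def validate_ingestion(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = []
    for side, allowed in (("source", SOURCE_KEYS), ("destination", DESTINATION_KEYS)):
        section = config.get(side)
        if not isinstance(section, dict):
            problems.append(f"missing '{side}' object")
            continue
        kind = str(section.get("type", "")).lower()
        if kind not in allowed:
            problems.append(f"{side}: unsupported type '{kind}'")
            continue
        missing = sorted(allowed[kind] - set(section))
        if missing:
            problems.append(f"{side}: missing {', '.join(missing)}")
    return problems


def load_ingestion(path: str) -> dict:
    """Read an ingestion.json file and refuse to continue if it is incomplete."""
    with open(path, encoding="utf-8") as fh:
        config = json.load(fh)
    problems = validate_ingestion(config)
    if problems:
        raise ValueError("; ".join(problems))
    return config
```

Collecting every problem instead of failing on the first one lets a user fix the whole file in a single pass.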
Support for Multiple Sources and Destinations
- Supported Sources:
  - RDBMS
    - MySQL
    - Oracle
    - PostgreSQL
  - SFTP/FTP
- Supported Destinations:
  - S3
  - HDFS
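Each source/destination pair ultimately has to become a chain of NiFi processors. One plausible mapping onto stock NiFi components is sketched below; it is our assumption, not a description of the flows Nimbus-NiFi actually builds. QueryDatabaseTable, ListSFTP/FetchSFTP, ListFTP/FetchFTP, PutS3Object, PutHDFS, and the DBCPConnectionPool controller service are standard NiFi components, while the `plan_flow` function and the dictionary layout are illustrative.

```python
# Assumed mapping onto stock NiFi components (not confirmed to be what Nimbus-NiFi builds).
SOURCE_PROCESSORS = {
    "mysql": ["QueryDatabaseTable"],
    "oracle": ["QueryDatabaseTable"],
    "postgresql": ["QueryDatabaseTable"],
    "sftp": ["ListSFTP", "FetchSFTP"],
    "ftp": ["ListFTP", "FetchFTP"],
}
DESTINATION_PROCESSORS = {
    "s3": ["PutS3Object"],
    "hdfs": ["PutHDFS"],
}
# JDBC driver classes a DBCPConnectionPool would need for each RDBMS source.
JDBC_DRIVERS = {
    "mysql": "com.mysql.cj.jdbc.Driver",
    "oracle": "oracle.jdbc.OracleDriver",
    "postgresql": "org.postgresql.Driver",
}


def plan_flow(source: str, destination: str) -> dict:
    """Return the processors (in flow order) and controller services for one source/destination pair."""
    src, dst = source.lower(), destination.lower()
    if src not in SOURCE_PROCESSORS:
        raise ValueError(f"unsupported source: {source}")
    if dst not in DESTINATION_PROCESSORS:
        raise ValueError(f"unsupported destination: {destination}")
    services = []
    if src in JDBC_DRIVERS:
        # Database sources read through a shared connection-pool controller service.
        services.append({"type": "DBCPConnectionPool", "driver_class": JDBC_DRIVERS[src]})
    return {
        # List concatenation builds a new list, so the lookup tables are never mutated.
        "processors": SOURCE_PROCESSORS[src] + DESTINATION_PROCESSORS[dst],
        "controller_services": services,
    }
```

Listing and fetching are split into two processors for SFTP and FTP, which is NiFi's usual pattern for remote file sources.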
How to Use Nimbus-NiFi
- Create the config.json file.
- Create the setup.json file according to your source and destination.
- Create an ingestion.json file according to your source and destination.
- Run these commands in a terminal:
  - python setup.py install to install Nimbus-NiFi.
  - nimbus_env -f {path to your setup.json file} -c {path to your config.json file} to set up the NiFi environment.
  - run_ingestion -f {path to your ingestion.json file} -c {path to your config.json file} to run the ingestion.
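Commands such as nimbus_env and run_ingestion are typically made available by setup.py through console_scripts entry points. The sketch below shows how such a command might parse its -f and -c flags with argparse; the long option names `--file` and `--config`, the function names, and the module path in the final comment are all hypothetical.

```python
import argparse
import json
from pathlib import Path
from typing import List, Optional


def build_parser(prog: str) -> argparse.ArgumentParser:
    """Parser shared by both commands: -f points at the job file, -c at config.json."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("-f", "--file", type=Path, required=True,
                        help="setup.json (nimbus_env) or ingestion.json (run_ingestion)")
    parser.add_argument("-c", "--config", type=Path, required=True,
                        help="config.json with the NiFi connection details")
    return parser


def load_json(path: Path) -> dict:
    """Read one JSON input file."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def main(argv: Optional[List[str]] = None) -> None:
    """Hypothetical body of a run_ingestion-style command."""
    args = build_parser("run_ingestion").parse_args(argv)
    job, config = load_json(args.file), load_json(args.config)
    # Next steps (not shown): validate the job, build the flow via the NiFi REST API, start and monitor it.
    print(f"loaded {len(job)} top-level keys from {args.file} and {len(config)} from {args.config}")


# setup.py would register the command roughly like this (module path is hypothetical):
# setup(..., entry_points={"console_scripts": ["run_ingestion=nimbus_nifi.cli:main"]})
```

Passing `argv` explicitly to `main` keeps the command testable without touching `sys.argv`.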
You can also refer to our open-source project: https://github.com/tothenew/nimbus-nifi