How to set up a snapshot report with a combination of NiFi and Python

nguyen hanh
4 min read · Jun 13, 2020

Getting started with automated reports using NiFi and Python

In an organization, when stakeholders want a quick snapshot of what is going on, or of how this month's revenue compares with last month's, what are you going to do? There are several ways to make that possible, such as automated reports in Power BI or Tableau, or simply a spreadsheet connected to the data sources and refreshed on a schedule. But it's not easy for stakeholders to find the metrics they care about in complicated dashboards unless you design separate sheets for them. Stakeholders also have to check their email for updates, where the report can get lost among dozens of other emails.

Too many dashboards. It's time for a change.

In this article, I’ll show how to set up an automated report using NiFi and post it via Slack or whatever messaging app your organization uses.

What is NiFi? NiFi is open-source software from Apache that manages dataflows and provides an easy-to-use ETL solution. You can read more about NiFi here.

Let's begin with the materials I use in this article. After that, we will walk through the process step by step.

  1. Data preparation: SQL, or simply a flat file (CSV, Excel).
  2. Message building: use Python to build your snapshot report and check data quality and integrity.
  3. NiFi: schedule the job, execute the Python scripts, and send Slack notifications.

Step 1: You need to get data using SQL. In this case, I use Google BigQuery. Before that, however, some libraries need to be installed, which can be done by following this post here.

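import pandas as pd  # pd.read_gbq relies on the pandas-gbq package

def get_data_table():
    # Placeholder query: swap in your own columns, table, and conditions
    qr = """select distinct column A, column B, column C
from `table D`
where condition 1 and condition 2 and condition 3
order by 1
"""
    df = pd.read_gbq(qr, project_id='vinid-datalake-prod', dialect='standard')
    return df

df = get_data_table()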

So we get a data frame with the data needed to calculate our metrics.
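The materials list mentions checking data quality and integrity before the message is built. Here is a minimal sketch of what such a check might look like. It assumes the data frame has been converted to a list of dicts with `df.to_dict("records")`; the function name `find_quality_issues` and the column names are hypothetical and not part of the original script.

```python
import math


def find_quality_issues(rows, required_columns):
    """Return a list of human-readable problems; an empty list means the rows look usable."""
    if not rows:
        return ["query returned no rows"]
    issues = []
    for col in required_columns:
        if any(col not in row for row in rows):
            issues.append(f"missing column: {col}")
            continue
        values = [row[col] for row in rows]
        # pandas represents missing numbers as NaN, so check for both None and NaN
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
            issues.append(f"null values in column: {col}")
    return issues


# Hypothetical rows standing in for df.to_dict("records")
sample_rows = [
    {"report_date": "2020-05-01", "revenue": 100.0},
    {"report_date": "2020-06-01", "revenue": float("nan")},
]
issues = find_quality_issues(sample_rows, ["report_date", "revenue", "orders"])
print(issues)
```

If the returned list is not empty, the script could stop and raise an error instead of posting a report built on incomplete data.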

Step 2: Manipulate your data using Python. You can define functions to calculate your metrics and to show whether a value increased or decreased.

# Comparison icons
def metric_comparision(m1, m2):
    delta = m1/m2 - 1
    if delta >= 0:
        return f":increase: {delta:,.2%}"
    else:
        return f":decrease: {delta:,.2%}"

The output looks like this
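As a rough sketch of how the comparison function might feed into a full message, the block below assembles a few lines of text from metric pairs. The `build_snapshot` helper, the message layout, and the numbers are assumptions for illustration only. The guard for a zero baseline is also an addition, since `m1/m2` would otherwise raise `ZeroDivisionError`. The `:increase:` and `:decrease:` names come from the article and are presumably custom emoji in the workspace.

```python
def metric_comparision(m1, m2):
    """Same comparison logic as in the article, plus a guard for a zero baseline."""
    if m2 == 0:
        return ":warning: n/a (previous value is 0)"
    delta = m1 / m2 - 1
    icon = ":increase:" if delta >= 0 else ":decrease:"
    return f"{icon} {delta:,.2%}"


def build_snapshot(title, metrics):
    """metrics maps a metric name to a (current, previous) pair."""
    lines = [f"*{title}*"]
    for name, (current, previous) in metrics.items():
        lines.append(f"{name}: {current:,.0f} ({metric_comparision(current, previous)})")
    return "\n".join(lines)


# Made-up numbers, for illustration only
message = build_snapshot("Monthly snapshot", {"Revenue": (1200, 1000), "Orders": (90, 100)})
print(message)
```

Keeping the message as plain text with emoji codes means the same string can be posted to Slack or adapted for another messaging app.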

Step 3: Use NiFi to execute the Python scripts, set a schedule, and send an alert

The general NiFi flow for creating an automated report

The setup:

Schedule: you can find the meaning of each number here
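Assuming the schedule here is a NiFi cron-driven schedule, the expression has six space-separated fields (seconds, minutes, hours, day of month, month, day of week) plus an optional year. The small helper below is only an illustration for reading such an expression; `describe_cron` is a hypothetical name and not part of NiFi.

```python
CRON_FIELDS = ["seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year"]


def describe_cron(expression):
    """Label each field of a NiFi-style cron expression (the year field is optional)."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


# Example expression: run at 08:00:00 every day
schedule = describe_cron("0 0 8 * * ?")
print(schedule)
```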

Put file: put the file into a folder on the machine where NiFi is running

Execute the Python file (this step creates the snapshot report shown above)
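For the alert step to notice a failure, the script has to signal it somehow. One common convention, sketched below, is to write the traceback to stderr and return a non-zero exit code. How NiFi picks that up depends on which processor runs the script and how it is configured, which the article does not specify. The names `run_report` and `failing_report` are hypothetical.

```python
import sys
import traceback


def run_report(build_report):
    """Run the report builder and return a process exit code (0 = success, 1 = failure)."""
    try:
        report = build_report()
    except Exception:
        # The full traceback goes to stderr so it can be included in an alert
        traceback.print_exc(file=sys.stderr)
        return 1
    print(report)  # stdout carries the finished snapshot text
    return 0


def failing_report():
    # Hypothetical stand-in for a script whose SQL query fails
    raise RuntimeError("bad SQL")


ok_code = run_report(lambda: "snapshot ready")
bad_code = run_report(failing_report)
# In a real script, the last line would be something like: sys.exit(run_report(main))
```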

Send an alert if something goes wrong:
Input values:

  • Access token: the token for the messaging service (Slack, Telegram, and so on)
  • Channel: the channel that you want to send the message to
Bad Python script errors
Bad SQL errors
Error messages look like this
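In the flow above, the alert is sent from NiFi itself. Purely to illustrate what the two input values are used for, here is a sketch of the equivalent call to Slack's `chat.postMessage` Web API method in Python. The token and channel are placeholders, and `build_slack_request` is a hypothetical helper. The request is only built here, not sent.

```python
import json
import urllib.request

SLACK_POST_MESSAGE_URL = "https://slack.com/api/chat.postMessage"


def build_slack_request(token, channel, text):
    """Build (but do not send) a chat.postMessage request."""
    payload = json.dumps({"channel": channel, "text": text}).encode("utf-8")
    return urllib.request.Request(
        SLACK_POST_MESSAGE_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


# Placeholder token and channel; real values come from your Slack app settings
request = build_slack_request("xoxb-placeholder", "#data-alerts", "Snapshot job failed: bad SQL")
# To actually send it: urllib.request.urlopen(request)
```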

Above I’ve shown you the very basics of combining NiFi and Python to solve one of the fundamental problems of any organization: the real-time report. This process gives every stakeholder a real-time update on business conditions right at their fingertips, on their smartphones. It also helps data engineers track data integrity and latency, and it gives data analysts quick insights and more time for deeper analysis instead of ad-hoc reports.

Thanks a lot for reading
