AI & Python #13: How to Automate Your Excel Reporting with Python
Use openpyxl to automate your Excel reporting with Python
Let’s face it; no matter what our job is, sooner or later, we will have to deal with repetitive tasks like updating a daily report in Excel. Things could get worse if you work for a company that doesn’t work with Python because you wouldn't be able to solve this problem by using only Python.
But don’t worry, you still can use your Python skills to automate your Excel reporting without having to convince your boss to migrate to Python! You only have to use a library to tell Excel what you want to do through Python.
Let’s start with this guide!
The Dataset
In this guide, we’ll use an Excel file with sales data that is similar to those files you have as inputs to make reports at work. You can download this file on Kaggle; however, it has a .csv format, so you should change the extension to .xlsx
or just download it from this Google Drive link (I also changed the file name to supermarket_sales.xlsx)
Before writing any code, take a look at the file on Google Drive and familiarize yourself with it. That file is going to be the input to create the following report using Python.
Let’s automate this Excel report with Python!
Make a Pivot Table with Pandas
Importing libraries
Now that you downloaded the Excel file, let’s import the libraries we’ll use in this guide.
import pandas as pd
import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import Font
from openpyxl.chart import BarChart, Reference
import string
We’ll use Pandas to read the Excel file, create a pivot table, and export it to Excel. Then we’ll use the Openpyxl library to write Excel formulas, make charts, and format the spreadsheet through Python. Finally, we’ll create a Python function to automate this process.
Note: If you don’t have those libraries installed in Python, you can easily install them by writing pip install pandas
and pip install openpyxl
on your terminal or command prompt.
Reading the Excel file
Before we read the Excel file, make sure the file is in the same place where your Python script is located. Then, read the Excel file with pd.read_excel()
like in the following code.
excel_file = pd.read_excel('supermarket_sales.xlsx')
excel_file[['Gender', 'Product line', 'Total']]
The file has many columns but we’ll only use the Gender, Product line, and Total columns for the report we’re going to create. To show you what they look like, I selected them using double brackets. If we print this on Jupyter Notebooks, you’ll see the following dataframe that looks like an Excel spreadsheet.
Making a pivot table
We can easily create a pivot table from the excel_file
dataframe previously created. We just need to use the .pivot_table()
method. Let’s say we want to create a pivot table that shows the total money spent by males and females on the different product lines. To do so, we write the following code.
report_table = excel_file.pivot_table(index='Gender',
columns='Product line',
values='Total',
aggfunc='sum').round(0)
The report_table
should look something like this.
Exporting pivot table to Excel file
To export the previous pivot table created we use the .to_excel()
method. Inside parentheses, we have to write the name of the output Excel file. In this case, I’ll name this file as report_2021.xlsx
We can also specify the name of the sheet we want to create and in which cell the pivot table should be located.
report_table.to_excel('report_2021.xlsx',
sheet_name='Report',
startrow=4)
Now the Excel file is exported in the same folder where your Python script is located.
Make The Report with Openpyxl
Every time we want to access a workbook we’ll use the load_workbook
imported from openpyxl
and then save it with the .save()
method. In the following sections, I’ll be loading and saving the workbook every time we modify the workbook; however, you only need to do this once (like in the full code shown at the end of this guide)
Creating row and column reference
To automate the report, we need to take the minimum and maximum active column/row, so the code we’re going to write keeps working even if we add more data.
To obtain the references in the workbook, we first load the workbook with load_workbook()
and locate the sheet we want to work with using wb[‘name_of_sheet’]
. Then we access the active cells with .active
wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
# cell references (original spreadsheet)
min_column = wb.active.min_column
max_column = wb.active.max_column
min_row = wb.active.min_row
max_row = wb.active.max_row
You can print the variables created to get an idea of what they mean. For this example, we obtain these numbers.
Min Columns: 1
Max Columns: 7
Min Rows: 5
Max Rows: 7
Open thereport_2021.xlsx
we exported before to verify this.
As you can see in the picture above, the minimum row is 5 and the maximum row is 7. Also, the minimum row is A (1) and the maximum row is G (7). These references will be extremely useful for the following sections.
Adding Excel charts through Python
To create an Excel chart from the pivot table we created we need to use the Barchart
module we imported before. To identify the position of the data and category values, we use the Reference
module from openpyxl
(we imported Reference
in the beginning of this article)
wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
# barchart
barchart = BarChart()
#locate data and categories
data = Reference(sheet,
min_col=min_column+1,
max_col=max_column,
min_row=min_row,
max_row=max_row) #including headers
categories = Reference(sheet,
min_col=min_column,
max_col=min_column,
min_row=min_row+1,
max_row=max_row) #not including headers
# adding data and categories
barchart.add_data(data, titles_from_data=True)
barchart.set_categories(categories)
#location chart
sheet.add_chart(barchart, "B12")
barchart.title = 'Sales by Product line'
barchart.style = 5 #choose the chart style
wb.save('report_2021.xlsx')
After writing that code, the report_2021.xlsx
file should look like this.
Breaking down the code:
barchart = BarChart()
initializes abarchart
variable from theBarchart
classdata
andcategories
are variables that represent where that information is located. We’re using the column and row references we defined above to automate this. Also, keep in mind that I’m including the headers indata
but not incategories
We use
add_data
andset_categories
to add the necessary data to thebarchart
. Insideadd_data
I’m adding thetitles_from_data=True
because I included the headers fordata
We use
sheet.add_chart
to specify what we want to add to the “Report” sheet and in which cell we want to add itWe can modify the default title and chart style using
barchart.title
andbarchart.style
We save all the changes with
wb.save()
Applying Excel formulas through Python
You can write Excel formulas through Python the same way you’d write in an Excel sheet. For example, let’s say we wish to sum the data in cells B5 and B6 and show it in cell B7 with the currency style.
sheet['B7'] = '=SUM(B5:B6)'
sheet['B7'].style = 'Currency'
That’s pretty simple, right? We can repeat that from column B to G or use a for loop to automate it. But first, we need to get the alphabet to have it as a reference for the names that columns have in Excel (A, B, C, …) To do so, we use the string
library and write the following code.
import string
alphabet = list(string.ascii_uppercase)
excel_alphabet = alphabet[0:max_column]
print(excel_alphabet)
If we print this we’ll obtain a list from A to G.
This happens because first, we created an alphabet
list from A to Z, but then we took a slice [0:max_column]
to match the length of this list (7) with the first 7 letters of the alphabet (A-G).
Note: Python lists start on 0, so A=0, B=1, C=2, and so on. Also, the [a:b] slice notation takes b-a elements (starting with “a” and ending with “b-1”)
After this, we can make a loop through the columns and apply the sum formula but now with column references, so instead of writing this,
sheet['B7'] = '=SUM(B5:B6)'
sheet['B7'].style = 'Currency'
now we include references and put it inside a for loop.
wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
# sum in columns B-G
for i in excel_alphabet:
if i!='A':
sheet[f'{i}{max_row+1}'] = f'=SUM({i}{min_row+1}:{i}{max_row})'
sheet[f'{i}{max_row+1}'].style = 'Currency'
# adding total label
sheet[f'{excel_alphabet[0]}{max_row+1}'] = 'Total'
wb.save('report_2021.xlsx')
After running the code, we get the =SUM formula in the “Total” row for columns between B to G.
Breaking down the code:
for i in excel_alphabet
loops through all the active columns, but then we excluded the A column withif i!='A'
because the A column doesn’t contain numeric datasheet[f'{i}{max_row+1}'] = f'=SUM({i}{min_row+1}:{i}{max_row}'
is the same as writingsheet['B7'] = '=SUM(B5:B6)'
but now we do that for columns A to Gsheet[f'{i}{max_row+1}'].style = 'Currency'
gives the currency style to cells below the maximum row.We add the ‘Total’ label to the A column below the maximum row with
sheet[f'{excel_alphabet[0]}{max_row+1}'] = 'Total'
Formatting the report sheet
To finish the report, we can add a title, subtitle and also customize their font.
wb = load_workbook('report_2021.xlsx')
sheet = wb['Report']
sheet['A1'] = 'Sales Report'
sheet['A2'] = '2021'
sheet['A1'].font = Font('Arial', bold=True, size=20)
sheet['A2'].font = Font('Arial', bold=True, size=10)
wb.save('report_2021.xlsx')
You can add other parameters inside Font()
. On this website, you can find a list of styles available.
The final report should look like the following picture.
Automating the Report with a Python Function
Now that the report is ready, we can put all the code we’ve written so far inside a function that automates the report, so the next time we want to make this report we only have to introduce the file name and run it.
Notes: For this function to work, the file name should have the structure “sales_month.xlsx” Also, I added a few lines of code that use the name of the month/year of the sales file as a variable, so we can reuse it in the output file and subtitle of the report.
The code below might look intimidating, but it’s only what we’ve written so far plus the new variables file_name, month_name,
and month_and_extension
.
Click here to check out the complete code (Github snippet)
Applying the function to a single Excel file
Let’s imagine the original file we downloaded has the name “sales_2021.xlsx” instead of “supermarket_sales.xlsx” With this we can apply the formula to the report by writing the following
automate_excel('sales_2021.xlsx')
After running this code, you’ll see an Excel file named “report_2021.xlsx” in the same folder where your Python script is located.
Applying the function to multiple Excel files
Let’s imagine now we have only monthly Excel files “sales_january.xlsx” “sales_february.xlsx” and “sales_march.xlsx” (You can find those files on my Github)
You can either apply the formula one by one to get 3 reports
automate_excel('sales_january.xlsx')
automate_excel('sales_february.xlsx')
automate_excel('sales_march.xlsx')
or you could concatenate them first using pd.concat()
and then apply the function only once.
# read excel files
excel_file_1 = pd.read_excel('sales_january.xlsx')
excel_file_2 = pd.read_excel('sales_february.xlsx')
excel_file_3 = pd.read_excel('sales_march.xlsx')
# concatenate files
new_file = pd.concat([excel_file_1,
excel_file_2,
excel_file_3], ignore_index=True)
# export file
new_file.to_excel('sales_2021.xlsx')
# apply function
automate_excel('sales_2021.xlsx')
That’s it! If you have a question, let me know in the comment section.