Let's say your boss gives you the task to compare the names of employees in different regions on separate Excel sheets and tells you to state on the first Excel sheet on a separate column whether or not the employee is working at another branch by the end of the day. Easy right? It is definitely, but he actually gave you 750 names to compare and return by the end of the week, doesn't sound all that easy anymore right? And of course, we're humans and are prone to mistakes. A mistake might happen if we perform it manually and do not automate it.
In this article, you'll learn how to solve this Excel sheet task of 1 week, in a few minutes with Python's openpyxl
library (a beautiful lifesaver π).
π Prerequisites
Python and PIP installed
A basic understanding of Python and programming
If you don't have Python and PIP installed use this link and this link respectively for them.
π Setup
Step 1: Run cmd
as an administrator, and enter
pip install openpyxl
Once you're done, the command prompt should look like the one above.
Step 2: Open a python editor of your choice and create a new python file in any directory and type import openpyxl
to import the package. I'll be using VSCode for this and naming my file automateExcel.py
.
Now that you have everything set up, time to delve into the meat of the matter.
βπΎ Using Openpyxl
Now, to complete the given task above, we'll be using two similar examples and we'll edit those documents, then you can use that same concept to solve the task above and any other occurrence. Here, we have the two excel files (located in our current directory) that we'll be comparing with more than 100 names in each of them and with 'company-in-nigeria' and 'company-in-rwanda' as their respective names.
We want to compare the two sheets, stating on the 'Present in Foreign Department' column whether or not the employee name exists elsewhere.
To do that, let's assign the two file paths to two separate variables.
Copy and paste this code to your editor.
firstFile = r"company-in-nigeria.xlsx"
secondFile = r"company-in-rwanda.xlsx"
The r in front of the windows file path denotes a raw un-escaped string, without it, the string would not be read and would produce an error. Check here for more info.
Next, use the load_workbook()
method on openpyxl to open our excel sheets, storing the return value for both files in two other variables i.e
wbObj1 = openpyxl.load_workbook(firstFile)
wbObj2 = openpyxl.load_workbook(secondFile)
Then, we read the available sheets using the worksheet
list from the objects, we opened above.
wbk1 = wbObj1.worksheets[0]
wbk2 = wbObj2.worksheets[0]
Since there is only one sheet available, we obtain it via indexing as shown above, and since it's the first sheet, our index is 0. We would need this to be able to edit later on.
Now, let's create a names
list to contain the names in our second file (company-in-rwanda) which we'll use to compare with the names in the first file. The code below appends all the names in the first column of the second file, starting from the second row (skipping the header) till the end.
names = []
for myRow in range(2, wbk2.max_row + 1):
names.append(wbk2.cell(row=myRow, column=1).value.lower())
wbObj2.close
Use the max_row
property on the current or selected worksheet to get the number of rows present, I'm adding + 1
because obtaining cell values is not index-based. To obtain a particular cell value from the sheet, we call the cell
method from the worksheet, and in our case is the second excel sheet (wbk2) which accepts two parameters, row
, which is the row number of the current sheet and in this case, starts counting from the second row (skipping the title), and column
which is the column number of the current sheet, in this case, is the first column (Names). And we obtain the value of the row and column with the value()
method. We make all names lowercase with the lower()
method before we append it to perform a non-case sensitive search. And finally, we close the file with the workbook object (wbObj2) set since we won't be needing it anymore.
Next, we would need to compare names in both files and write the results of our comparison on the first file - company-in-nigeria.
for rowNum in range(2, wbk1.max_row + 1):
if wbk1.cell(row=rowNum, column=1).value.lower() in names:
wbk1.cell(row=rowNum, column=2).value = "Present"
else:
wbk1.cell(row=rowNum, column=2).value = "Not present"
wbObj1.save(firstFile)
wbObj1.close
Note: Close all opened excel files you'll be using or referencing as you won't be able to perform operations on them while it's opened.
Once we've made all our non-case sensitive comparisons and stated whether the names were present or not, we need to save our edited file with the save()
method from the workbook object, passing in the file we edited, which in this case is firstFile
as shown above (Don't forget to run your code!). Here is the result.
Check out the full code here. Use this same method for your 750 names in different excel files from earlier and you're good to go!
πCongratulations
Congratulations! ππ You just finished a task of ~ 1 week in less than 10 minutes! Go ahead and take the remaining time to watch anim...I mean be more productive!
π Read more
openpyxl documentation - A Python library to read/write Excel 2010 xlsx/xlsm files
A Guide to Excel Spreadsheets in Python With openpyxl by Pedro Pregueiro
Thank you for reading and have a nice day! π