Finding duplicates in Excel is a very common task when dealing with large sets of data. Excel has some very easy methods to remove or highlight duplicates. You can also use some functions to help you identify duplicates in your data.
Case Study.
Let’s assume we have a list of the best movies in the decade of 2000s and we want to get a list of the Studios that produced them. The original list looks like this.
As you can see, there are many duplicates in the list of studios. Just Warner Bros. Pictures alone had many movies in the top 20 list.
Highlight Duplicates in Excel
In order to see the duplicates in Excel, select the column that contains the duplicate values, go to the Home and click for Conditional Formatting and click on highlight duplicate values
Once the Duplicate Values menu is open, you can choose between highlighting the duplicates or the unique values.
Removing Duplicates in Excel
In order to remove the duplicates in the table, select all the columns in the table, go to the Data tab and click on Remove Duplicates.
In the next step, select the columns that contain duplicate values.
WARNING: Excel will remove the data in the other columns when duplicate rows are removed. Duplicate rows will be removed based on the column select.
If you select the Studios column, only one row will remain for each unique value in the Studios column.
Using the IF formula to find duplicates in Excel
Another method to identify duplicates is to use an IF formula. In order to use this method, the list has to be filtered by the values in the column that contain duplicates.
In column F you will enter the following formula and copy it down.
=IF(C2=C1,"Yes","No")
Notice how the first value shows that is not a duplicate but all the next duplicate values show as a “Yes”, duplicate. The formula compared the value of the cell in the same row from column C with the value in the row above. If the values are the same, a “Yes” is the result of the formula.
The advantage with this method is that the column F can be used to filter the unique value without loosing the duplicate values.
Go to the Home tab, click on Sort& Filter and then Filter.
Open the filter on the Duplicate column and select only the “No” value.
The result is identical with the Remove Duplicates method above. However, the advantage is that using the filter, you can bring the duplicate rows back any time.