Overview

You may have come across situations where you need to identify duplicate lines contained in a text file. I recently had to do this within a large CSV data extract file.

While there are several ways to do this, I was specifically looking for a way to do this using Notepad++, which I already had installed. So in this blog article, I’ll show you how to find duplicate lines using Notepad++.


Solution

Notepad++ is a popular text editor that provides various features for programmers, including the ability to search for and replace text using regular expressions. To use Notepad++ to detect duplicate lines, follow these steps:

Step 1: Open the file in Notepad++
In Notepad++, open the file that you want to check for duplicate lines.

Step 2: Open the “Find” dialogue box
Open the “Find” dialogue box, either by pressing Ctrl+F or by navigating to Search > Find in the menu bar.

Step 3: Enter the regular expression
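In the “Find what” field of the “Find” dialogue box, enter the regular expression ^(.*\r?\n)\1+

This regular expression uses one capture group and a backreference to find runs of identical lines. The capture group (.*\r?\n) matches a whole line, including its line break, and the backreference \1+ matches one or more immediate repetitions of that captured line. Because the repetitions must follow directly, the pattern only finds duplicates that sit next to each other. If the duplicates may be scattered through the file, sort the lines first using one of the sort commands under Edit > Line Operations.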
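To see what the pattern matches outside of Notepad++, here is a minimal Python sketch that applies the same expression to a short string. It assumes Python's re module treats this particular pattern the same way as the Notepad++ regex engine. The function name find_adjacent_duplicates and the sample text are made up for illustration.

```python
import re

# Same pattern as in the "Find what" field. re.MULTILINE lets ^ match at the
# start of every line, which is how the pattern is meant to be applied here.
DUPLICATE_RUN = re.compile(r"^(.*\r?\n)\1+", re.MULTILINE)


def find_adjacent_duplicates(text):
    """Return (line, count) for each run of identical consecutive lines."""
    runs = []
    for match in DUPLICATE_RUN.finditer(text):
        line = match.group(1)  # the captured line, including its line break
        count = len(match.group(0)) // len(line)  # copies in this run
        runs.append((line.rstrip("\r\n"), count))
    return runs


# Hypothetical sample: the last "apple" is not adjacent to the first two.
sample = "apple\napple\nbanana\ncherry\ncherry\ncherry\napple\n"
print(find_adjacent_duplicates(sample))
```

Note that the final "apple" is not reported, because it does not directly follow another "apple" line.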

Notepad++ Find Dialog Screenshot

Step 4: Enable the regular expression search
To enable the regular expression search, select the “Regular expression” option in the “Search mode” section of the “Find” dialogue box. Leave the “. matches newline” box unchecked so that the dot stops at the end of each line.

Step 5: Search for duplicates
Click the “Find All in Current Document” button to search for duplicate lines in your file. Notepad++ lists each match in a results panel at the bottom of the window, so you can jump to each run of duplicates and remove the extra copies.
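If you would rather not sort the file, or you want a count of every repeated line wherever it appears, a small script is one alternative. The following is a rough Python sketch and not part of the Notepad++ workflow. The function name find_duplicate_lines and the sample rows are hypothetical.

```python
from collections import Counter


def find_duplicate_lines(lines):
    """Return {line: count} for every line that appears more than once."""
    # Strip line breaks so "a\n" and "a\r\n" are counted as the same line.
    counts = Counter(line.rstrip("\r\n") for line in lines)
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical CSV rows; the duplicates are deliberately not adjacent.
sample = ["id,name\n", "1,Ann\n", "2,Bob\n", "1,Ann\n", "2,Bob\n", "1,Ann\n"]
print(find_duplicate_lines(sample))
```

Because the function accepts any iterable of lines, you can also pass it an open file object and it will read the file line by line.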

Final Thoughts

I hope you’ve found this article useful. If you have any other tips, feel free to share them in the comments section below to help others.

Shane Bartholomeusz