Address cleansing | What it is and how to do it

Address cleansing is the collective process of standardizing, correcting, and then validating a postal address. Before an address can be validated, it must first be structured in the official postal format for the appropriate country, and any missing or incorrect information must be added or fixed.
Once the address is in the official postal format, with all the required information, it can be compared against the official address database for the country in question. In the United States, the official address database is managed by the USPS.
If the newly 'cleansed' address matches an address in the official database, it is determined to be a 'valid' address. Smarty provides an even more comprehensive US address database that includes 200+ million valid addresses. Smarty validates against our database and will also tell you if the address is found in the USPS database.
Ready to start cleaning up your dirty addresses? Smarty has some neat tools for cleansing, formatting, and validating address data, and we walk through them below.
Table of contents:
- Algorithms used to clean addresses
- Address cleansing tools
- Using an API for address cleansing
- Try address validation APIs
- Conclusion
Using algorithms to clean addresses
When someone is looking for an algorithm to clean addresses, it's often because they are dealing with either a large Excel sheet of addresses or an entire address database. Cleaning that many addresses one at a time would be inefficient and tedious, so finding an algorithm to do the work programmatically just makes sense.
So, what kinds of algorithms are used?
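Some people are tempted to try using regular expressions to clean up addresses. However, that approach is full of problems. Real-world addresses vary too much (spelled-out or abbreviated suffixes, unit numbers, PO Boxes, missing punctuation, typos) for a pattern to handle reliably, and the growing pile of special cases may actually make your job as a programmer more difficult.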
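To make the problem concrete, here is a small illustration using a deliberately naive pattern of our own (not anything Smarty uses). It expects exactly "number street suffix, city, ST ZIP" and recognizes only a handful of suffixes. The sample addresses are made up. Only the first, perfectly formatted address matches; every common variation breaks it.

```python
import re

# A deliberately naive pattern (illustrative only):
# "<number> <street> <suffix>, <city>, <ST> <5-digit ZIP>"
NAIVE_ADDRESS = re.compile(
    r"^(?P<number>\d+)\s+(?P<street>[A-Za-z ]+?)\s+(?P<suffix>St|Ave|Rd|Blvd)\.?,\s*"
    r"(?P<city>[A-Za-z ]+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})$"
)

# Made-up sample inputs showing everyday variations.
SAMPLES = [
    "123 Main St, Springfield, IL 62701",         # clean, matches
    "123 Main Street, Springfield, IL 62701",     # suffix spelled out
    "PO Box 42, Springfield, IL 62701",           # no house number or street
    "123 Main St Apt 4B, Springfield, IL 62701",  # secondary unit
    "123 Main St, Springfield IL 62701-1234",     # missing comma, ZIP+4
]


def naive_parse(address):
    """Return the captured components as a dict, or None if the pattern fails."""
    match = NAIVE_ADDRESS.match(address)
    return match.groupdict() if match else None


results = [naive_parse(s) is not None for s in SAMPLES]
print(results)  # [True, False, False, False, False]
```

Each fix (another suffix, an optional unit, an optional comma) makes the pattern longer and harder to maintain, and it still says nothing about whether the address actually exists.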
In reality, cleaning an address requires the use of a number of different algorithms, each performing a related, though unique part of the address validation process. The algorithms being used must collectively:
- Parse the address and break it into its individual components (i.e., name, house number, street name, city name, state name, ZIP Code, etc.).
- Standardize the data of each individual component so that it matches the format of the official postal database to be referenced.
- Validate the now standardized address against the official address database.
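The three stages above can be sketched as a toy pipeline. This is a minimal illustration under heavy assumptions, not how Smarty or the USPS actually does it: the parser only understands "number street suffix, city, state ZIP", the suffix table is a tiny excerpt in the spirit of USPS Publication 28 abbreviations, and `OFFICIAL_ADDRESSES` is a hypothetical one-entry stand-in for a real postal database with hundreds of millions of records.

```python
from dataclasses import astuple, dataclass
from typing import Optional, Tuple

# Hypothetical excerpt of standard suffix abbreviations (real tables are much larger).
SUFFIX_ABBREVIATIONS = {"STREET": "ST", "ST": "ST", "AVENUE": "AVE", "AVE": "AVE", "ROAD": "RD", "RD": "RD"}

# Hypothetical stand-in for the official address database.
OFFICIAL_ADDRESSES = {("123", "MAIN", "ST", "SPRINGFIELD", "IL", "62701")}


@dataclass(frozen=True)
class Address:
    number: str
    street: str
    suffix: str
    city: str
    state: str
    zip5: str


def parse(raw: str) -> Optional[Address]:
    """Step 1: split 'number street suffix, city, state zip' into components."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 3:
        return None
    street_tokens = parts[0].split()
    state_zip = parts[2].split()
    if len(street_tokens) < 3 or len(state_zip) != 2:
        return None
    return Address(
        number=street_tokens[0],
        street=" ".join(street_tokens[1:-1]),
        suffix=street_tokens[-1],
        city=parts[1],
        state=state_zip[0],
        zip5=state_zip[1][:5],  # keep the 5-digit ZIP, drop any +4
    )


def standardize(addr: Address) -> Address:
    """Step 2: uppercase, drop periods, and abbreviate the street suffix."""
    def clean(s: str) -> str:
        return s.upper().replace(".", "").strip()

    suffix = clean(addr.suffix)
    return Address(
        number=clean(addr.number),
        street=clean(addr.street),
        suffix=SUFFIX_ABBREVIATIONS.get(suffix, suffix),
        city=clean(addr.city),
        state=clean(addr.state),
        zip5=clean(addr.zip5),
    )


def validate(addr: Address) -> bool:
    """Step 3: an exact lookup against the reference data."""
    return astuple(addr) in OFFICIAL_ADDRESSES


def cleanse(raw: str) -> Tuple[Optional[Address], bool]:
    parsed = parse(raw)
    if parsed is None:
        return None, False
    standardized = standardize(parsed)
    return standardized, validate(standardized)


print(cleanse("123 main street., Springfield, il 62701-1234"))
```

The validation step only works because standardization ran first: the raw input "main street." would never match the database entry "MAIN ST" by exact comparison.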
The individual algorithms that are most effective in address cleaning are proprietary, and usually are part of an address validation company's software. This is true for both USPS and international addresses.
There just are not that many open-source algorithms that can effectively scrub addresses. However, a number of the dominant software solutions do have free usage options available.
Address cleansing tools
Address validation software typically includes multiple tools designed to scrub, standardize, and validate address data. Each one requires a different level of technical skill. Some are as simple as uploading a file, while others require basic to advanced programming experience.
“Upload” address cleansing tools
A good example of an upload-based solution is our Bulk Address Validation Tool. This is ideal for teams and individuals with little to no programming experience who still need accurate, standardized address data.
As the name suggests, you upload a file (usually from Excel or Google Sheets) and let the software do the heavy lifting. Here are the basics; for comprehensive instructions, check out the docs.
- On the Bulk Address Validation Tool page, click “Download sample lists” and choose the appropriate format:
  - US sample list
  - International sample list
  - ZIP Code sample list
- Open the downloaded file and delete the sample data below the headers, keeping the headers intact.
- Copy the address components from your spreadsheet and paste them under the corresponding headers. Save the file.
- Return to the Bulk Address Validation Tool page and upload your newly formatted file.
- Select whether the addresses are US-based or international.
- Select whether you want to validate full addresses or only city, state, and ZIP Code combinations.
- Click “Process 10 Records” to test the results.
(The full version can process up to 500,000 addresses at once. Our 42-day free trial will allow you to process 1,000 addresses on us.)
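If your spreadsheet has a lot of rows or oddly named columns, the copy-and-paste step can be scripted. The sketch below is only an illustration: the article does not list the sample file's headers, so every name in `COLUMN_MAP` (both your column names and the target headers) is a hypothetical placeholder to replace with the real headers from the sample list you downloaded. It uses only Python's built-in `csv` module.

```python
import csv
import io

# Hypothetical mapping: your spreadsheet's column -> header in the downloaded sample list.
# Replace both sides with your actual column names and the sample file's real headers.
COLUMN_MAP = {
    "Address Line 1": "street",
    "Town": "city",
    "Province/State": "state",
    "Postal Code": "zipcode",
}


def remap_csv(source_text: str) -> str:
    """Copy each row's values under the sample-list headers, in the sample's column order."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(COLUMN_MAP.values()), lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Columns not in COLUMN_MAP (e.g. a customer name) are dropped.
        writer.writerow({target: (row.get(source) or "").strip() for source, target in COLUMN_MAP.items()})
    return out.getvalue()


exported = "Name,Address Line 1,Town,Province/State,Postal Code\nAl,123 Main St,Springfield,IL,62701\n"
print(remap_csv(exported))
```

To use it on real files, read your exported CSV with `open(path, newline="")`, pass its contents to `remap_csv`, and write the result to the file you upload.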
Once submitted, Smarty automatically:
- Standardizes address formatting
- Corrects or appends missing data where possible
- Validates each address against the authoritative postal database for the selected country
When processing is complete, click “Download results in CSV” to get your cleaned file.
That’s it.
For folks on the noob side of the programming spectrum, upload-based tools like bulk address validation eliminate a ton of manual work and reduce the risk of introducing bad data—no code required.
Using an API for address cleansing
For individuals with more solid developer skills, using an API to clean up addresses programmatically is probably the best route to take. While there are many different APIs out there, Smarty's Address Validation APIs are crazy fast and easy to use. They require only a simple HTTP request and send back cleaned address data, with up to 55 metadata points, in a convenient JSON format. And, when properly configured, they can process up to 75,000 addresses a second.
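Here is a rough sketch of what that HTTP request can look like for a single US address, using only the Python standard library. The endpoint, the `auth-id`/`auth-token`/`street`/`city`/`state`/`zipcode`/`candidates` query parameters, and the `delivery_line_1`, `last_line`, and `analysis.dpv_match_code` response fields reflect our understanding of the US Street Address API docs; treat them as assumptions and confirm them against the current documentation. The helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the US Street Address API; verify against the current docs.
US_STREET_URL = "https://us-street.api.smarty.com/street-address"


def build_lookup_url(auth_id, auth_token, street, city="", state="", zipcode=""):
    """Build a GET URL for one address (assumed parameter names)."""
    params = {"auth-id": auth_id, "auth-token": auth_token, "street": street, "candidates": 1}
    for key, value in (("city", city), ("state", state), ("zipcode", zipcode)):
        if value:  # omit empty components rather than sending blanks
            params[key] = value
    return f"{US_STREET_URL}?{urlencode(params)}"


def summarize(candidates):
    """Reduce the JSON array of candidates to the cleansed address, or None if no match."""
    if not candidates:
        return None
    best = candidates[0]
    return {
        "address": f"{best['delivery_line_1']}, {best['last_line']}",
        # Assumed field: "Y" indicates a confirmed deliverable address.
        "dpv_match_code": best.get("analysis", {}).get("dpv_match_code"),
    }


def lookup(auth_id, auth_token, street, **components):
    """Send the request and summarize the response (needs network access and real keys)."""
    url = build_lookup_url(auth_id, auth_token, street, **components)
    with urlopen(url, timeout=10) as response:
        return summarize(json.load(response))
```

Keeping URL building and response parsing separate from the network call makes both easy to test offline; for large lists you would send batched requests instead of one request per address.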
Here is a list of all of the Smarty address validation APIs you can try:
- US Street Address API: Validate USPS addresses
- International Street Address API: Validate addresses in 250 countries
- US ZIP Code API: Look up and verify city, state, and ZIP Code combinations
- US Autocomplete Pro API: Suggest cleansed, validated addresses to users in real-time
- US Extract API: Extract address data from any text
Conclusion
If you're trying to clean a lot of addresses in a relatively short amount of time, your best option is to use an address validation algorithm. The best algorithms are most often found in proprietary software, which usually offers several address cleansing tools to choose from, depending on your level of programming skill. And the best ones offer some form of free usage, especially while you're testing them out.