Smarty

Address cleansing | What it is and how to do it


Address cleansing is the collective process of standardizing, correcting, and then validating a postal address. Before an address can be validated, it must first be structured in the official postal format for the appropriate country, and any missing or incorrect information must be added or fixed.

Once the address is in the official postal format, with all the required information, it can be compared against the official address database for the country in question. In the United States, the official address database is managed by the USPS.

If the newly 'cleansed' address matches an address in the official database, it is determined to be a 'valid' address. Smarty provides an even more comprehensive US address database that includes 200+ million valid addresses. Smarty validates against our database and will also tell you if the address is found in the USPS database.

Ready to start cleaning up your dirty addresses? Smarty has some neat tools for cleansing, formatting, and validating address data. You can try them here:

  • US Address: Cleanse addresses one at a time
  • International Address: Cleanse international addresses one at a time
  • Bulk Address: Cleanse a list of US or international addresses at once
  • Address APIs: Programmatically cleanse US or international addresses


Using algorithms to clean addresses

When someone is looking for an algorithm to clean addresses, it's often because they are dealing with either a large Excel sheet of addresses or an entire address database. Cleaning that many addresses one at a time would be tedious and inefficient, so finding an algorithm to do the work programmatically just makes sense.

So, what kinds of algorithms are used?

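Some people are tempted to try using regular expressions to clean up addresses. However, that approach is full of problems, and may actually make your job as a programmer more difficult. Addresses vary too much in format (abbreviations, unit numbers, PO boxes, and so on) for a single pattern to capture, and even a perfect pattern match can't tell you whether the address actually exists.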
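To illustrate the kind of trouble a regex runs into, here is a small Python sketch. The pattern and the sample addresses are made up for this example; a real attempt would be longer, but it tends to fail in the same ways.

```python
import re

# A naive, hypothetical pattern: house number, street name, street suffix.
NAIVE_STREET = re.compile(
    r"^(?P<number>\d+)\s+(?P<street>[A-Za-z ]+?)\s+(?P<suffix>St|Ave|Rd|Blvd)\.?$"
)

def matches(address: str) -> bool:
    """Return True if the address fits the naive pattern."""
    return NAIVE_STREET.match(address) is not None

samples = [
    "123 Main St",           # the one shape the pattern expects
    "123 Main Street",       # suffix spelled out
    "123 N. Main St Apt 4",  # directional plus unit number
    "PO Box 1234",           # no house number at all
    "123 1/2 Main St",       # fractional house number
]

results = {address: matches(address) for address in samples}

for address, ok in results.items():
    print(f"{address!r}: {'match' if ok else 'no match'}")

# A made-up address in the right shape still matches, because a pattern
# checks format only. It cannot confirm that the address exists.
print(matches("999999 Nowhere Blvd"))
```

Only the first sample matches. Every fix you add for the other four makes the pattern harder to maintain, and none of those fixes helps with the last line: a well-formed address that doesn't exist.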

In reality, cleaning an address requires a number of different algorithms, each performing a related but distinct part of the address validation process. Collectively, the algorithms must:
  • Parse the address and break it into its individual components (e.g., name, house number, street name, city name, state name, ZIP Code, etc.).
  • Standardize the data of each individual component so that it matches the format of the official postal database to be referenced.
  • Validate the now standardized address against the official address database.
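
To make the three steps above concrete, here is a minimal Python sketch. It's a toy, not how any production validator works: the parser only handles one simple comma-separated layout, the suffix table is tiny, and REFERENCE_DB is a made-up stand-in for an official postal database. All function and variable names are hypothetical.

```python
from dataclasses import dataclass, astuple
from typing import Optional, Tuple

# Toy suffix table; real postal standards define far more abbreviations.
SUFFIXES = {"street": "ST", "st": "ST", "avenue": "AVE", "ave": "AVE",
            "road": "RD", "rd": "RD"}

# Made-up stand-in for an official address database.
REFERENCE_DB = {("100", "MAIN", "ST", "SPRINGFIELD", "IL", "62701")}

@dataclass(frozen=True)
class Address:
    number: str
    street: str
    suffix: str
    city: str
    state: str
    zip_code: str

def parse(raw: str) -> Optional[Address]:
    """Step 1: split 'number street suffix, city, state zip' into components."""
    parts = [p.strip() for p in raw.split(",")]
    if len(parts) != 3:
        return None
    line, last = parts[0].split(), parts[2].split()
    if len(line) < 3 or len(last) != 2:
        return None
    return Address(line[0], " ".join(line[1:-1]), line[-1],
                   parts[1], last[0], last[1])

def standardize(a: Address) -> Address:
    """Step 2: normalize case, suffix abbreviation, and ZIP Code length."""
    suffix = SUFFIXES.get(a.suffix.lower().rstrip("."), a.suffix.upper())
    return Address(a.number, a.street.upper(), suffix,
                   a.city.upper(), a.state.upper(), a.zip_code[:5])

def validate(a: Address) -> bool:
    """Step 3: look the standardized address up in the reference data."""
    return astuple(a) in REFERENCE_DB

def cleanse(raw: str) -> Tuple[Optional[Address], bool]:
    """Run parse, standardize, and validate in order."""
    parsed = parse(raw)
    if parsed is None:
        return None, False
    standardized = standardize(parsed)
    return standardized, validate(standardized)

print(cleanse("100 main street, Springfield, il 62701-1234"))
print(cleanse("999 Elm Rd, Springfield, IL 62701"))
```

The first call is standardized and found in the toy reference set. The second is standardized just as cleanly but isn't in the set, so it comes back as not valid. That split is the point of the sketch: formatting and validity are separate questions.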

The individual algorithms that are most effective in address cleaning are proprietary, and usually are part of an address validation company's software. This is true for both USPS and international addresses.

There just are not that many open-source algorithms that can effectively scrub addresses. However, a number of the dominant software solutions do have free usage options available.

Address cleansing tools

You'll find that address validation software usually features a number of different tools for scrubbing your addresses, and each tool takes a different level of skill to use effectively. Some are as simple as a "copy and paste" interface. Others require basic to advanced programming skills.

"Copy/paste" address cleaning tools

An example of a "copy/paste" tool is our Bulk Address Validation Tool. This type of tool is really helpful for individuals who have little to no programming skills, but still need to make sure that their list of addresses is standardized and validated.

As the name implies, you simply copy your list of addresses from your Excel spreadsheet, and paste it into the Bulk Address Validation Tool. Here are the steps involved:
  1. Select from "validate US addresses", "match ZIP Codes to US cities and states", or "validate international addresses".
  2. Paste your list (the one you copied from your Excel spreadsheet) into the section labeled "Paste your list below".
  3. Click on "Process My List". The software automatically cleans up the addresses, standardizes them, corrects or adds data as necessary, and then validates them against the official address database for the country in question.
  4. Copy the newly cleaned list and paste it back into your spreadsheet.

It really is that easy.

For individuals who are on the n00b side of the programmer scale, using a "copy/paste" tool like our bulk validation tool can save a lot of time and hassle.

Using an API for address cleansing

For individuals who have more solid developer skills, using an API to programmatically clean up addresses is probably the best route to take. While there are many different APIs out there, the Smarty collection of Address Validation APIs is crazy fast and easy to use. The APIs only require a simple HTTP request, and they send back cleaned address data with up to 55 metadata points in a convenient JSON format. When properly configured, they can process up to 100,000 addresses a second.
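
As a rough idea of what "a simple HTTP request" looks like, here is a Python sketch that uses only the standard library. Treat the details as assumptions to check against the current API documentation: the endpoint URL, the query parameter names (auth-id, auth-token, street, city, state, zipcode), and the response fields read in summarize (delivery_line_1, last_line, analysis.dpv_match_code). The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint; confirm it against the current API documentation.
ENDPOINT = "https://us-street.api.smarty.com/street-address"

def build_url(auth_id, auth_token, street, city="", state="", zipcode=""):
    """Build the GET URL, leaving out any address fields that are empty."""
    params = {
        "auth-id": auth_id,
        "auth-token": auth_token,
        "street": street,
        "city": city,
        "state": state,
        "zipcode": zipcode,
    }
    return ENDPOINT + "?" + urlencode({k: v for k, v in params.items() if v})

def summarize(candidates):
    """Reduce the JSON response (assumed to be a list of candidate
    matches) to a few fields. Field names are assumptions."""
    return [
        {
            "line1": c.get("delivery_line_1", ""),
            "line2": c.get("last_line", ""),
            "dpv": c.get("analysis", {}).get("dpv_match_code", ""),
        }
        for c in candidates
    ]

def cleanse(auth_id, auth_token, street, **fields):
    """Send one address to the API and return the summarized candidates."""
    url = build_url(auth_id, auth_token, street, **fields)
    with urlopen(url, timeout=10) as response:
        return summarize(json.load(response))
```

With real credentials, a call would look like cleanse("YOUR_AUTH_ID", "YOUR_AUTH_TOKEN", "1600 Amphitheatre Pkwy", city="Mountain View", state="CA"). For a whole list of addresses, a batched request is likely a better fit than one call per address.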


Conclusion

If you're trying to clean a lot of addresses in a relatively short amount of time, your best option is to use some sort of address validation algorithm. The best algorithms are most often found in proprietary software. This kind of software usually offers a number of different address cleansing tools to choose from, depending on your level of programming skill. And the best ones offer some form of free usage, especially while you're testing them out.