Why would autocomplete suggest invalid addresses?
Question
Last Updated: July 18, 2013
Why does autocomplete sometimes suggest addresses that don't validate?
The only significant part of any suggestion which may be invalid is the primary (house) number. Streets, cities, and states will always be verifiable combinations.
When we built autocomplete, we designed it not around validation, but around the user's intent. Autocomplete's main purpose is to help the user get their address done faster, not to validate their input before they've finished (which is rude).
We gave considerable thought to the mechanics and experience of autocomplete. With our perspective as an address verification company, where our specialty is addresses and our secret power is good user experience, our discussion about how to best offer autocomplete delayed its development for the better part of a year.
Originally, we wanted to show only valid address suggestions. We thought that would provide immense value to users and put us a notch above the competition. Turns out that the value we hoped for was overestimated and that there were easier, more effective ways to have a better autocomplete than other providers.
How did we overestimate the value of only-valid suggestions? A couple of things. If an address suggestion is "valid," does that mean it exists, or that it is what the user intended? (Remember, we're speaking of suggestions here.) For instance, 123 Main and 124 Main may both be valid, but if the user mistyped 123 instead of 124, showing only valid suggestions gains nothing: the wrong address still shows as "valid."
Further, we were for some reason thinking that users wouldn't know the address they wanted to type! In reality, even if they don't know the end of it, they surely know the beginning, like the house number and the start of the street name. (They'd better, because I know of no web service that can read minds. We have to rely on the user to get the house number right.)
Instead of trying to validate their input before they've finished and restricting their input to a list of options, we focused on helping the user finalize their intent faster: just suggest the streets, cities, and states that match their input. Let the final check happen at verify-time when they're done.
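To make the idea concrete, here is a minimal sketch of that kind of suggestion lookup. It is not our production code: the sample data, the `suggest` function, and its parameters are all hypothetical, and it assumes the input starts with a house number followed by the beginning of a street name. The house number is passed through untouched; only the street, city, and state come from the data.

```python
from bisect import bisect_left

# Hypothetical sample data: known (street, city, state) combinations.
# House numbers are deliberately absent; they are never checked here.
STREETS = sorted(
    [
        ("Main St", "Provo", "UT"),
        ("Main St", "Orem", "UT"),
        ("Maple Ave", "Provo", "UT"),
        ("Center St", "Provo", "UT"),
    ],
    key=lambda s: (s[0].lower(), s[1], s[2]),
)
# Lowercased street names, in the same order, for binary search.
KEYS = [street.lower() for street, _, _ in STREETS]


def suggest(typed, limit=10):
    """Return up to `limit` suggestions whose street starts with the typed text."""
    number, _, street_prefix = typed.strip().partition(" ")
    street_prefix = street_prefix.strip().lower()
    # Wait until the user has typed a house number and part of a street.
    if not number[:1].isdigit() or not street_prefix:
        return []
    suggestions = []
    # Jump to the first street that could match, then scan forward.
    for street, city, state in STREETS[bisect_left(KEYS, street_prefix):]:
        if not street.lower().startswith(street_prefix):
            break
        # Echo the user's house number as typed, without validating it.
        suggestions.append(f"{number} {street}, {city}, {state}")
        if len(suggestions) == limit:
            break
    return suggestions


print(suggest("123 Ma"))
```

Whatever the user picks would still be sent to address verification afterward; that step is where the house number gets checked.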
Suggesting streets, cities, and states, instead of full and verified addresses, also makes our service more flexible and allows us to expand features and data points in the future. Autocomplete excels at its primary purpose: helping users enter their addresses faster.
So really, a large part of the reason is a positive user experience. However, there were other business and technical aspects we thoroughly considered. Please note that none of the following reasons were seen as total showstoppers; these are all solvable. It was the negative impact on user experience that ultimately drove our decision. We know ways around these next problems, but as we shifted our UX focus, these problems immediately dissolved.
As we explored options, the technical limitations ran us into new business dilemmas.
The USPS database is big: several gigabytes on disk. Unfortunately, its proprietary format, which works great for verifying addresses, does not fit the model of an autocomplete lookup at all. A new index would have to be generated to allow for speedy retrieval of address suggestions. We built this index on a small subset of addresses in a familiar region (about a dozen ZIP Codes), and once we had loaded it into a database, we had 35 GB less free disk space than before. Yikes! Worse yet, lookups were still slower than we wanted, even after some optimizations.
As we were discovering, having an index with well over 300,000,000 points in our database was going to require some heavy lifting, which we weren't sure we wanted to do. If we did do it, our customers would have to pay for it, which is where business decisions came in.
We entertained the idea of autocomplete being an additional paid service. Since each suggestion would be a valid address, our customers would pay for each suggestion on every keystroke for each one of their users. Tallying up the costs, we ran out of whiteboard space. That obviously was too expensive, especially since the user would only end up choosing one address in the end.
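A rough back-of-the-envelope version of that tally looks something like this. The user count and keystroke count are made-up numbers for illustration only, and `billed_suggestions` is a hypothetical helper; the 10 suggestions per keystroke is the upper bound mentioned elsewhere in this article.

```python
def billed_suggestions(users, keystrokes_per_address, suggestions_per_keystroke=10):
    """Count the verified addresses a customer would be charged for."""
    return users * keystrokes_per_address * suggestions_per_keystroke


# Assumed figures, for illustration only.
users = 1_000      # end users who each enter one address
keystrokes = 15    # keystrokes each of them types before choosing

billed = billed_suggestions(users, keystrokes)
print(billed)           # verified addresses the customer pays for
print(billed // users)  # addresses paid for per address actually chosen
```

Under those assumptions, the customer pays for 150 verified addresses for every one that a user actually selects.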
Not yet defeated, we went down the road of adjusting our pricing model just for autocomplete. That got complicated quickly, and still meant that we were hashing out valid and complete addresses en masse without most of them seeing any use.
This led us back to the idea of free autocomplete. If we did this, then any keystroke would generate up to 10 verified addresses for free, undermining the value of paid LiveAddress subscriptions. This didn't seem quite right, and we didn't want to hurt our customers, especially if we found the service abused over time. We also ran back into the problem of infrastructure and hardware costs that weren't being covered anymore. Back to square one...
The most logical solution came about a year after we first wanted to do autocomplete. By shifting our UX perspective (described earlier), we eliminated the need for a list of every possible valid address that a user might type. By doing this, the index size came down dramatically and so did our costs. With this, we can provide a great autocomplete service for free to our customers -- and we do it without a database.
We still know of no other provider that offers address autocomplete like ours:
- US addresses, using official data
- Super-fast, with response times usually under 100 to 150 ms (including external latency)
- Builds into any existing form field (using our jQuery plugin)
- IP geolocation puts local addresses at the top of the suggestions list
- City/state filters
- City/state or state bias
- No database (no overhead associated with disk I/O, queries, etc.)
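The difference between a filter and a bias in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not our actual API: the `Suggestion` type, `apply_filter`, `apply_bias`, and the sample data are all hypothetical. A filter removes suggestions outside the chosen city/state, while a bias keeps every suggestion and only changes the order. A location guessed from the user's IP address could be fed in as a bias in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    street: str
    city: str
    state: str


def apply_filter(suggestions, state=None, city=None):
    """Filter: drop every suggestion outside the given city/state."""
    return [
        s for s in suggestions
        if (state is None or s.state == state) and (city is None or s.city == city)
    ]


def apply_bias(suggestions, state=None, city=None):
    """Bias: keep everything, but move matching suggestions to the top."""
    def rank(s):
        state_match = state is not None and s.state == state
        city_match = (
            city is not None
            and s.city == city
            and (state is None or s.state == state)
        )
        # False sorts before True, so city matches come first, then state matches.
        return (not city_match, not state_match)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(suggestions, key=rank)


# Hypothetical sample suggestions.
candidates = [
    Suggestion("Main St", "Boise", "ID"),
    Suggestion("Main St", "Orem", "UT"),
    Suggestion("Main St", "Provo", "UT"),
]

print([s.city for s in apply_filter(candidates, state="UT")])
print([s.city for s in apply_bias(candidates, state="UT", city="Provo")])
```

With the sample data, the filter leaves only the two Utah streets, while the bias returns all three with Provo first, then Orem, then Boise.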
See this comment for a concise summary of some of the things mentioned here.
What about ZIP Codes?
We don't currently show ZIP Codes in suggestions for these main reasons:
- ZIP Codes slow down the user with extra information they have to comprehend at every keystroke
- ZIP Codes overlap cities/states in one-to-many and many-to-one relationships, so the address essentially has to be verified to get the correct ZIP Code for that part of the street
- Only a city/state is necessary to verify addresses. By omitting ZIP Code, we encourage good practice of verifying the address, which then appends the ZIP Code.