Bulk Address Validation: Command-line interface

Reading not your thing? Watch our quick-start video instead (Windows | Mac).

If you have large lists of addresses to process, and you have some experience with the command line, our Command-Line Interface (CLI) might become your new best friend. It can process millions of either US or international (non-US) addresses very quickly. Each address processed will count as one "lookup" from your US or international subscription. (If you're not yet familiar with the command line, try our Web Interface. It can process up to 500,000 US or international addresses at once.)

On scripting and automation
Download
Installation
Preparing your input file
Using the interface
The output file
The log file
Command-line parameters
Updates
Troubleshooting

An important note on scripting and automation

This Command-Line Interface tool is provided as a convenience for (mostly) non-computer programmers seeking to process large quantities of addresses formatted as CSV or PSV records. It is intended that it will be invoked manually by human users typing at a command prompt (not the most friendly of user experiences, we get it). The use-case of deploying this tool into an automated environment for the processing of ad-hoc address data is not supported. This constraint is based on how we provide software updates for this tool. If you need to process address data from deployed software running autonomously we recommend our officially supported SDKs. For those who seek an even more direct HTTP integration we also provide detailed US Street Address API documentation.

Download

You can download the Command-Line Interface (free) for the following platforms:

You're welcome!

Installation

After downloading one of the above packages, extract the contents of the archive to your desktop. You'll see a SmartyList folder containing the following files:

smartylist: This is the application. Instead of double-clicking it, you will access it from the command line.
sample-input.csv: This is a simple address list for your reference.
sample-output.csv: This is the output produced by processing the sample-input.csv file above.
change-log.txt: A log of recent changes made by the software developers.
DO-NOT-README.txt: Actually, please read it.

Power users: Feel free to copy or move smartylist to wherever is convenient. On a Linux machine you might put it in /usr/local/bin or somewhere else that is already in your $PATH.

Preparing your input file

Save your input data as a CSV (comma-separated values) or PSV (pipe-separated values) file in the SmartyList folder on your desktop. Within that file, organize your data into columns using one of the combinations shown below (the more data you provide, the better). The top row MUST consist of field names, spelled exactly as you see here.

US Addresses

street	city	state	zipcode
11310 Old Seward Highway	Anchorage	AK	99515
3211 Edwards Lake Pkwy	Birmingham	AL
11219 N Rodney Parham Road			72212
4507 North US Highway 89	Flagstaff	AZ	86004
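If you build your list with a script rather than a spreadsheet, Python's standard csv module can write the required header row for you. The sketch below is a minimal example, not part of the CLI itself; the helper name write_us_input and the file name are hypothetical. Missing values are simply left empty, as in the table above.

```python
import csv

# Header names must be spelled exactly as documented for US addresses.
US_FIELDS = ["street", "city", "state", "zipcode"]

def write_us_input(path, records):
    """Write US address records (dicts) to a CSV file with the required header row.

    Hypothetical helper for illustration only.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval="" leaves a missing column (e.g. no zipcode) empty
        # instead of raising an error.
        writer = csv.DictWriter(f, fieldnames=US_FIELDS, restval="")
        writer.writeheader()
        writer.writerows(records)
```

For example, write_us_input("us-input.csv", [{"street": "3211 Edwards Lake Pkwy", "city": "Birmingham", "state": "AL"}]) produces a two-line file whose second row ends with an empty zipcode field.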

International Addresses

For international addresses, use one of these combinations of columns:
country | address1 | locality | administrative_area | postal_code
country | address1 | locality | administrative_area
country | address1 | postal_code
country | freeform (entire address except country in a single column)

Important: To return geocodes (latitude/longitude) in the response for a particular address, provide a column named geocode with a value of true. Alternatively, you could use the -geocode command-line parameter to return geocodes for the entire batch.

country	address1	locality	administrative_area	postal_code	geocode
AUS	200 River Terrace	Kangaroo Point	Queensland	4169	true
DEU	Hainichener Strasse 64	Freiberg	Sachsen		true
PYF	21 Allée Pierre Loti	Papeete		98714	false
RUS	ул. Фурштатская, д. 13			191028	true
JPN	きみ野 6-1-8	大和市	神奈川県	242-0001	false
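A quick pre-flight check can catch a misspelled header before you run the CLI. The Python sketch below reports which of the documented international column combinations a header row satisfies. It is only a convenience check under our own assumptions, not the CLI's internal validation logic, and the helper names matching_combo and read_header are hypothetical.

```python
import csv

# The international column combinations listed above, most complete first.
INTERNATIONAL_COMBOS = [
    {"country", "address1", "locality", "administrative_area", "postal_code"},
    {"country", "address1", "locality", "administrative_area"},
    {"country", "address1", "postal_code"},
    {"country", "freeform"},
]

def matching_combo(header):
    """Return the first documented combination fully present in header, or None.

    Extra columns (geocode, your own ID fields, etc.) are ignored. Names are
    compared exactly, so "Country" does not count as "country".
    """
    present = set(header)
    for combo in INTERNATIONAL_COMBOS:
        if combo <= present:  # every required column is present
            return combo
    return None

def read_header(path, delimiter=","):
    """Read only the first row (the field names) of a CSV or PSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f, delimiter=delimiter))
```

A result of None means none of the combinations above was found, which usually points to a typo or capitalization problem in the header row.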

For either US or international addresses, you can include fields that contain non-address data like ID number or business name. All your input data will be returned untouched as part of the output. Choose column names for non-address data that will not conflict with the address column names listed above.

US Enrichment Data

For US Enrichment, use one of these combinations of columns, depending on whether you are querying by SmartyKey or searching by address:
smarty_key
street or freeform (search by address with the full address in a single column)
street | city | state | zipcode (search by address with components)

An etag column may be used with any of the above combinations to check whether the record has been updated. If the etag value has not changed, an empty record will be returned. (See the US Enrichment documentation for more information.)

Important:

Property financial data has limited support and will only return the base property data.
Secondary data has limited support and will only return the root address and secondary count.

Consider the following examples of US Enrichment input files:

smarty_key
123456
7891011

smarty_key	etag
123456	AAHQCAIDAYBQIAQC
7891011	AIAAQBAHBAAQ6DB

freeform
1400 Sandhill Dr Orem UT
123 N Pole Dr Outback AK 90123

street	city	state	zipcode
1400 Sandhill Dr	Orem	UT
123 N Pole Dr	Outback	AK	90123
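The examples above are shown as tables, but on disk they are ordinary CSV or PSV files. As a sketch, the following Python writes a pipe-separated SmartyKey file, optionally with an etag column. The helper name write_enrichment_input is hypothetical, and it assumes you already have an etag for every key when you pass etags.

```python
import csv

def write_enrichment_input(path, smarty_keys, etags=None):
    """Write a pipe-separated US Enrichment input file keyed by SmartyKey.

    Hypothetical helper. If etags is given, it must map every SmartyKey to
    the etag from a previous lookup (a missing key raises KeyError).
    """
    fields = ["smarty_key"] + (["etag"] if etags is not None else [])
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="|")  # PSV: pipe-separated values
        writer.writerow(fields)
        for key in smarty_keys:
            row = [key] if etags is None else [key, etags[key]]
            writer.writerow(row)
```

Calling it with the two keys and etags from the second table above yields a header line of smarty_key|etag followed by one line per key.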

Final Consideration

Make sure your list doesn't include blank lines (except at the end). By "blank lines" we mean lines that have no delimiters (commas, tabs, or pipes) and no data except a carriage return character (and/or line feed character). Blank lines can cause line numbers to output incorrectly, which makes pasting back into a spreadsheet a bit tricky. If you insist on having blank lines, make sure each record has an 'ID' field containing a unique value.
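If a file already contains blank lines, you can strip them before processing. The Python sketch below (hypothetical helper name remove_blank_lines) follows the definition above: it drops only lines that contain nothing but a line ending and keeps delimiter-only lines such as ",,,". It assumes no quoted field in your file spans multiple lines, since such a field could legitimately contain an empty line.

```python
def remove_blank_lines(src, dst):
    """Copy src to dst, dropping lines that contain nothing but a line ending.

    Lines made only of delimiters (for example ",,,") are kept, because they
    are not "blank" in the sense described above. Line endings are preserved.
    """
    # newline="" returns \r\n, \r, and \n endings untranslated.
    with open(src, "r", encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            if line.rstrip("\r\n"):  # anything left besides the line ending?
                fout.write(line)
```

Writing to a separate dst file leaves your original list untouched in case you need to compare the two.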

Using the interface

Open your favorite command-line application and use the "change directory" (cd) command to navigate to the directory where your Command-Line Interface files reside. On Windows, we recommend running your command-line application as an administrator. This is what that might look like:

Windows:

cd C:\Users\[username]\Desktop\smartylist_windows_latest

Mac:

cd ~/Desktop/smartylist_osx_latest

Three specific command-line parameters are required in order to process a list: -auth-id, -auth-token, and -input. (To find your auth ID and auth token, open the API Keys tab of your account and look under the heading Secret Keys.) The -input parameter tells the tool where your input file is. If you placed your input file inside the SmartyList folder, the complete command to process it might look like this:

Windows:

smartylist -auth-id="123" -auth-token="Abc" -input="your_file"

Mac:

./smartylist -auth-id="123" -auth-token="Abc" -input="your_file"

We suggest you try a short list first, to make sure everything is working as expected. When you run the command, the terminal will first display your current configuration settings, so you can verify that they are as desired. It will also list your input field names, and below those, the matching data type for each. Make sure these are correct.

Finally, the prompt will ask if everything appears to be in order. If everything looks right, type "y" then hit "enter." During processing, the terminal will display a progress bar. (Although, if your list is small, the job will be done almost instantly.)

To run a file through a particular API, use the -api parameter. See Command-line parameters below.

The output file

By default, the output file will be placed next to the input file, and it will be named after the input file, with "-output" appended. (If you wish, you can specify a different location for the output file using the -output command-line parameter.)

When viewing the output file, you will see all of your original data fields on the left, followed by an empty field, followed by our output fields on the right, with field names in brackets.
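If you post-process the output with a script, the empty separator column makes it easy to pull your original fields apart from ours. The Python sketch below assumes the header row contains exactly one empty column name at that boundary and that our field names are wrapped in square brackets; both are assumptions based on the description above, and the helper name split_output is hypothetical.

```python
import csv

def split_output(path, delimiter=","):
    """Yield (original, results) dict pairs for each data row of an output file.

    Assumption: the first empty column name in the header separates your
    fields from the bracketed Smarty fields. Raises ValueError if there is
    no empty column name. Brackets are stripped from the result keys.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader)
        sep = header.index("")  # boundary between your data and ours
        original_names = header[:sep]
        result_names = [name.strip("[]") for name in header[sep + 1:]]
        for row in reader:
            yield (dict(zip(original_names, row[:sep])),
                   dict(zip(result_names, row[sep + 1:])))
```

Pass delimiter="|" if your output file is pipe-separated.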

The CLI output fields for US addresses are very similar to the raw output from the US Street Address API, though in a different order. For an explanation of the US output fields, please see Address Output Fields.

The CLI output fields for international addresses are likewise very similar to the raw output from the International Street Address API with these differences:

There is one new field in the CLI output for international addresses: line_number. This is simply a numbering of rows, to help keep track of their original order.

The CLI output fields for US Enrichment are similar to the raw output from the API, with the following differences:

There is one new field in the CLI output: line_number. This is simply a numbering of rows, to help keep track of their original order.
Property financial data has limited support and will only return the base property data.
Secondary data has limited support and will only return the root address and secondary count.

The log file

Every time you process a list with the Command-Line Interface, it will produce a log file and place it next to the corresponding input file. The name of the log file will follow this pattern:

[name-of-input-file]-log_[date-time]

The file will contain all the information displayed by the terminal before processing, as well as a precise play-by-play of the tool's various actions. In the unlikely event that your list fails to process, check the log file for the gory details of what happened. If you contact Support with questions, they may ask to see this file in order to aid in the debugging process.

Command-line parameters

Here we list all the command-line parameters that can be used with our Command-Line Interface. As explained above, the first three parameters listed below are all that are required to process a list. The others are optional; you can employ them to customize the tool's functionality. To use them, simply list them when you run smartylist at the command prompt, following this model:

smartylist -[parameter] -[another-parameter]

-auth-id="123"
The auth-id value (or name of environment variable) to use for API requests.
-auth-token="Abc"
The auth-token value (or name of environment variable) to use for API requests.
-input="path/to/the/input/file"
The path to the input file which has addresses you want to validate.
-output="/path/to/the/output/file"
If provided, this is where the bulk validation tool will place the output file containing the results of processing your input file. If not provided, the tool will place the output file alongside the input file.
-log="path/to/the/log/file"
If desired, you can tell the bulk validation tool where to put the diagnostic log file. If this parameter is not provided, the tool will place the log file alongside the input file.
-api="name-of-api"
Valid values are "us-street", "international-street", and "us-enrichment". If this parameter is not provided, "us-street" will be assumed by default. If an invalid value is provided, an error will be thrown, and the process will not run.
-license="name-of-license"
Use this parameter to specify the license to use for the chosen input file. Valid values can be found on your Subscriptions page, under the appropriate subscription.
-base-url="http://www.your-site.com"
The base URL to use for API requests if you are pointing to an onsite API installation. If you are using our regular cloud service, this parameter is not necessary.
-format="format-value"
This parameter should only be used when processing US addresses. When you provide this parameter, the tool will override the default output format. Valid values are the same as the format parameter for the US Street Address API. If you would like to use the Project USA Format, we recommend you set -format="project-usa".
-match="match-value"
This parameter should only be used when processing US addresses. When you provide this parameter, the tool will override any values in the match column of the input file. Valid values are the same as the match parameter for the US Street Address API. If you are using one of our newer "Core" licenses, we highly recommend you set -match="enhanced".
-county-source="county-source-value"
This parameter should only be used when processing US addresses. When you provide this parameter, the tool will override any values in the county_source column of the input file. Valid values are the same as the county_source parameter for the US Street Address API.
-enrichment-dataset="name-of-enrichment-dataset"
This parameter should only be used with the us-enrichment API to target a specific enrichment dataset. Example value: property
-enrichment-data-subset="name-of-enrichment-data-subset"
This parameter should only be used with the us-enrichment API, when the chosen enrichment dataset contains subsets. Example value: principal
-enrichment-include="comma separated property groups or attribute names"
This parameter is used to include only specific property groups or attributes in the output file.
It is only applicable with the us-enrichment API and the property dataset and principal data subset. Example value: group_location,assessed_value
See the US Address Enrichment API documentation for usage details.
-enrichment-exclude="comma separated property groups or attribute names"
This parameter is used to exclude specific property groups or attributes from the output file.
It is only applicable with the us-enrichment API and the property dataset and principal data subset. Example value: tax_assess_year
See the US Address Enrichment API documentation for usage details.
-geocode
This parameter will return geocodes for International Street queries. Without it, the geocode (latitude/longitude) columns will have a value of 0.
-rate-limit=[integer]
With this command-line parameter, you can choose how fast to send addresses to the API, in addresses per second. For example, -rate-limit=300 will cause the CLI to send 300 addresses per second. Valid values are positive integers. If the value is less than 1 or is not an integer, an error will be thrown, and the CLI process will not run. If this parameter is not used, no rate limit will be applied.
-proxy="www.your-proxy.com"
The URL of your proxy, if one has been configured for your network. In most cases this flag is not necessary.
-silent
Tells the tool to squelch all diagnostic output and process the list without a confirmation prompt if possible. (No value needed.)
-timeout
If your network connection is slow, you may receive timeout errors during execution, such as context deadline exceeded (Client.Timeout or context cancellation while reading body). This parameter can help prevent those errors by allowing more time for the response to be received from the server. The default value is 5 (seconds); provide a larger value to allow more time.
-version
When you provide this parameter, the tool simply prints the version of the application to stdout and exits. (No value needed.)
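When you use -rate-limit, you can estimate the minimum running time by dividing the number of addresses by the rate. The Python sketch below (hypothetical helper name min_runtime_seconds) rounds up to whole seconds. It is only a rough lower bound: network latency and server response time are not included.

```python
def min_runtime_seconds(address_count, rate_limit):
    """Rough lower bound, in seconds, on processing time under -rate-limit.

    Mirrors the documented rule that the rate must be a positive integer.
    Network latency and server response time can only add to this figure.
    """
    if not isinstance(rate_limit, int) or rate_limit < 1:
        raise ValueError("rate limit must be a positive integer")
    # Integer ceiling division: a partial final second still counts.
    return (address_count + rate_limit - 1) // rate_limit
```

For example, one million addresses at -rate-limit=300 need at least 3,334 seconds, a little under an hour.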

Updates (Pay attention, this is important!)

Try this command at the command prompt:

smartylist -version

(Mac/Linux users may need to insert ./ in front of the word smartylist.)

If the version you see doesn't match the latest release, you might be missing out on recent improvements and should probably download and install the latest version.

The version number you see is the semantic version number of your copy of the application. Each of the three dot-delimited numbers is significant: major.minor.patch

The first of the three numbers in the version output is the "major" version number. If we need to release a new major version, any copies of the old version will be automatically disabled, requiring you to download and install the latest version before processing any additional lists. (Read that last sentence again...slowly...just to make sure it sinks in.) This is not something we will do often, and never without extensive consideration.

The second of the three numbers in the version output is the "minor" version number. Incrementing this number means we have released new functionality that is still backwards-compatible. It would behoove you to download and install the latest version. Until you do, a message will be sent to stderr and a non-zero exit status will be returned by the application as a signal that something is amiss. The application will continue to process your lists.

The third of the three numbers is the "patch" version number. It refers to bug fixes, that is, corrections to existing behavior. If this number doesn't match, it would be a good idea (probably worth a promotion!) for you to download and install the latest version so you have the most current and correct software. In this situation, a message will be sent to stdout as a signal that something is amiss. The application will continue to process your lists.

New releases are announced in our open-source Changelog repository.
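If you want to double-check which of the three cases above applies, the comparison is simple enough to sketch. The Python example below is illustrative only: it assumes you copy the latest version number from the Changelog repository by hand, and it assumes the -version output contains a dotted major.minor.patch number (the exact output format is not documented here). The helper names parse_version and update_status are hypothetical.

```python
import re

def parse_version(text):
    """Extract a (major, minor, patch) tuple from text such as -version output.

    Assumption: the text contains a dotted number like 1.4.2.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no major.minor.patch version found in {text!r}")
    return tuple(int(part) for part in match.groups())

def update_status(installed, latest):
    """Classify how far the installed version lags the latest release."""
    inst, new = parse_version(installed), parse_version(latest)
    if inst[0] < new[0]:
        return "major: old copy is disabled; install the latest version"
    if inst[:2] < new[:2]:
        return "minor: still works, but warns on stderr and exits non-zero"
    if inst < new:
        return "patch: still works, but prints a notice on stdout"
    return "up to date"
```

For example, update_status("1.4.2", "1.5.0") reports the minor-version case.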

Troubleshooting

If Excel doesn’t display some characters correctly

The Smarty CLI outputs a comma-delimited or pipe-delimited file encoded as UTF-8, so characters from many languages are written correctly. If Excel is not displaying some characters correctly, we recommend this procedure:

Instead of opening the output file directly with Excel (e.g., by double-clicking on the file), open Excel and open a brand new, empty file.
From within the Excel application, go to the File menu and choose Import.
During the Import process, choose the comma-delimited or pipe-delimited file that was output by the Smarty CLI.
Also during the Import process, be sure to tell Excel that the "File origin" or character set you want to use is "Unicode (UTF-8)."
Finally, you will be given the opportunity to set the file delimiters. Choose the one that makes the preview look right (probably either comma or pipe).
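As an alternative to the Import steps, some versions of Excel recognize a comma-delimited file as UTF-8 when it begins with a UTF-8 byte-order mark (BOM). Whether this works depends on your Excel version, so treat it as an assumption to test on a small file first. The Python sketch below (hypothetical helper name add_utf8_bom) writes a copy of the output file with a BOM added and otherwise leaves the content unchanged.

```python
def add_utf8_bom(src, dst):
    """Copy a UTF-8 text file to dst, prefixed with a UTF-8 byte-order mark.

    Reading with "utf-8-sig" drops any BOM already present, so running this
    twice does not produce a doubled BOM. Line endings are preserved.
    """
    with open(src, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()
    # The "utf-8-sig" codec writes the BOM once, at the start of the file.
    with open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        fout.write(text)
```

Writing to a new file keeps the original CLI output intact.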
