Smarty

Scanning CSV in Go

Michael Whatcott | May 5, 2018

For the purpose of this article, consider the following CSV data, slightly modified from the docs for encoding/csv:

csvData := strings.NewReader(strings.Join([]string{
	`first_name,last_name,username`,
	`"Rob","Pike",rob`,
	`Ken,Thompson,ken`,
	`"Robert","Griesemer","gri"`,
}, "\n"))

Here's how you read the data, record by record, using the Reader provided in that package:

reader := csv.NewReader(csvData)

for {
	record, err := reader.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		// handle the error...
		// break? continue? neither?
	}

	fmt.Println(record)
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

There are a few awkward elements to this approach:

  1. We are checking for io.EOF each time around the loop.
  2. We are checking for a non-nil error each time around the loop.
  3. It's not clear what kind of non-nil errors might appear and what kind of handling logic the programmer should use in each case.

Generally, I expect CSV files to be well-formed and I break out of the read loop at the first sign of trouble. If that's also the approach you generally use, well, we've got an even more elegant way to read CSV data!

https://pkg.go.dev/github.com/smartystreets/scanners/csv

scanner := csv.NewScanner(csvData)

for scanner.Scan() {
	fmt.Println(scanner.Record())
}

if err := scanner.Error(); err != nil {
	log.Panic(err)
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

This will look very familiar if you've ever used bufio.Scanner. No more cumbersome checks for io.EOF or errors in the body of the loop! By default, scanner.Scan() returns false at the first sign of an error from the underlying encoding/csv.Reader. So, how do you customize the behavior of the scanner, you ask? What if the CSV data uses another character as the separator/delimiter/comma? Observe the variadic, functional configuration options accepted by csv.NewScanner:

// csv.NewScanner expects an io.Reader, so wrap the joined string in one.
csvDataCustom := strings.NewReader(strings.Join([]string{
	`first_name;last_name;username`, // ';' is the delimiter!
	`"Rob";"Pike";rob`,
	`# lines beginning with a # character are ignored`, // '#' is the comment character!
	`Ken;Thompson;ken`,
	`"Robert";"Griesemer";"gri"`,
}, "\n"))

scanner := csv.NewScanner(csvDataCustom,
	csv.Comma(';'), csv.Comment('#'), csv.ContinueOnError(true))

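for scanner.Scan() {
	if err := scanner.Error(); err != nil {
		// With ContinueOnError(true), Scan keeps going past a bad record,
		// so report the error and move on to the next record.
		log.Println(err)
		continue
	}
	fmt.Println(scanner.Record())
}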

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

Pretty flexible, right? And notice that we still don't have to detect io.EOF: that happens internally and simply causes scanner.Scan() to return false.

Now, what if you are scanning the rows into struct values whose fields mirror the CSV schema? Suppose we have a Contact type that mirrors our CSV schema. What's a nice way to encapsulate the translation from a CSV record to a Contact? Embed a *csv.Scanner in a ContactScanner and give ContactScanner its own Record method that returns a Contact rather than the []string record! (Go doesn't have overriding as such; the outer type's Record method simply shadows the embedded one.)

package main

import (
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/smartystreets/scanners/csv"
)

type Contact struct {
	FirstName string
	LastName  string
	Username  string
}

type ContactScanner struct{ *csv.Scanner }

func NewContactScanner(reader io.Reader) *ContactScanner {
	inner := csv.NewScanner(reader)
	inner.Scan() // skip the header!
	return &ContactScanner{Scanner: inner}
}

func (this *ContactScanner) Record() Contact {
	fields := this.Scanner.Record()
	return Contact{
		FirstName: fields[0],
		LastName:  fields[1],
		Username:  fields[2],
	}
}

func main() {
	csvData := strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n"))

	scanner := NewContactScanner(csvData)

	for scanner.Scan() {
		fmt.Printf("%#v\n", scanner.Record())
	}

	if err := scanner.Error(); err != nil {
		log.Panic(err)
	}

	// Output:
	// main.Contact{FirstName:"Rob", LastName:"Pike", Username:"rob"}
	// main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
	// main.Contact{FirstName:"Robert", LastName:"Griesemer", Username:"gri"}
}

But we can go even further if you're not averse to using struct tags and reflection. Notice below that the StructScanner can populate a struct, given a pointer to it, whose fields are decorated with csv struct tags matching the header column names:

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/smartystreets/scanners/csv"
)

type Contact struct {
	FirstName string `csv:"first_name"`
	LastName  string `csv:"last_name"`
	Username  string `csv:"username"`
}

func main() {
	csvData := strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n"))

	scanner, err := csv.NewStructScanner(csvData)
	if err != nil {
		log.Panic(err)
	}

	for scanner.Scan() {
		var contact Contact
		if err := scanner.Populate(&contact); err != nil {
			log.Panic(err)
		}
		fmt.Printf("%#v\n", contact)
	}

	if err := scanner.Error(); err != nil {
		log.Panic(err)
	}

	// Output:
	// main.Contact{FirstName:"Rob", LastName:"Pike", Username:"rob"}
	// main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
	// main.Contact{FirstName:"Robert", LastName:"Griesemer", Username:"gri"}
}
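The article doesn't show how StructScanner uses reflection internally, so the following is only an illustration of the general idea using the standard reflect package, not the library's implementation. The hypothetical populate function takes a row keyed by header name (which a caller would build by pairing the header with each record) and copies values into the string fields whose csv tag matches. It assumes string-only fields; anything untagged, non-string, or unexported is skipped.

```go
package main

import (
	"fmt"
	"reflect"
)

type Contact struct {
	FirstName string `csv:"first_name"`
	LastName  string `csv:"last_name"`
	Username  string `csv:"username"`
}

// populate (hypothetical) copies values from a header-keyed row into the
// string fields of the struct that dst points to, matching on `csv` tags.
func populate(row map[string]string, dst interface{}) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("populate: want pointer to struct, got %T", dst)
	}
	s := v.Elem()
	t := s.Type()
	for i := 0; i < t.NumField(); i++ {
		field := t.Field(i)
		name, ok := field.Tag.Lookup("csv")
		if !ok || field.Type.Kind() != reflect.String || !s.Field(i).CanSet() {
			continue // untagged, non-string, or unexported fields are skipped
		}
		if value, found := row[name]; found {
			s.Field(i).SetString(value)
		}
	}
	return nil
}

func main() {
	header := []string{"first_name", "last_name", "username"}
	record := []string{"Rob", "Pike", "rob"}

	// Pair each header name with the value in the same column.
	row := make(map[string]string, len(header))
	for i, name := range header {
		row[name] = record[i]
	}

	var contact Contact
	if err := populate(row, &contact); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%#v\n", contact)
}
```

Requiring a pointer is what makes the fields settable: reflect can only modify a struct reached through a pointer, which is also why Populate in the example above is called with &contact.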

Clearly, there are many ways to read a CSV file (including other nicely written packages). Happy (CSV) scanning!

go get -u github.com/smartystreets/scanners/csv

Source Code
