Scanning CSV in Go
Wouldn't it be nice if csv.Reader was more like bufio.Scanner?
Michael Whatcott
January 5, 2018

For the purpose of this article, consider the following CSV data, slightly modified from the docs for encoding/csv:

csvData := strings.NewReader(strings.Join([]string{
	`first_name,last_name,username`,
	`"Rob","Pike",rob`,
	`Ken,Thompson,ken`,
	`"Robert","Griesemer","gri"`,
}, "\n"))

Here's how you read the data, record by record, using the Reader provided in that package:

reader := csv.NewReader(csvData)

for {
	record, err := reader.Read()
	if err == io.EOF {
		break
	}
	if err != nil {
		// handle the error...
		// break? continue? neither?
	}

	fmt.Println(record)
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

There are a few awkward elements to this approach:

  1. We are checking for io.EOF each time around the loop.
  2. We are checking for a non-nil error each time around the loop.
  3. It's not clear what kind of non-nil errors might appear and what kind of handling logic the programmer should use in each case.

Generally, I expect CSV files to be well-formed and I break out of the read loop at the first sign of trouble. If that's also the approach you generally use, well, we've got an even more elegant way to read CSV data!

https://pkg.go.dev/github.com/smartystreets/scanners/csv

scanner := csv.NewScanner(csvData)

for scanner.Scan() {
	fmt.Println(scanner.Record())
}

if err := scanner.Error(); err != nil {
	log.Panic(err)
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

This will look very familiar if you've ever used bufio.Scanner. No more cumbersome checks for io.EOF or errors in the body of the loop! By default, scanner.Scan() returns false at the first sign of an error from the underlying encoding/csv.Reader.

So how do you customize the behavior of the scanner? What if the CSV data uses another character as the separator (the "comma")? Observe the variadic, functional configuration options accepted by csv.NewScanner:

csvDataCustom := strings.NewReader(strings.Join([]string{
	`first_name;last_name;username`, // ';' is the delimiter!
	`"Rob";"Pike";rob`,
	`# lines beginning with a # character are ignored`, // '#' is the comment character!
	`Ken;Thompson;ken`,
	`"Robert";"Griesemer";"gri"`,
}, "\n"))

scanner := csv.NewScanner(csvDataCustom,
	csv.Comma(';'), csv.Comment('#'), csv.ContinueOnError(true))

for scanner.Scan() {
	if err := scanner.Error(); err != nil {
		log.Panic(err)
	} else {
		fmt.Println(scanner.Record())
	}
}

// Output:
// [first_name last_name username]
// [Rob Pike rob]
// [Ken Thompson ken]
// [Robert Griesemer gri]

Pretty flexible, right? And notice that we still don't have to detect io.EOF: that happens internally and results in scanner.Scan() returning false.

Now, what if you are scanning the rows into struct values whose fields mirror the CSV schema? Suppose we have a Contact type like that. What's a nice way to encapsulate the translation from a CSV record to a Contact? Embed a *csv.Scanner in a ContactScanner and define a Record method that shadows the embedded one, returning a Contact struct rather than the []string record!

package main

import (
	"fmt"
	"io"
	"log"
	"strings"

	"github.com/smartystreets/scanners/csv"
)

type Contact struct {
	FirstName string
	LastName  string
	Username  string
}

type ContactScanner struct{ *csv.Scanner }

func NewContactScanner(reader io.Reader) *ContactScanner {
	inner := csv.NewScanner(reader)
	inner.Scan() // skip the header!
	return &ContactScanner{Scanner: inner}
}

func (this *ContactScanner) Record() Contact {
	fields := this.Scanner.Record()
	return Contact{
		FirstName: fields[0],
		LastName:  fields[1],
		Username:  fields[2],
	}
}

func main() {
	csvData := strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n"))

	scanner := NewContactScanner(csvData)

	for scanner.Scan() {
		fmt.Printf("%#v\n", scanner.Record())
	}

	if err := scanner.Error(); err != nil {
		log.Panic(err)
	}

	// Output:
	// main.Contact{FirstName:"Rob", LastName:"Pike", Username:"rob"}
	// main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
	// main.Contact{FirstName:"Robert", LastName:"Griesemer", Username:"gri"}
}

But we can go even further if you're not averse to using struct tags and reflection. Notice below that the StructScanner populates a struct, passed by pointer, whose fields are decorated with csv struct tags that correspond to the header column names:

package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/smartystreets/scanners/csv"
)

type Contact struct {
	FirstName string `csv:"first_name"`
	LastName  string `csv:"last_name"`
	Username  string `csv:"username"`
}

func main() {
	csvData := strings.NewReader(strings.Join([]string{
		`first_name,last_name,username`,
		`"Rob","Pike",rob`,
		`Ken,Thompson,ken`,
		`"Robert","Griesemer","gri"`,
	}, "\n"))

	scanner, err := csv.NewStructScanner(csvData)
	if err != nil {
		log.Panic(err)
	}

	for scanner.Scan() {
		var contact Contact
		if err := scanner.Populate(&contact); err != nil {
			log.Panic(err)
		}
		fmt.Printf("%#v\n", contact)
	}

	if err := scanner.Error(); err != nil {
		log.Panic(err)
	}

	// Output:
	// main.Contact{FirstName:"Rob", LastName:"Pike", Username:"rob"}
	// main.Contact{FirstName:"Ken", LastName:"Thompson", Username:"ken"}
	// main.Contact{FirstName:"Robert", LastName:"Griesemer", Username:"gri"}
}

Clearly, there are many ways to read a CSV file (including other nicely written packages). Happy (CSV) scanning!

go get -u github.com/smartystreets/scanners/csv
