How to use Go's WaitGroup effectively

It all starts when one learns about the go keyword to spin up a goroutine:

package main

import (
	"net/http"
)

func main() {
	go doTheThing("http://a.testing.url.not.real")
}

func doTheThing(url string) {
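	// do stuff
	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	// close the body when done; this also keeps resp from being an unused variable
	defer resp.Body.Close()

	// do something with the body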
}
Why am I not getting any output? I know I should be getting a panic because the URL I passed in is clearly not a real URL. The code also works fine if I remove the go keyword, so I really don't understand what's happening.

One scary scary word: concurrency

I won't go into details here, but Go is built for concurrency: a single program can have many (up to millions!) goroutines running at once. The short explanation is that goroutines are not threads; they are much lighter-weight than a standard OS thread, and the Go runtime schedules them rather than the operating system.

The problem observed above occurs because the main goroutine spins up a child goroutine and then returns right away. When main returns, the program exits without waiting for any other goroutines, so doTheThing never gets a chance to run. One way to fix this is to use a sync.WaitGroup (docs here: https://pkg.go.dev/sync#WaitGroup).
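As a minimal sketch of that fix applied to the example above: the goroutine is registered with Add(1) before it starts, and Wait() keeps the caller alive until it finishes. The runAndWait helper is a name I made up for illustration, and doTheThing returns a string here instead of panicking so the program exits cleanly.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	fmt.Println(runAndWait("http://a.testing.url.not.real"))
}

// runAndWait (a hypothetical helper) runs doTheThing in a goroutine and
// blocks until that goroutine has finished.
func runAndWait(url string) string {
	var wg sync.WaitGroup
	var result string

	wg.Add(1) // register the goroutine before starting it
	go func() {
		defer wg.Done() // always decrement, even on an early return
		result = doTheThing(url)
	}()

	wg.Wait() // without this, runAndWait could return before the goroutine runs
	return result
}

// doTheThing reports the outcome as a string instead of panicking, so the
// example exits cleanly.
func doTheThing(url string) string {
	resp, err := http.Get(url)
	if err != nil {
		return "request failed: " + err.Error()
	}
	defer resp.Body.Close()
	return "status: " + resp.Status
}
```

Reading result after Wait() is safe because the goroutine's write happens before its Done() call, and Wait() only returns after that.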


From now on I'm going to use my little svc-monitor project as an example. It checks a bunch of URLs to see if they're up. If a request times out (which tends to happen in my setup for some reason), it uses the host to remote into the node in the cloud and restart the systemd service.

Important WaitGroup Methods

sync.WaitGroup is a relatively simple struct. Its core API is 3 methods (Go 1.25 also added a Go convenience method that wraps Add and Done):

  • (*wg).Add(int) -> Adds a number to the WaitGroup counter, which represents the "in flight" goroutines
  • (*wg).Done() -> Decrements the counter of "in flight" goroutines by one
  • (*wg).Wait() -> Blocks the current goroutine until the WaitGroup counter is 0

WaitGroup Flow

Using a WaitGroup follows a few steps:

  1. Instantiate the WaitGroup somewhere global (package level) OR pass a pointer to it as an argument (a WaitGroup must not be copied after first use)
  2. Add(int) to the counter before you spin up goroutines (any integer, so if you have a slice you can add the length of the slice if you are going to spin up a goroutine for each item)
  3. Call Done() at the end of the goroutine's life
  4. Block via Wait() at the end of the processing
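The four steps above can be sketched roughly like this. The processAll function and the doubling "work" are placeholders I made up for illustration; the WaitGroup here is a local variable captured by the closures rather than a package-level one.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll doubles each item in its own goroutine and returns the results.
func processAll(items []int) []int {
	results := make([]int, len(items))

	var wg sync.WaitGroup // step 1: create the WaitGroup
	wg.Add(len(items))    // step 2: one count per goroutine we are about to start

	for i, item := range items {
		go func(i, item int) {
			defer wg.Done()       // step 3: decrement when this goroutine ends
			results[i] = item * 2 // each goroutine writes only to its own index
		}(i, item)
	}

	wg.Wait() // step 4: block until every goroutine has called Done
	return results
}

func main() {
	fmt.Println(processAll([]int{1, 2, 3})) // prints [2 4 6]
}
```

Giving each goroutine its own slot in results avoids needing a mutex, since no two goroutines touch the same element.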

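Personally I tend to think of Add(int) as letting the outer goroutine know about any new goroutines that are running, and Wait() as saying to wait for them all. One problem that can happen, though, is a goroutine that returns early or gets stuck without ever calling Done(): the counter never reaches 0 and Wait() blocks forever. (An unrecovered panic is a different story, since it crashes the whole program.) Calling defer wg.Done() at the top of the goroutine covers the early-return case, and a stuck goroutine is a good time to use a select statement with a timeout. I'll write about that some other time.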
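In the meantime, a rough sketch of the timeout idea might look like this. WaitGroup has no built-in timeout, so waitTimeout and finishesInTime are helper names I made up: Wait() runs in its own goroutine and closes a channel, and a select races that channel against time.After.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitTimeout waits for wg for at most d and reports whether the wait
// finished in time. On timeout the helper goroutine keeps waiting in the
// background until the counter eventually reaches 0.
func waitTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

// finishesInTime starts one worker that sleeps for work, then waits at most limit.
func finishesInTime(work, limit time.Duration) bool {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done() // runs on every exit path of the goroutine
		time.Sleep(work)
	}()
	return waitTimeout(&wg, limit)
}

func main() {
	fmt.Println(finishesInTime(10*time.Millisecond, time.Second)) // worker is fast: true
	fmt.Println(finishesInTime(time.Second, 10*time.Millisecond)) // worker is slow: false
}
```

The timeout only stops the waiting; it does not stop the stuck goroutine itself, which would need something like a context to be cancelled.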

Example Using WaitGroup

Leveraging this struct involves (usually) using all three of these methods. The steps I've followed in the example app are like this:

https://github.com/lindgrenj6/svc-monitor/blob/12ca6a9bfada22f8990cfa06e425550314795237/main.go#L30-L32

This section basically initializes the waitgroup with the number of services I'm going to check via (*wg).Add(len(svcs)). This allows me to fire off a goroutine for each service in parallel without worrying about the program exiting early.

From here, I tend to call (*wg).Done() at the end of a goroutine's life cycle. That happens in 2 places in this program: https://github.com/lindgrenj6/svc-monitor/blob/12ca6a9bfada22f8990cfa06e425550314795237/main.go#L70 and https://github.com/lindgrenj6/svc-monitor/blob/12ca6a9bfada22f8990cfa06e425550314795237/main.go#L90

Basically the counter will be decremented once that service is done being handled. That is either when there were no problems OR after the service had to be bounced (and I've been notified that it got bounced).

The final piece of the puzzle is the waiting part with (*wg).Wait() which I do here: https://github.com/lindgrenj6/svc-monitor/blob/12ca6a9bfada22f8990cfa06e425550314795237/main.go#L39

This is probably the easiest part: it's the last statement in the main goroutine, and all it does is block until the counter is decremented to 0.
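Putting the three pieces together, here is a simplified sketch of the same shape. This is not the actual svc-monitor code: the service type, the healthy flag (standing in for a real HTTP check), and the monitor function are all made up so the example runs offline.

```go
package main

import (
	"fmt"
	"sync"
)

// service is a hypothetical stand-in for a monitored endpoint; healthy
// replaces a real HTTP check so the example needs no network.
type service struct {
	name    string
	healthy bool
}

// monitor checks every service concurrently and returns one status line
// per service, in the same order as the input.
func monitor(svcs []service) []string {
	status := make([]string, len(svcs))

	var wg sync.WaitGroup
	wg.Add(len(svcs)) // one count per service, added up front

	for i, svc := range svcs {
		go func(i int, svc service) {
			if svc.healthy {
				status[i] = svc.name + ": ok"
				wg.Done() // exit path 1: no problems
				return
			}
			// A real monitor would restart the service and notify someone here.
			status[i] = svc.name + ": restarted"
			wg.Done() // exit path 2: the service was bounced
		}(i, svc)
	}

	wg.Wait() // block until every service has been handled
	return status
}

func main() {
	for _, line := range monitor([]service{{"web", true}, {"db", false}}) {
		fmt.Println(line)
	}
}
```

With two explicit Done() calls, every new exit path needs its own call; a single defer wg.Done() at the top of the goroutine would cover them all.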

Jacob Lindgren

Nebraska, USA