Written 10th of June 2021.
I've found a really nice way to speed up Postgres database integration tests considerably. It gave me a 3x speedup. Before, I ran database migration setup and teardown before and after every integration test, using a migration tool and a long-running Docker database container.
The solution involves using the Postgres template database functionality. The code examples in this post are going to be in Go, but this method applies to any other language where you're using integration tests with Postgres.
In a large project I'm currently working on, we used to use golang-migrate to set up and tear down database tables before and after every integration test. This makes it really easy to guarantee a clean database state before every test run, ensuring test isolation.
However, it wasn't free: the setup of our 30+ table migration files took around 100 ms, and the teardown about the same, adding roughly a fifth of a second to every integration test. This quickly adds up, and we ended up in a situation where testing our main storage package took 30 seconds on our beefed-up development machines, and much more on the loaded CI machines. That doesn't sound like much, but when you like to test early and often, it quickly becomes a nuisance.
It's just very nice when testing is quick, especially when you're used to the rest of the Go toolchain being so fast about everything. There's something satisfying about fast tools.
So I researched other approaches. Perhaps it would be faster to somehow merge the migrations? Run the database from RAM? Turn off fsync for Postgres, or some other obscure startup flag? Some of these helped a little, but not as much as I had hoped. Running the database from a RAM disk, in particular, didn't speed up the tests as much as I had expected.
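For reference, the fsync idea amounts to passing server flags at startup. A sketch of what that could look like, assuming the official Docker image (these flags trade durability for speed, which is fine for throwaway test databases, but should never be used in production):

```yaml
# Hypothetical docker-compose fragment: start Postgres with durability
# features disabled to speed up tests.
services:
  db:
    image: postgres:13
    command: postgres -c fsync=off -c synchronous_commit=off -c full_page_writes=off
```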
But then I stumbled upon the Postgres feature of template databases. Basically, every new Postgres database is implicitly copied from a template database (called `template1`). Any changes you make to `template1` before creating a new database are then copied to that database. And importantly, this copy is a cheap filesystem operation. Sounds perfect!
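To make that concrete, here's a sketch of the mechanism in a psql session (the `users` table and `exampledb` database are made up for illustration):

```sql
-- Connect to template1 and create something in it.
\c template1
create table users (id serial primary key, name text);

-- Every database created afterwards gets a copy of it.
create database exampledb;  -- same as: create database exampledb template template1
\c exampledb
select count(*) from users; -- the table exists here, too (and is empty)
```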
I usually have a testing helper function to set up and tear down my databases. It's called `CreateDatabase`, and it does something like this:

1. Connect to `template1` and run the migrations with golang-migrate.
2. Drop the test database (`testdb`) if it exists, and then recreate it. This last query implicitly copies from `template1`.

In code, it looks something like this (with some details left out for brevity):
```go
package integrationtest

import (
	"database/sql"
	"sync"
	"time"

	_ "github.com/jackc/pgx/v4/stdlib"
)

var initDatabaseOnce sync.Once

// CreateDatabase for testing.
// Usage:
//	db, cleanup := integrationtest.CreateDatabase()
//	defer cleanup()
func CreateDatabase() (*sql.DB, func()) {
	initDatabaseOnce.Do(initDatabase)

	db, cleanup := connect("postgres")
	defer cleanup()

	dropConnections(db, "template1")
	dropConnections(db, "testdb")

	if _, err := db.Exec(`drop database if exists testdb; create database testdb`); err != nil {
		panic(err)
	}

	return connect("testdb")
}

func initDatabase() {
	db, cleanup := connect("template1")
	defer cleanup()

	// Wait for the database to be ready, retrying until a query succeeds.
	for {
		if _, err := db.Exec(`select 1`); err == nil {
			break
		}
		time.Sleep(100 * time.Millisecond)
	}

	// Do test database migrations here
}

func connect(name string) (*sql.DB, func()) {
	db, err := sql.Open("pgx", "postgresql://test:123@localhost:5432/"+name+"?sslmode=disable")
	if err != nil {
		panic(err)
	}
	return db, func() {
		if err := db.Close(); err != nil {
			panic(err)
		}
	}
}

func dropConnections(db *sql.DB, name string) {
	query := `
		select pg_terminate_backend(pg_stat_activity.pid)
		from pg_stat_activity
		where pg_stat_activity.datname = $1 and pid <> pg_backend_pid()`
	if _, err := db.Exec(query, name); err != nil {
		panic(err)
	}
}
```
So what's going on?
First, we do the initialization using a `sync.Once`. You could have done this in `TestMain` instead, but I like having it here in a single file, so it's easy to figure out what's going on. `initDatabase` just connects to the `template1` database and runs the migrations.
Next, we reconnect, this time to the default `postgres` database. This is so we can drop the existing `testdb` (left over from failed tests, perhaps) and recreate it. We do that using `drop database if exists testdb; create database testdb`. You could have written `create database testdb template template1` with the exact same result, and this is where we use the templating trick. The template has all the migrations set up in `initDatabase`, so now `testdb` has them, too!
As you can see, there's a small hack in there to terminate existing connections that might be hanging around, because otherwise we can't drop the database or create a new one from the template. I'm not sure whether there's a better way to do this (there probably is), but this has been working for me.
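One alternative for the `testdb` half of this: since Postgres 13, `drop database` can forcibly disconnect remaining sessions itself. Note that this wouldn't cover connections to `template1`, which can also block the copy, so treat it as a sketch rather than a drop-in replacement (I haven't benchmarked it against the approach above):

```sql
-- Postgres 13+ only: terminate any sessions connected to testdb, then drop it.
drop database if exists testdb with (force);
create database testdb;
```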
Finally, we connect to `testdb` and return the connection, along with a cleanup function that closes the connection.
The accompanying Docker compose configuration looks like this:
```yaml
version: '3.8'
services:
  db:
    image: postgres:13
    environment:
      POSTGRES_USER: test
      POSTGRES_PASSWORD: 123
      POSTGRES_DB: template1
```
Using Docker Compose to spin up the test container is just a detail; this would work equally well without Docker.
The results speak for themselves. Before:
```
$ go test ./storage
ok      project/storage 33.304s
```

After:

```
$ go test ./storage
ok      project/storage 9.880s
```
Overall, a 3x speedup that made running our integration test suite much nicer again. An hour of investigation well spent. 😎
I’m Markus, a professional software consultant and developer. 🤓✨ You can reach me at [email protected].
I'm currently building Go courses over at golang.dk.