Introduction
Germinator is a database seeding tool.
It reads YAML files that are templated using Handlebars.
It's well suited for:
- Development fixture data to emulate a "production-ish" environment
- Generating real-looking data using random generators (faker.js and chance.js)
- Canonical production data which is not changed by users
- One-off data dumps based on YAML/JSON structures
Germinator has a CLI, a Docker Image and a Node.js API. They all do essentially the same thing.
An example of a germinator seed file:
germinator: v2
# This flag tells germinator if it should UPDATE and DELETE entries
synchronize: true
# Optionally, seeds can respect NODE_ENV
$env: [dev, test, qa]
# Data defined here is passed as template data below the separator
data:
  employees: 1000
  positions:
    - janitor
    - chef
    - server
---
# A list of all the database rows you want germinator to create
entities:
  {{#each positions as |position|}}
  - Position:
      $id: 'position-{{position}}'
      name: {{position}}
  {{#repeat @root.employees}}
  - Employee:
      $id: '{tableName}-{{@index}}-{{position}}'
      fullName: {{chance "name"}}
      email: {{chance "email"}}
      position: { $id: 'position-{{position}}' }
  {{/repeat}}
  {{/each}}
If we made employees: 10, germinator would do the following:
insert into `position` (`name`) values ('janitor')
insert into `position` (`name`) values ('chef')
insert into `position` (`name`) values ('server')
insert into `employee` (`full_name`, `position`) values ('Seth Tran', 1)
insert into `employee` (`full_name`, `position`) values ('Isaiah Erickson', 1)
insert into `employee` (`full_name`, `position`) values ('Winifred Barnes', 1)
insert into `employee` (`full_name`, `position`) values ('Scott Collins', 1)
insert into `employee` (`full_name`, `position`) values ('Todd Houston', 1)
insert into `employee` (`full_name`, `position`) values ('Elijah Watson', 1)
insert into `employee` (`full_name`, `position`) values ('Isabelle Anderson', 1)
insert into `employee` (`full_name`, `position`) values ('Jeff Glover', 1)
insert into `employee` (`full_name`, `position`) values ('Ophelia Woods', 1)
insert into `employee` (`full_name`, `position`) values ('Patrick Wilkerson', 1)
insert into `employee` (`full_name`, `position`) values ('Troy Nichols', 2)
insert into `employee` (`full_name`, `position`) values ('Mayme Jones', 2)
insert into `employee` (`full_name`, `position`) values ('David Rice', 2)
insert into `employee` (`full_name`, `position`) values ('Beatrice Lawson', 2)
insert into `employee` (`full_name`, `position`) values ('Daniel Robertson', 2)
insert into `employee` (`full_name`, `position`) values ('Ruth McDonald', 2)
insert into `employee` (`full_name`, `position`) values ('Brian Conner', 2)
insert into `employee` (`full_name`, `position`) values ('Dora Lawrence', 2)
insert into `employee` (`full_name`, `position`) values ('Eugenia Rhodes', 2)
insert into `employee` (`full_name`, `position`) values ('Daniel Nunez', 2)
insert into `employee` (`full_name`, `position`) values ('Alberta Long', 3)
insert into `employee` (`full_name`, `position`) values ('Paul Miller', 3)
insert into `employee` (`full_name`, `position`) values ('Georgie Mathis', 3)
insert into `employee` (`full_name`, `position`) values ('Jean Elliott', 3)
insert into `employee` (`full_name`, `position`) values ('Michael Edwards', 3)
insert into `employee` (`full_name`, `position`) values ('Jay Gomez', 3)
insert into `employee` (`full_name`, `position`) values ('Clifford Cooper', 3)
insert into `employee` (`full_name`, `position`) values ('Lou Fitzgerald', 3)
insert into `employee` (`full_name`, `position`) values ('Samuel Owen', 3)
insert into `employee` (`full_name`, `position`) values ('Kyle Simmons', 3)
Principles
- Seed entries (which map to database rows) should be globally uniquely identifiable via $id
- Seeds are a collection of these entries, divided into an unordered pool of files
- Insertion order is defined by foreign keys, and otherwise will be resolved optimally
- Seed entries can be marked as "synchronized", which allows them to be edited and deleted
- Seed files should be ergonomic and easy to read, with naming strategy support
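To make the first principle concrete, here is a minimal sketch of checking that every $id is unique across a pool of seed files. The `SeedEntry` shape and the `findDuplicateIds` function are hypothetical names used for illustration only; they are not part of germinator's API.

```typescript
// Hypothetical shape: only the fields needed for this illustration.
interface SeedEntry {
  $id: string;
  tableName: string;
}

// Returns every $id that appears more than once across all seed files.
function findDuplicateIds(files: SeedEntry[][]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  files.forEach((entries) => {
    entries.forEach((entry) => {
      if (seen.has(entry.$id)) duplicates.add(entry.$id);
      seen.add(entry.$id);
    });
  });
  return Array.from(duplicates);
}

// Two "files" whose entries collide on one $id.
const users: SeedEntry[] = [{ $id: "admin-user", tableName: "user" }];
const posts: SeedEntry[] = [
  { $id: "post-1", tableName: "post" },
  { $id: "admin-user", tableName: "user" }, // same $id as in the users file
];

console.log(findDuplicateIds([users, posts])); // [ 'admin-user' ]
```

Because the check runs over the whole pool, it does not matter which file an entry lives in or in what order the files are read.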
Features
- Provides many template helpers to help you write fixtures
- Supports synchronized seeds - easy to add to your server startup
- Supports inter-entry references to auto-populate foreign keys
- Supports environment-specific seeds
- Supports custom namingStrategy and tables mapping
Germinator is ORM and database agnostic. It's usable without Node.js, and is designed to be deployed using Docker.
It's a great solution for one-offs, or for long-lived canonical data.
Other notable features:
- bcrypt password hashing
- moment.js date handling
- faker and chance libraries for realistic data fixtures
- custom primary key names (default is id), and composite IDs
Setup
Germinator only needs two things - a folder of YAML files, and database connection parameters.
It's normal to make a folder called seeds, with different files for different categories of seeds.
seeds/
-> users.yml
-> posts.yml
-> categories.yml
Germinator also needs details to connect to your database. The CLI has options for these.
npx germinator -c postgres -u admin --pass s3cur3 --port 5432
Germinator will read environment variables as well:
- GERMINATOR_CLIENT / --client / -c: The type of database (postgres, sqlite3)
- GERMINATOR_HOSTNAME / --hostname / -h: The host of your database (default "localhost")
- GERMINATOR_PORT / --port / -p: Network port to access your database on
- GERMINATOR_DATABASE / --database / -d: Name of the database that germinator will operate in
- GERMINATOR_USER / --user / -u: User that germinator will connect as
- GERMINATOR_PASS / --pass: Password for the connecting user
- GERMINATOR_FILENAME / --filename / -o: SQLite database location
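As a rough illustration of how the variables above could map onto connection settings, here is a small sketch. The `DbConfig` shape is an assumption modelled on the Knex-style db object used by the Node.js API example, and `configFromEnv` is a hypothetical helper, not something germinator exports. It does not say anything about how germinator itself prioritizes flags over environment variables.

```typescript
// Assumed shape, modelled on the Knex-style `db` object in the Node.js API example.
interface DbConfig {
  client: string | undefined;
  connection: {
    host: string;
    port: number | undefined;
    database: string | undefined;
    user: string | undefined;
    password: string | undefined;
    filename: string | undefined;
  };
}

// Hypothetical helper: reads GERMINATOR_* variables from a plain record.
function configFromEnv(env: Record<string, string | undefined>): DbConfig {
  const port = env["GERMINATOR_PORT"];
  return {
    client: env["GERMINATOR_CLIENT"],
    connection: {
      // The documented default hostname is "localhost".
      host: env["GERMINATOR_HOSTNAME"] ?? "localhost",
      // Environment variables are strings, so the port is converted to a number.
      port: port === undefined ? undefined : Number(port),
      database: env["GERMINATOR_DATABASE"],
      user: env["GERMINATOR_USER"],
      password: env["GERMINATOR_PASS"],
      filename: env["GERMINATOR_FILENAME"],
    },
  };
}

const exampleConfig = configFromEnv({
  GERMINATOR_CLIENT: "postgres",
  GERMINATOR_PORT: "5432",
  GERMINATOR_USER: "admin",
});
console.log(exampleConfig.connection.host, exampleConfig.connection.port); // localhost 5432
```

In a real script you would pass process.env instead of a literal object.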
If you don't have Node.js, or want to isolate germinator, use the docker image.
An important note: Germinator will create 3 tables in your database! They
are prefixed with germinator_*. One is used for tracking any seeds that
you've made, so that germinator can keep them up to date. The two other tables
are used for germinator's internal migrations.
If you really don't want this, use --noTracking. This will make it impossible
for Germinator to synchronize values though.
Environment Specific Seeds
Germinator respects the NODE_ENV environment variable. You can mark whole seed
files, or individual seeds, as environment-specific.
germinator: v2
synchronize: true
$env: ['development', 'qa']
entities: ...
In this example, germinator will only insert these seeds when NODE_ENV is development or qa.
This can be done per-entry as well:
germinator: v2
synchronize: true
entities:
  - TableA:
      $id: table-a-1
      $env: ['development', 'qa']
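The environment check described in this section can be pictured as a small predicate. This is only a sketch: `shouldRun` is a hypothetical name, and the handling of an unset NODE_ENV (skip anything that declares $env) is an assumption rather than documented germinator behavior.

```typescript
// Hypothetical helper: decides whether an entry applies to the current NODE_ENV.
function shouldRun(
  nodeEnv: string | undefined,
  fileEnv?: string[],
  entryEnv?: string[],
): boolean {
  // A per-entry $env overrides the file-level $env.
  const allowed = entryEnv ?? fileEnv;
  // No $env anywhere means the entry is not environment-specific.
  if (allowed === undefined) return true;
  // Assumption: with NODE_ENV unset, environment-specific entries are skipped.
  if (nodeEnv === undefined) return false;
  return allowed.indexOf(nodeEnv) !== -1;
}

console.log(shouldRun("qa", ["development", "qa"])); // true
console.log(shouldRun("production", ["development", "qa"])); // false
console.log(shouldRun("production")); // true
console.log(shouldRun("test", ["development"], ["test"])); // true
```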
Naming Strategy
Germinator tries to use reasonable defaults, and assumes that you use SnakeCase as a naming strategy for tables and columns.
You can opt out of this, per-file or per-entry.
germinator: v2
synchronize: true
namingStrategy: AsIs
entities:
  - odly_namedTable:
      $id: table-a-1
  - OtherTable:
      $id: table-b-1
      $namingStrategy: SnakeCase
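For a feel of what the two strategies do to a name, here is a minimal sketch. The conversion rule (insert an underscore before each capital that follows a lowercase letter or digit, then lowercase everything) is an assumption that matches the examples in this document, such as fullName becoming full_name. Germinator's actual implementation may handle edge cases differently, and `applyNamingStrategy` is a hypothetical name.

```typescript
type NamingStrategy = "AsIs" | "SnakeCase";

// Hypothetical helper: maps a seed-file name to the name used in SQL.
function applyNamingStrategy(name: string, strategy: NamingStrategy): string {
  if (strategy === "AsIs") return name;
  // "fullName" -> "full_Name" -> "full_name"
  return name.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

console.log(applyNamingStrategy("fullName", "SnakeCase")); // full_name
console.log(applyNamingStrategy("OtherTable", "SnakeCase")); // other_table
console.log(applyNamingStrategy("OtherTable", "AsIs")); // OtherTable
```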
Synchronization
In germinator, seed entries are explicitly "synchronized" or "non-synchronized". The simplest way to explain this term is the question "should germinator update this database row when it's re-run and field values have changed?".
The longer definition:
- Germinator will UPDATE the row when any field value resolves differently
- Germinator will DELETE the row if it's found to be missing in subsequent runs
This behavior is opt-in via the top-level synchronize or per-entry $synchronize.
germinator: v2
synchronize: true
entities:
  - TableA:
      $id: table-a-1
      $synchronize: false
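The UPDATE and DELETE rules above can be sketched as a comparison between what was tracked on a previous run and what the seed files contain now. Everything here is illustrative: `TrackedEntry`, `Action` and `planSync` are hypothetical names, and comparing values with JSON.stringify is a simplification (it is sensitive to key order), not a description of how germinator detects changes.

```typescript
interface TrackedEntry {
  $id: string;
  synchronize: boolean;
  values: Record<string, unknown>;
}

type Action = { kind: "insert" | "update" | "delete"; $id: string };

// Hypothetical planner: compares the previous run with the current seed files.
function planSync(tracked: TrackedEntry[], current: TrackedEntry[]): Action[] {
  const trackedById = new Map<string, TrackedEntry>();
  tracked.forEach((entry) => trackedById.set(entry.$id, entry));
  const currentIds = new Set<string>(current.map((entry) => entry.$id));
  const actions: Action[] = [];

  current.forEach((entry) => {
    const previous = trackedById.get(entry.$id);
    if (previous === undefined) {
      // Never seen before: always inserted.
      actions.push({ kind: "insert", $id: entry.$id });
    } else if (
      entry.synchronize &&
      JSON.stringify(previous.values) !== JSON.stringify(entry.values)
    ) {
      // Synchronized and a field value resolved differently.
      actions.push({ kind: "update", $id: entry.$id });
    }
  });

  tracked.forEach((entry) => {
    // Synchronized entries that disappeared from the seed files are deleted.
    if (entry.synchronize && !currentIds.has(entry.$id)) {
      actions.push({ kind: "delete", $id: entry.$id });
    }
  });

  return actions;
}

const previousRun: TrackedEntry[] = [
  { $id: "position-janitor", synchronize: true, values: { name: "Janitor" } },
  { $id: "position-chef", synchronize: true, values: { name: "Chef" } },
  { $id: "admin-user", synchronize: false, values: { email: "admin@example.com" } },
];
const currentRun: TrackedEntry[] = [
  { $id: "position-janitor", synchronize: true, values: { name: "Custodian" } },
  { $id: "position-server", synchronize: true, values: { name: "Server" } },
];

console.log(planSync(previousRun, currentRun).map((a) => `${a.kind}:${a.$id}`));
// [ 'update:position-janitor', 'insert:position-server', 'delete:position-chef' ]
```

Note that admin-user is missing from the current run but is left alone, because it was not synchronized.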
A note on --noTracking
You can opt out of all synchronization via the --noTracking flag in the CLI.
We don't really recommend this, but it's a supported use case. Note that even
if you don't need to synchronize values, it's still useful to track inserted values.
Otherwise, germinator has no choice but to re-insert the same seed every time
it's run.
Database Support
Officially, germinator is supported on Postgres and SQLite. Internally, we use Knex without many database-specific features, so in theory most clients are easy to support. Only SQLite and Postgres are run in CI, so they are the only clients that are allowed at the moment.
Relationships
Germinator tries to be smart about dependencies and references between entries.
Because of the enforced globally unique $id fields, we can allow your seeds
to reference each other from within files.
A simple relationship looks like:
germinator: v2
synchronize: true
entities:
  - Position:
      $id: position-janitor
      name: Janitor
  - Employee:
      $id: bob-joe
      fullName: Bob Joe
      positionId:
        $id: position-janitor
This looks pretty innocent, but the magic happens in the $id: position-janitor.
When germinator runs the seed, it knows a few things:
- the employee.position_id column is a foreign key referencing position.id
- when creating bob-joe, we need to create position-janitor first, to populate position_id
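Ordering entries so that referenced rows exist first is a dependency-ordering problem. The sketch below uses a depth-first topological sort to illustrate the idea. It is not germinator's actual resolver, and the `Entry` shape with its `dependsOn` list is a hypothetical simplification of the $id references shown above.

```typescript
interface Entry {
  $id: string;
  dependsOn: string[]; // $ids referenced by this entry's foreign keys
}

// Depth-first topological sort: every entry comes after the entries it references.
function insertionOrder(entries: Entry[]): string[] {
  const byId = new Map<string, Entry>();
  entries.forEach((entry) => byId.set(entry.$id, entry));

  const state = new Map<string, "visiting" | "done">();
  const ordered: string[] = [];

  const visit = (id: string): void => {
    if (state.get(id) === "done") return;
    if (state.get(id) === "visiting") {
      throw new Error(`circular reference involving ${id}`);
    }
    const entry = byId.get(id);
    if (entry === undefined) throw new Error(`unknown $id: ${id}`);
    state.set(id, "visiting");
    entry.dependsOn.forEach((dependency) => visit(dependency));
    state.set(id, "done");
    ordered.push(id);
  };

  entries.forEach((entry) => visit(entry.$id));
  return ordered;
}

// bob-joe is listed first, but position-janitor must be inserted before it.
const seedEntries: Entry[] = [
  { $id: "bob-joe", dependsOn: ["position-janitor"] },
  { $id: "position-janitor", dependsOn: [] },
];

console.log(insertionOrder(seedEntries)); // [ 'position-janitor', 'bob-joe' ]
```

Reversing the resulting list gives an order in which rows can be deleted without violating the same foreign keys.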
When you don't specify $idColumnName, germinator assumes that your tables have an id field.
We can be more explicit:
germinator: v2
synchronize: true
entities:
  - Position:
      $id: position-janitor
      $idColumnName: guid
      name: Janitor
  - Employee:
      $id: bob-joe
      fullName: Bob Joe
      positionGuid:
        $id: position-janitor
        $idColumn: guid
Composite IDs
Germinator has support for composite IDs. How you set this up is fairly straightforward.
germinator: v2
synchronize: true
entities:
  - BlogPostCategories:
      $id: post1-category1
      $idColumnName: [post_id, category_id]
      postId:
        $id: post1
      categoryId:
        $id: category1
  - ReferenceToCompositeTable:
      $id: foobar
      referencePostId:
        $id: post1-category1
        $idColumn: post_id
      referenceCategoryId:
        $id: post1-category1
        $idColumn: category_id
This setup is manual, but mostly by design. We don't want to hide what's going on in the SQL layer.
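One way to picture what a reference with $idColumn does: look up the already-inserted row by its $id, then read the named column from it. The sketch below is illustrative only. The `inserted` registry, the `Reference` shape and `resolveReference` are hypothetical, and the numeric key values are made up for the example.

```typescript
type Row = Record<string, string | number>;

interface Reference {
  $id: string;
  $idColumn?: string; // falls back to "id" when omitted
}

// Hypothetical resolver: turns a { $id, $idColumn } reference into a column value.
function resolveReference(inserted: Map<string, Row>, ref: Reference): string | number {
  const row = inserted.get(ref.$id);
  if (row === undefined) throw new Error(`entry ${ref.$id} has not been inserted yet`);
  const column = ref.$idColumn ?? "id";
  const value = row[column];
  if (value === undefined) throw new Error(`entry ${ref.$id} has no column ${column}`);
  return value;
}

// Made-up primary key values for the composite-key row from the example above.
const inserted = new Map<string, Row>();
inserted.set("post1-category1", { post_id: 10, category_id: 3 });

console.log(resolveReference(inserted, { $id: "post1-category1", $idColumn: "post_id" })); // 10
console.log(resolveReference(inserted, { $id: "post1-category1", $idColumn: "category_id" })); // 3
```

With a composite key there is no single default column to fall back on, which is why each reference names its $idColumn explicitly.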
Diamond Dependencies and Delete Order
Germinator can occasionally stumble when deleting many entries that rely on each other. Unfortunately, this is a limitation because of the way we store information about entered seeds.
Germinator will delete seeds in inverse-insertion order, so most of the time things work out. But you should be aware of this limitation in case you need to delete entries manually.
You're free to delete rows that germinator created, if you know that they will be removed next time germinator is run.
Templates
Germinator uses Handlebars templates. They render into YAML output, which then feeds Germinator's database entries. You could also use templates just to generate plain objects, instead of database entries.
Handlebars Tips
Prefix your $ids logically, like 'qa-employee-1'. This makes it easier to add other categories of entities later (demo-employee-1).
Leverage the template system as much as you can, and avoid repetition. Driving your seeds this way makes it easy to scale up (go from 20 sample employees to 500).
Examples
An admin user for developers and QA deployments. Not synchronized because the password
hash is not deterministic, and the user account should not change after it is first inserted.
germinator: v2
synchronize: false
$env: [dev, qa]
---
entities:
  - User:
      $id: admin-user
      emailAddress: admin@example.com
      password: {{password "testing"}}
Some random company entries in a CRM.
germinator: v2
synchronize: true
$env: [dev, qa]
---
entities:
  {{#repeat 500}}
  - Company:
      $id: company-{{@index}}
      name: {{{chance "company"}}}
      phoneNumber: {{chance "phone"}}
      emailAddress: {{chance "email"}}
      addressId:
        $id: company-{{@index}}-address
  - Address:
      $id: company-{{@index}}-address
      city: {{chance "city"}}
      streetAddress: {{chance "address"}}
      postalCode: {{chance "postal"}}
  {{/repeat}}
Using top section data to feed the bottom section template.
germinator: v2
synchronize: true
data:
  calendar:
    {{#repeat 20 as |n|}}
    {{#with (multiply n 2) as |weeks|}}
    - date: {{moment "2021-01-01" (momentAdd weeks "weeks")}}
    {{/with}}
    {{/repeat}}
---
entities:
  {{#each @root/calendar as |event i|}}
  - CalendarEvent:
      $id: calendar-event-{{i}}
      startDate: {{moment event.date (momentAdd 6 "hours")}}
      endDate: {{moment event.date (momentAdd 14 "hours")}}
  {{/each}}
Special Values
$idColumnName
A string or string[] that defines what the primary key columns are. When this
property is omitted, germinator will assume that id is the one primary key.
namingStrategy and $namingStrategy
Can be AsIs or SnakeCase. This maps table and column names before performing SQL queries.
namingStrategy is a top-level property. $namingStrategy is an override per-entry.
schemaName and $schemaName
Defines what database schema to use when performing queries.
schemaName is a top-level property. $schemaName is an override per-entry.
synchronize and $synchronize
Defines whether to UPDATE or DELETE this entry in future runs of germinator.
synchronize is a top-level property. $synchronize is an override per-entry.
$env
Defines when this seed entry should be executed, depending on NODE_ENV.
$env is a top-level property, and can be overridden per-entry.
tableMapping
An object that defines a map of NickName -> real_table_name. Useful for legacy
DBs where a user-friendly name isn't possible.
tableMapping is a top-level property.
Inline String Variables
String properties can utilize some specific variables, surrounded in {} delimiters.
entities:
  {{#repeat 1000}}
  - TableA:
      $id: '{tableName}-{{@index}}'
  {{/repeat}}
Notice that {tableName} is in single curlies - this substitution is done after
YAML is parsed. Specifically, tableName is the normalized table name according
to the namingStrategy (in this case, table_a).
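The two-stage rendering can be sketched as follows: Handlebars has already turned {{@index}} into a number by the time YAML is parsed, and only then are single-curly variables filled in. `substituteInline` is a hypothetical function written for this illustration, and leaving unknown placeholders untouched is an assumption, not documented germinator behavior.

```typescript
// Hypothetical helper: replaces {name} placeholders in an already-parsed string value.
function substituteInline(value: string, variables: Record<string, string>): string {
  return value.replace(/\{(\w+)\}/g, (match: string, key: string) => {
    const replacement = variables[key];
    // Assumption: unknown placeholders are left as they are.
    return replacement === undefined ? match : replacement;
  });
}

// After Handlebars, "{tableName}-{{@index}}" has become "{tableName}-7".
const parsedId = "{tableName}-7";

console.log(substituteInline(parsedId, { tableName: "table_a" })); // table_a-7
```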
Helpers
General
We include all of the handlebars-helpers. There are over 180 of them, so check them out!
We also include the repeat helper:
{{#repeat 1000 as |i|}}
$id: book-{{i}}
{{/repeat}}
{{#repeat start=17 count=2}}
$id: book-{{@index}}
{{/repeat}}
Chance
The {{chance}} helper uses Chance.js for random data.
In general, any Chance.js function can be called like {{chance "fnName"}} with
arguments following the function name.
Name | Example |
---|---|
Boolean | {{chance "bool"}} |
Integer | {{chance "integer"}} |
 | {{chance "integer" min=0 max=10}} |
Float | {{chance "float"}} |
Prime | {{chance "prime"}} |
Letter | {{chance "letter"}} |
Text | {{chance "string"}} |
Paragraph | {{chance "paragraph"}} |
Sentence | {{chance "sentence"}} |
Word | {{chance "word"}} |
Date | {{chance "date"}} |
First Name | {{chance "first"}} |
Last Name | {{chance "last"}} |
Full Name | {{chance "name"}} |
Email | {{chance "email"}} |
URL | {{chance "url"}} |
City | {{chance "city"}} |
Country | {{chance "country"}} |
Postal | {{chance "postal"}} |
Zip Code | {{chance "zip"}} |
GUID | {{chance "guid"}} |
All of the functions listed in Chance.js should be supported; only a subset is listed here. Arguments that are passed will be forwarded into Chance.js's generators.
Faker.js
The {{faker}} helper uses Faker.js for random data.
In general, any Faker.js function can be called like {{faker "namespace.method"}} with
arguments following the function name.
Name | Example |
---|---|
Date | {{faker "date.past"}} |
Phone Number | {{faker "phone.phoneNumber"}} |
All of the functions listed in Faker.js should be supported; only a subset is listed here. Arguments that are passed will be forwarded into Faker.js's generators.
Moment
The {{moment}} helper is useful for handling dates.
Name | Example |
---|---|
Formatting | {{moment "2019-01-01" format="MM-DD-YY"}} |
Parse as UTC | {{moment "2019-01-01" utc=true}} |
Mutations | {{moment "2019-01-01" "[add,5,days]"}} |
 | {{moment "2019-01-01" (momentAdd var "days")}} |
 | {{moment "2019-01-01" (momentSubtract var "days")}} |
bcrypt
The {{password}} helper renders password hashes using bcrypt.
Name | Example |
---|---|
Hashing | {{password "testing"}} |
 | {{password "testing" rounds=5}} |
Insecure | {{password "testing" insecure=true}} |
The insecure option is as it sounds, but it's a lot faster than generating a random salt for every password you make. This is useful for development environments with 1000s of users, where you want every germinator run to be a no-op (secure passwords by definition will hash differently every time).
Custom Helpers
The Node.js API has a helpers argument in runSeeds and basically any other
functions that use them. You can provide your own helpers on top of the built-in ones, or
replace them entirely.
import { makeHelpers } from '@germinator/helpers';
import { runSeeds } from '@germinator/node';
await runSeeds({
helpers: {
...makeHelpers(),
myHelper() {
return 'this is custom!';
},
},
folder: ...,
db: { ... },
});
Command Line
- Run through NPM:
  npx germinator --help
- Run through Docker:
  docker run -it --rm ghcr.io/launchcodedev/germinator --help
It's normal to mount a folder for germinator to read from.
docker run -it --rm \
  -v $(realpath seeds):/seeds \
  ghcr.io/launchcodedev/germinator /seeds -c sqlite3 -o /seeds/db
Options:
-C, --cwd Runs germinator in a different directory [string]
-c, --client What kind of database to connect to ["postgres", "sqlite3"]
-h, --hostname Hostname of the database [default: "localhost"]
-p, --port Port of the database [number]
-d, --database Database name [string]
-o, --filename Filename for SQLite databases (:memory: will work) [string]
-u, --user Username to connect with [string]
--pass Password for the user [string]
--dryRun Does not run INSERT or UPDATE [boolean]
--noTracking Does not track inserted entries [boolean]
Dry Run Mode
Tries to do as much as possible, without INSERTs or UPDATEs. Useful for checking the effect of changes to seeds.
Run with --dryRun. Germinator will print the SQL that it would have run, with a few exceptions.
No Tracking Mode
Run with --noTracking, which will not track inserted entries. The advantage of
this is that germinator doesn't need to create tables in your database.
This mode should only be used for one-off seed insertions.
Docker
Germinator has a docker image, ghcr.io/launchcodedev/germinator. It runs the CLI by
default as the entrypoint.
You should mount your seeds folder into the container, so it can read the YAML files within it.
See the CLI page for more options available.
Accessing Databases
Of course, Docker is isolated from your local environment. You'll likely need to
forward network ports or sockets as necessary. --net=host might be the easiest way.
It's common to run germinator alongside your database in a docker-compose workspace. That way, hostnames are available to germinator.
Using in Kubernetes
Germinator would usually be run as a Kubernetes Job resource.
apiVersion: batch/v1
kind: Job
metadata:
  name: seeds
spec:
  template:
    spec:
      # Jobs need a restartPolicy of Never or OnFailure
      restartPolicy: Never
      containers:
        - name: seeds
          image: ghcr.io/launchcodedev/germinator
          # args (rather than command) keeps the image's CLI entrypoint
          args: ["-c=postgres", "/seeds"]
          env:
            - name: GERMINATOR_HOSTNAME
              value: db-host
            - name: GERMINATOR_PORT
              value: '5432'
            - name: GERMINATOR_PASS
              valueFrom:
                secretKeyRef:
                  name: secrets
                  key: dbPassword
          # mount a folder of YAML files into /seeds
          volumeMounts:
            - name: seeds
              readOnly: true
              mountPath: "/seeds"
How you set up credentials to the database is entirely up to your setup.
Using in Docker Compose
Run Germinator beside your database instance in docker-compose.
services:
  my-db:
    image: postgres:12
    environment:
      - POSTGRES_USER=admin
      - POSTGRES_PASSWORD=pwd
      - POSTGRES_DB=my-db
    ports:
      - 5432:5432
  seed:
    image: ghcr.io/launchcodedev/germinator
    command: /seeds
    volumes:
      - ./seeds:/seeds
    environment:
      - NODE_ENV=development
      - GERMINATOR_CLIENT=postgres
      - GERMINATOR_HOSTNAME=my-db
      - GERMINATOR_PORT=5432
      - GERMINATOR_DATABASE=my-db
      - GERMINATOR_USER=admin
      - GERMINATOR_PASS=pwd
    links:
      - my-db
Here, we have a folder in ./seeds that's mounted into /seeds.
Node.js API
Germinator is a Node.js module, so a programmatic API is available.
A subset of Germinator functionality is available in a browser-compatible format.
import { renderSeed } from '@germinator/core';
import { makeHelpers } from '@germinator/helpers';
const template = `
data:
  books:
    - Moby Dick
    - The Great Gatsby
    - To Kill a Mockingbird
  libraries:
    - Southern
    - Northern
---
bookCollection:
  {{#each @root.books as |book|}}
  {{#each @root.libraries as |library|}}
  - title: {{book}}
    library: {{library}}
    checkedOutBy: {{chance "name"}}
  {{/each}}
  {{/each}}
`;
const output = renderSeed(template, makeHelpers());
The output here will be an object that looks like:
{
bookCollection: [
{
title: 'Moby Dick',
library: 'Southern',
checkedOutBy: 'Seth Tran',
},
{
title: 'Moby Dick',
library: 'Northern',
checkedOutBy: 'Isaiah Erickson',
},
{
title: 'The Great Gatsby',
library: 'Southern',
checkedOutBy: 'Winifred Barnes',
},
{
title: 'The Great Gatsby',
library: 'Northern',
checkedOutBy: 'Scott Collins',
},
{
title: 'To Kill a Mockingbird',
library: 'Southern',
checkedOutBy: 'Todd Houston',
},
{
title: 'To Kill a Mockingbird',
library: 'Northern',
checkedOutBy: 'Elijah Watson',
},
],
}
Database Seeds
Of course, normal germinator functionality is available as well.
import { runSeeds } from '@germinator/node';
import { makeHelpers } from '@germinator/helpers';
await runSeeds(
{
helpers: makeHelpers(),
folder: `${__dirname}/seeds`,
db: {
client: 'postgres',
pool: { min: 1, max: 1 },
connection: {
host: 'localhost',
port: 5432,
user: 'admin',
password: 's3cur3',
},
},
},
{
// optional runtime properties
dryRun: false,
noTracking: false,
},
);
The API accepts a Knex instance as db, and an array of SeedFile objects
instead of folder if you want to manually construct them.