Introduction

Germinator is a database seeding tool.

It reads YAML files that are templated using Handlebars.

It's well suited for:

  • Development fixture data to emulate a "production-ish" environment
  • Generating real-looking data using random generators (faker.js and chance.js)
  • Canonical production data which is not changed by users
  • One-off data dumps based on YAML/JSON structures

Germinator has a CLI, a Docker image and a Node.js API. They all do essentially the same thing.

An example of a germinator seed file:

germinator: v2

# This flag tells germinator if it should UPDATE and DELETE entries
synchronize: true

# Optionally, seeds can respect NODE_ENV
$env: [dev, test, qa]

# Data defined here is passed as template data below the separator
data:
  employees: 1000
  positions:
    - janitor
    - chef
    - server

---

# A list of all the database rows you want germinator to create
entities:
  {{#each positions as |position|}}
  - Position:
      $id: 'position-{{position}}'
      name: {{position}}

  {{#repeat @root.employees}}
  - Employee:
      $id: '{tableName}-{{@index}}-{{position}}'
      fullName: {{chance "name"}}
      email: {{chance "email"}}
      position: { $id: 'position-{{position}}' }
  {{/repeat}}
  {{/each}}

If we set employees: 10, germinator would run the following:

insert into `position` (`name`) values ('janitor')
insert into `position` (`name`) values ('chef')
insert into `position` (`name`) values ('server')

insert into `employee` (`full_name`, `position`) values ('Seth Tran', 1)
insert into `employee` (`full_name`, `position`) values ('Isaiah Erickson', 1)
insert into `employee` (`full_name`, `position`) values ('Winifred Barnes', 1)
insert into `employee` (`full_name`, `position`) values ('Scott Collins', 1)
insert into `employee` (`full_name`, `position`) values ('Todd Houston', 1)
insert into `employee` (`full_name`, `position`) values ('Elijah Watson', 1)
insert into `employee` (`full_name`, `position`) values ('Isabelle Anderson', 1)
insert into `employee` (`full_name`, `position`) values ('Jeff Glover', 1)
insert into `employee` (`full_name`, `position`) values ('Ophelia Woods', 1)
insert into `employee` (`full_name`, `position`) values ('Patrick Wilkerson', 1)

insert into `employee` (`full_name`, `position`) values ('Troy Nichols', 2)
insert into `employee` (`full_name`, `position`) values ('Mayme Jones', 2)
insert into `employee` (`full_name`, `position`) values ('David Rice', 2)
insert into `employee` (`full_name`, `position`) values ('Beatrice Lawson', 2)
insert into `employee` (`full_name`, `position`) values ('Daniel Robertson', 2)
insert into `employee` (`full_name`, `position`) values ('Ruth McDonald', 2)
insert into `employee` (`full_name`, `position`) values ('Brian Conner', 2)
insert into `employee` (`full_name`, `position`) values ('Dora Lawrence', 2)
insert into `employee` (`full_name`, `position`) values ('Eugenia Rhodes', 2)
insert into `employee` (`full_name`, `position`) values ('Daniel Nunez', 2)

insert into `employee` (`full_name`, `position`) values ('Alberta Long', 3)
insert into `employee` (`full_name`, `position`) values ('Paul Miller', 3)
insert into `employee` (`full_name`, `position`) values ('Georgie Mathis', 3)
insert into `employee` (`full_name`, `position`) values ('Jean Elliott', 3)
insert into `employee` (`full_name`, `position`) values ('Michael Edwards', 3)
insert into `employee` (`full_name`, `position`) values ('Jay Gomez', 3)
insert into `employee` (`full_name`, `position`) values ('Clifford Cooper', 3)
insert into `employee` (`full_name`, `position`) values ('Lou Fitzgerald', 3)
insert into `employee` (`full_name`, `position`) values ('Samuel Owen', 3)
insert into `employee` (`full_name`, `position`) values ('Kyle Simmons', 3)

Principles

  1. Seed entries (which map to database rows) should be globally uniquely identifiable via $id
  2. Seeds are a collection of these entries, divided into an unordered pool of files
  3. Insertion order is defined by foreign keys, and otherwise will be resolved optimally
  4. Seed entries can be marked as "synchronized", which allows them to be edited and deleted
  5. Seed files should be ergonomic and easy to read, with naming strategy support

Features

  • Provides many template helpers to help you write fixtures
  • Supports synchronized seeds - easy to add to your server startup
  • Supports inter-entry references to auto-populate foreign keys
  • Supports environment-specific seeds
  • Supports custom namingStrategy and tables mapping

Germinator is ORM and database agnostic. It's usable without Node.js, and is designed to be deployed using Docker.

It's a great solution for one-offs, or for long-lived canonical data.

Other notable features:

  • bcrypt password hashing
  • moment.js date handling
  • faker and chance libraries for realistic data fixtures
  • custom primary key names (default is id), and composite IDs

Setup

Germinator only needs two things - a folder of YAML files, and database connection parameters.

It's normal to make a folder called seeds, with different files for different categories of seeds.

seeds/
  -> users.yml
  -> posts.yml
  -> categories.yml

Germinator also needs details to connect to your database. The CLI has options for these.

npx germinator -c postgres -u admin --pass s3cur3 --port 5432

Germinator will read environment variables as well:

GERMINATOR_CLIENT / --client / -c: The type of database (postgres, sqlite3)

GERMINATOR_HOSTNAME / --hostname / -h: The host of your database (default "localhost")

GERMINATOR_PORT / --port / -p: Network port to access your database on

GERMINATOR_DATABASE / --database / -d: Name of the database that germinator will operate in

GERMINATOR_USER / --user / -u: User that germinator will connect as

GERMINATOR_PASS / --pass: Password for the connecting user

GERMINATOR_FILENAME / --filename / -o: SQLite database location
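
As a rough illustration of how the flags and environment variables above could combine, here is a minimal sketch. The `ConnectionOptions` type and `resolveConnection` function are hypothetical names, not part of germinator, and the precedence rule (an explicit flag wins over the environment variable) is an assumption. Only the variable names and the "localhost" default come from the list above.

```typescript
// Hypothetical shape for illustration; germinator's real config type may differ.
interface ConnectionOptions {
  client: string | undefined;
  hostname: string;
  port: number | undefined;
  database: string | undefined;
  user: string | undefined;
  pass: string | undefined;
  filename: string | undefined;
}

type Env = Record<string, string | undefined>;

// Assumption: an explicit CLI flag takes precedence over the environment variable.
function resolveConnection(flags: Partial<ConnectionOptions>, env: Env): ConnectionOptions {
  const envPort = env['GERMINATOR_PORT'];
  return {
    client: flags.client ?? env['GERMINATOR_CLIENT'],
    // "localhost" is the documented default for the hostname
    hostname: flags.hostname ?? env['GERMINATOR_HOSTNAME'] ?? 'localhost',
    port: flags.port ?? (envPort !== undefined ? Number(envPort) : undefined),
    database: flags.database ?? env['GERMINATOR_DATABASE'],
    user: flags.user ?? env['GERMINATOR_USER'],
    pass: flags.pass ?? env['GERMINATOR_PASS'],
    filename: flags.filename ?? env['GERMINATOR_FILENAME'],
  };
}

const resolved = resolveConnection(
  { client: 'postgres', user: 'admin' },
  { GERMINATOR_PORT: '5432', GERMINATOR_USER: 'from-env', GERMINATOR_DATABASE: 'my-db' },
);
console.log(resolved);
```

Passing the environment in as a plain object keeps the sketch easy to test; in a real script you would pass `process.env`.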

If you don't have Node.js, or want to isolate germinator, use the docker image.


An important note!: Germinator will create 3 tables in your database! They are prefixed with germinator_*. One of them is used for tracking any seeds that you've made, so that germinator can keep them up to date. The two other tables are used for germinator's internal migrations.

If you really don't want this, use --noTracking. This will make it impossible for Germinator to synchronize values though.

Environment Specific Seeds

Germinator respects the NODE_ENV environment variable. You can mark whole seed files, or individual seeds, as environment-specific.

germinator: v2
synchronize: true
$env: ['development', 'qa']

entities: ...

In this example, germinator will only insert these seeds when NODE_ENV is development or qa.

This can be done per-entry as well:

germinator: v2
synchronize: true

entities:
  - TableA:
      $id: table-a-1
      $env: ['development', 'qa']
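
The filtering rule can be pictured with a small sketch. `shouldRun` is a hypothetical helper, not germinator's API. It assumes that a per-entry $env replaces the file-level one, that a seed without any $env runs everywhere, and that a seed with $env is skipped when NODE_ENV is unset.

```typescript
// `$env` as it appears on a seed file or an entry: absent, or a list of names.
type EnvFilter = string[] | undefined;

// Assumptions (not confirmed by the text above): the entry-level list replaces
// the file-level list, and no list at all means "run in every environment".
function shouldRun(nodeEnv: string | undefined, fileEnv: EnvFilter, entryEnv: EnvFilter): boolean {
  const allowed = entryEnv ?? fileEnv;
  if (allowed === undefined) return true;
  return nodeEnv !== undefined && allowed.includes(nodeEnv);
}

console.log(shouldRun('qa', ['development', 'qa'], undefined));
console.log(shouldRun('production', ['development', 'qa'], undefined));
```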

Naming Strategy

Germinator tries to use reasonable defaults, and assumes that you use SnakeCase as a naming strategy for tables and columns.

You can opt-out of this, per-file or per-entry.

germinator: v2
synchronize: true
namingStrategy: AsIs

entities:
  - oddly_namedTable:
      $id: table-a-1
  - OtherTable:
      $id: table-b-1
      $namingStrategy: SnakeCase
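
To make the two strategies concrete, here is a minimal sketch of the mapping. `toSnakeCase` and `applyNaming` are illustrative names, and the regular expression is an assumption that may handle edge cases (acronyms, digits) differently from germinator's own implementation.

```typescript
type NamingStrategy = 'AsIs' | 'SnakeCase';

// Insert an underscore between a lowercase letter or digit and the uppercase
// letter that follows it, then lowercase the whole name.
function toSnakeCase(name: string): string {
  return name.replace(/([a-z0-9])([A-Z])/g, '$1_$2').toLowerCase();
}

function applyNaming(name: string, strategy: NamingStrategy): string {
  return strategy === 'SnakeCase' ? toSnakeCase(name) : name;
}

console.log(applyNaming('OtherTable', 'SnakeCase'));
console.log(applyNaming('positionId', 'SnakeCase'));
console.log(applyNaming('LegacyTable', 'AsIs'));
```

With AsIs, the name you write in the seed file is the name used in SQL, which is what legacy schemas with irregular names usually need.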

Synchronization

In germinator, seed entries are explicitly "synchronized" or "non-synchronized". The simplest way to explain this term is the question "should germinator update this database row when it's re-run and field values have changed?".

The longer definition:

  • Germinator will UPDATE the row when any field value resolves differently
  • Germinator will DELETE the row if its seed entry is found to be missing in subsequent runs

This behavior is opt-in via the top-level synchronize or per-entry $synchronize.

germinator: v2
synchronize: true

entities:
  - TableA:
      $id: table-a-1
      $synchronize: false
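
The UPDATE and DELETE rules above can be sketched as a planning step that compares the current seed entries with what was tracked on the previous run. All names here (`SeedEntry`, `TrackedEntry`, `planRun`) are hypothetical, and comparing serialized values is an assumption; germinator's tracking tables may store and compare entries differently.

```typescript
interface SeedEntry {
  id: string;
  synchronize: boolean;
  values: Record<string, unknown>;
}

// What a previous run is assumed to have recorded about an inserted entry.
interface TrackedEntry {
  id: string;
  synchronize: boolean;
  valuesJson: string;
}

interface Action {
  kind: 'insert' | 'update' | 'delete';
  id: string;
}

function planRun(seeds: SeedEntry[], tracked: TrackedEntry[]): Action[] {
  const trackedById = new Map<string, TrackedEntry>();
  for (const t of tracked) trackedById.set(t.id, t);
  const seedIds = new Set<string>(seeds.map((s) => s.id));

  const actions: Action[] = [];
  for (const seed of seeds) {
    const previous = trackedById.get(seed.id);
    if (previous === undefined) {
      // Never seen before: insert regardless of the synchronize flag.
      actions.push({ kind: 'insert', id: seed.id });
    } else if (seed.synchronize && previous.valuesJson !== JSON.stringify(seed.values)) {
      // Synchronized entry whose values resolve differently (key order matters here).
      actions.push({ kind: 'update', id: seed.id });
    }
  }
  for (const t of tracked) {
    // Synchronized entry that no longer exists in the seed files.
    if (t.synchronize && !seedIds.has(t.id)) actions.push({ kind: 'delete', id: t.id });
  }
  return actions;
}

const plan = planRun(
  [
    { id: 'a', synchronize: true, values: { name: 'same' } },
    { id: 'b', synchronize: true, values: { name: 'new' } },
    { id: 'c', synchronize: true, values: { name: 'fresh' } },
  ],
  [
    { id: 'a', synchronize: true, valuesJson: JSON.stringify({ name: 'same' }) },
    { id: 'b', synchronize: true, valuesJson: JSON.stringify({ name: 'old' }) },
    { id: 'd', synchronize: true, valuesJson: '{}' },
    { id: 'e', synchronize: false, valuesJson: '{}' },
  ],
);
console.log(plan.map((a) => `${a.kind}:${a.id}`).join(','));
```

In this sketch the non-synchronized entry `e` is left alone even though it has disappeared from the seed files, which mirrors the opt-in nature of synchronization.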

A note on --noTracking

You can opt-out of all synchronization via the --noTracking flag in the CLI. We don't really recommend this, but it's a supported use case. Note that even if you don't need to synchronize values, it's still useful to track inserted values. Otherwise, germinator has no choice but to re-insert the same seed every time it's run.

Database Support

Officially, germinator is supported on Postgres and SQLite. Internally, we use Knex without many database-specific features, so in theory most clients are easy to support. Only SQLite and Postgres are run in CI, so they are the only clients that are allowed at the moment.

Relationships

Germinator tries to be smart about dependencies and references between entries. Because of the enforced globally unique $id fields, we can allow your seeds to reference each other from within files.

A simple relationship looks like:

germinator: v2
synchronize: true

entities:
  - Position:
      $id: position-janitor
      name: Janitor

  - Employee:
      $id: bob-joe
      fullName: Bob Joe
      positionId:
        $id: position-janitor

This looks pretty innocent, but the magic happens in the $id: position-janitor.

When germinator runs the seed, it knows a few things:

  1. the employee.position_id column is a foreign key referencing position.id
  2. when creating bob-joe, we need to create position-janitor first, to populate position_id
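
The ordering in step 2 is a dependency sort. The following is a minimal sketch of one way to compute it (a depth-first topological sort), not germinator's actual resolver; `Entry` and `insertionOrder` are hypothetical names.

```typescript
interface Entry {
  id: string;
  // $ids of the entries this entry points at through { $id: ... } references
  references: string[];
}

function insertionOrder(entries: Entry[]): string[] {
  const byId = new Map<string, Entry>();
  for (const entry of entries) byId.set(entry.id, entry);

  const state = new Map<string, 'visiting' | 'done'>();
  const ordered: string[] = [];

  const visit = (id: string): void => {
    if (state.get(id) === 'done') return;
    if (state.get(id) === 'visiting') throw new Error(`circular reference at ${id}`);
    const entry = byId.get(id);
    if (entry === undefined) throw new Error(`unknown $id: ${id}`);
    state.set(id, 'visiting');
    // Everything this entry references must be inserted first.
    for (const ref of entry.references) visit(ref);
    state.set(id, 'done');
    ordered.push(id);
  };

  for (const entry of entries) visit(entry.id);
  return ordered;
}

const order = insertionOrder([
  { id: 'bob-joe', references: ['position-janitor'] },
  { id: 'position-janitor', references: [] },
]);
console.log(order); // position-janitor comes before bob-joe
```

Because every $id is globally unique, the order in which entries appear in the files does not matter; only the references do.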

When you don't specify $idColumnName, germinator assumes that your tables have an id field.

We can be more explicit:

germinator: v2
synchronize: true

entities:
  - Position:
      $id: position-janitor
      $idColumnName: guid
      name: Janitor

  - Employee:
      $id: bob-joe
      fullName: Bob Joe
      positionGuid:
        $id: position-janitor
        $idColumn: guid

Composite IDs

Germinator has support for composite IDs. How you set this up is fairly straightforward.

germinator: v2
synchronize: true

entities:
  - BlogPostCategories:
      $id: post1-category1
      $idColumnName: [post_id, category_id]
      postId:
        $id: post1
      categoryId:
        $id: category1

  - ReferenceToCompositeTable:
      $id: foobar
      referencePostId:
        $id: post1-category1
        $idColumn: post_id
      referenceCategoryId:
        $id: post1-category1
        $idColumn: category_id

This setup is manual, but mostly by design. We don't want to hide what's going on in the SQL layer.

Diamond Dependencies and Delete Order

Germinator can occasionally stumble when deleting many entries that rely on each other. Unfortunately, this is a limitation because of the way we store information about entered seeds.

Germinator will delete seeds in inverse-insertion order, so most of the time things work out. But you should be aware of this limitation in case you need to delete entries manually.

You're free to delete rows that germinator created, if you know that they will be removed next time germinator is run.

Templates

Germinator uses Handlebars templates. They render into YAML output, which then feeds Germinator's database entries. You can also use templates on their own to generate plain objects, instead of database entries.

Handlebars Tips

Prefix your $ids logically, like 'qa-employee-1'. This makes it easier to add other categories of entities later (demo-employee-1).

Leverage the template system and avoid repetition as much as you can. Driving your seeds this way makes it easy to scale up (go from 20 sample employees to 500).

Examples

An admin user for developer and QA deployments. It is not synchronized because the password hash is not deterministic, and the user account should not change after it is first inserted.

germinator: v2
synchronize: false
$env: [dev, qa]

---

entities:
  - User:
      $id: admin-user
      emailAddress: admin@example.com
      password: {{password "testing"}}

Some random company entries in a CRM.

germinator: v2
synchronize: true
$env: [dev, qa]

---

entities:
  {{#repeat 500}}
  - Company:
      $id: company-{{@index}}
      name: {{{chance "company"}}}
      phoneNumber: {{chance "phone"}}
      emailAddress: {{chance "email"}}
      addressId:
        $id: company-{{@index}}-address

  - Address:
      $id: company-{{@index}}-address
      city: {{chance "city"}}
      streetAddress: {{chance "address"}}
      postalCode: {{chance "postal"}}
  {{/repeat}}

Using top section data to feed bottom section template.

germinator: v2
synchronize: true

data:
  calendar:
    {{#repeat 20 as |n|}}
    {{#with (multiply n 2) as |weeks|}}
    - date: {{moment "2021-01-01" (momentAdd weeks "weeks")}}
    {{/with}}
    {{/repeat}}

---

entities:
  {{#each @root/calendar as |event i|}}
  - CalendarEvent:
      $id: calendar-event-{{i}}
      startDate: {{moment event.date (momentAdd 6 "hours")}}
      endDate: {{moment event.date (momentAdd 14 "hours")}}
  {{/each}}

Special Values

$idColumnName

A string or string[] that defines what the primary key columns are. When this property is omitted, germinator will assume that id is the one primary key.

namingStrategy and $namingStrategy

Can be AsIs or SnakeCase. This maps table and column names before performing SQL queries.

namingStrategy is a top-level property. $namingStrategy is an override per-entry.

schemaName and $schemaName

Defines what database schema to use when performing queries.

schemaName is a top-level property. $schemaName is an override per-entry.

synchronize and $synchronize

Defines whether to UPDATE or DELETE this entry in future runs of germinator.

synchronize is a top-level property. $synchronize is an override per-entry.

$env

Defines when this seed entry should be executed, depending on NODE_ENV.

$env is a top-level property, and can be overridden per-entry.

tableMapping

An object that defines a map of NickName -> real_table_name. Useful for legacy DBs where a user-friendly name isn't possible.

tableMapping is a top-level property.

Inline String Variables

String properties can utilize some specific variables, surrounded in {} delimiters.

entities:
{{#repeat 1000}}
  - TableA:
      $id: '{tableName}-{{@index}}'
{{/repeat}}

Notice that {tableName} is in single curlies - this substitution is done after YAML is parsed. Specifically, tableName is the normalized table name according to the namingStrategy (in this case, table_a).
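
A minimal sketch of this second substitution pass, run on a string value after Handlebars has rendered and the YAML has been parsed. `resolveInline` is a hypothetical name, and leaving unknown variables untouched is an assumption.

```typescript
// Replace {name} placeholders with values from `variables`.
// Placeholders with no matching variable are left as they are.
function resolveInline(value: string, variables: Record<string, string>): string {
  return value.replace(/\{(\w+)\}/g, (match: string, name: string): string => {
    const replacement = variables[name];
    return replacement === undefined ? match : replacement;
  });
}

// By this point Handlebars has already turned {{@index}} into a number.
console.log(resolveInline('{tableName}-0', { tableName: 'table_a' }));
```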

Helpers

General

We include all of the handlebars-helpers. There are over 180 of them, so check them out!

We also include the repeat helper:

{{#repeat 1000 as |i|}}
  $id: book-{{i}}
{{/repeat}}

{{#repeat start=17 count=2}}
  $id: book-{{@index}}
{{/repeat}}

Chance

The {{chance}} helper uses Chance.js for random data.

In general, any Chance.js function can be called like {{chance "fnName"}} with arguments following the function name.

Name          Example
Boolean       {{chance "bool"}}
Integer       {{chance "integer"}}
              {{chance "integer" min=0 max=10}}
Float         {{chance "float"}}
Prime         {{chance "prime"}}
Letter        {{chance "letter"}}
Text          {{chance "string"}}
Paragraph     {{chance "paragraph"}}
Sentence      {{chance "sentence"}}
Word          {{chance "word"}}
Date          {{chance "date"}}
First Name    {{chance "first"}}
Last Name     {{chance "last"}}
Full Name     {{chance "name"}}
Email         {{chance "email"}}
URL           {{chance "url"}}
City          {{chance "city"}}
Country       {{chance "country"}}
Postal        {{chance "postal"}}
Zip Code      {{chance "zip"}}
GUID          {{chance "guid"}}

All of the functions listed in Chance.js should be supported; only a subset is listed here. Arguments that are passed will be forwarded into Chance.js's generators.

Faker.js

The {{faker}} helper uses Faker.js for random data.

In general, any Faker.js function can be called like {{faker "namespace.method"}} with arguments following the function name.

Name            Example
Date            {{faker "date.past"}}
Phone Number    {{faker "phone.phoneNumber"}}

All of the functions listed in Faker.js should be supported; only a subset is listed here. Arguments that are passed will be forwarded into Faker.js's generators.

Moment

The {{moment}} helper is useful for handling dates.

Name            Example
Formatting      {{moment "2019-01-01" format="MM-DD-YY"}}
Parse as UTC    {{moment "2019-01-01" utc=true}}
Mutations       {{moment "2019-01-01" "[add,5,days]"}}
                {{moment "2019-01-01" (momentAdd var "days")}}
                {{moment "2019-01-01" (momentSubtract var "days")}}

bcrypt

The {{password}} helper renders password hashes using bcrypt.

Name        Example
Hashing     {{password "testing"}}
            {{password "testing" rounds=5}}
Insecure    {{password "testing" insecure=true}}

The insecure option is as it sounds. It is, however, a lot faster than generating a random salt for every password. This is useful for development environments with thousands of users, where you want every germinator run to be a no-op (secure passwords by definition will hash differently every time).

Custom Helpers

The Node.js API has a helpers argument in runSeeds, and in the other functions that render templates. You can provide your own helpers on top of the built-in ones, or replace them entirely.

import { makeHelpers } from '@germinator/helpers';
import { runSeeds } from '@germinator/node';

await runSeeds({
  helpers: {
    ...makeHelpers(),
    myHelper() {
      return 'this is custom!';
    },
  },
  folder: ...,
  db: { ... },
});

Command Line

  1. Run through NPM:

    npx germinator --help
    
  2. Run through Docker:

    docker run -it --rm ghcr.io/launchcodedev/germinator --help
    

    It's normal to mount a folder for germinator to read from.

    docker run -it --rm \
      -v $(realpath seeds):/seeds \
      ghcr.io/launchcodedev/germinator /seeds -c sqlite3 -o /seeds/db
    

Options:

  -C, --cwd         Runs germinator in a different directory               [string]
  -c, --client      What kind of database to connect to     ["postgres", "sqlite3"]
  -h, --hostname    Hostname of the database                 [default: "localhost"]
  -p, --port        Port of the database                                   [number]
  -d, --database    Database name                                          [string]
  -o, --filename    Filename for SQLite databases (:memory: will work)     [string]
  -u, --user        Username to connect with                               [string]
      --pass        Password for the user                                  [number]
      --dryRun      Does not run INSERT or UPDATE                         [boolean]
      --noTracking  Does not track inserted entries                       [boolean]

Dry Run Mode

Tries to do as much as possible, without INSERTs or UPDATEs. Useful for checking the effect of changes to seeds.

Run with --dryRun. Will print SQL that germinator would have run, with a few exceptions.

No Tracking Mode

Run with --noTracking, which will not track inserted entries. The advantage of this is that germinator doesn't need to create tables in your database.

This mode should only be used for one-off seed insertions.

Docker

Germinator has a docker image, ghcr.io/launchcodedev/germinator. It runs the CLI by default as the entrypoint.

You should mount your seeds folder into the container, so it can read the YAML files within it.

See the Command Line section for the available options.

Accessing Databases

Of course, Docker is isolated from your local environment. You'll likely need to forward network ports or sockets as necessary. --net=host might be the easiest way.

It's common to run germinator alongside your database in a docker-compose workspace. That way, hostnames are available to germinator.

Using in Kubernetes

Germinator would usually be run as a Job Kubernetes resource.

apiVersion: batch/v1
kind: Job

metadata:
  name: seeds

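spec:
  template:
    spec:
      # Jobs require an explicit restart policy of Never or OnFailure
      restartPolicy: Never
      containers:
      - name: seeds
        image: ghcr.io/launchcodedev/germinator
        # args (not command) keeps the image's entrypoint, which is the germinator CLI
        args: ["-c=postgres", "/seeds"]

        env:
          - name: GERMINATOR_HOSTNAME
            value: db-host
          - name: GERMINATOR_PORT
            value: '5432'
          - name: GERMINATOR_PASS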
            valueFrom:
              secretKeyRef:
                name: secrets
                key: dbPassword

        # mount a folder of YAML files into /seeds
        volumeMounts:
          - name: seeds
            readOnly: true
            mountPath: "/seeds"

How you set up credentials to the database is entirely up to your setup.

Using in Docker Compose

Run Germinator beside your database instance in docker-compose.

services:
  my-db:
    image: postgres:12
    environment:
      - POSTGRES_USER=admin
      - POSTGRES_PASSWORD=pwd
      - POSTGRES_DB=my-db
    ports:
      - 5432:5432

  seed:
    image: ghcr.io/launchcodedev/germinator
    command: /seeds
    volumes:
      - ./seeds:/seeds
    environment:
      - NODE_ENV=development
      - GERMINATOR_CLIENT=postgres
      - GERMINATOR_HOSTNAME=my-db
      - GERMINATOR_PORT=5432
      - GERMINATOR_DATABASE=my-db
      - GERMINATOR_USER=admin
      - GERMINATOR_PASS=pwd
    links:
      - my-db

Here, we have a folder in ./seeds that's mounted into /seeds.

Node.js API

Germinator is a Node.js module, so a programmatic API is available.

A subset of Germinator functionality is available in a browser-compatible format.

import { renderSeed } from '@germinator/core';
import { makeHelpers } from '@germinator/helpers';

const template = `
data:
  books:
    - Moby Dick
    - The Great Gatsby
    - To Kill a Mockingbird
  libraries:
    - Southern
    - Northern

---

bookCollection:
{{#each @root.books as |book|}}
{{#each @root.libraries as |library|}}
  - title: {{book}}
    library: {{library}}
    checkedOutBy: {{chance "name"}}
{{/each}}
{{/each}}
`;

const output = renderSeed(template, makeHelpers());

The output here will be an object that looks like:

{
  bookCollection: [
    {
      title: 'Moby Dick',
      library: 'Southern',
      checkedOutBy: 'Seth Tran',
    },
    {
      title: 'Moby Dick',
      library: 'Northern',
      checkedOutBy: 'Isaiah Erickson',
    },
    {
      title: 'The Great Gatsby',
      library: 'Southern',
      checkedOutBy: 'Winifred Barnes',
    },
    {
      title: 'The Great Gatsby',
      library: 'Northern',
      checkedOutBy: 'Scott Collins',
    },
    {
      title: 'To Kill a Mockingbird',
      library: 'Southern',
      checkedOutBy: 'Todd Houston',
    },
    {
      title: 'To Kill a Mockingbird',
      library: 'Northern',
      checkedOutBy: 'Elijah Watson',
    },
  ],
}

Database Seeds

Of course, normal germinator functionality is available as well.

import { runSeeds } from '@germinator/node';
import { makeHelpers } from '@germinator/helpers';

await runSeeds(
  {
    helpers: makeHelpers(),
    folder: `${__dirname}/seeds`,
    db: {
      client: 'postgres',
      pool: { min: 1, max: 1 },
      connection: {
        host: 'localhost',
        port: 5432,
        user: 'admin',
        password: 's3cur3',
      },
    },
  },
  {
    // optional runtime properties
    dryRun: false,
    noTracking: false,
  },
);

The API also accepts a Knex instance as db, and an array of SeedFile objects in place of folder if you want to construct them manually.