This guide will walk through seeding a Postgres database. It will cover creating a Node script to seed the data and touch on some of the pros and cons as to the chosen approach.
The source code for this guide can be found here: github.com/molebox/seed-postgres-database-tutorial
This tutorial assumed that you have Postgres setup and know how to access its databases, though the guide will cover some basic commands towards the end. The guide uses the default user postgres for accessing the database. If you have another user account you would prefer to use then swap in that.
Helpful to know
The process of seeding (in the context of databases) is to insert or populate the initial data into the database. This can be either a manual or automated step in the setup of an application. Seeding can be used when testing different branches, for example, if you have a dev branch where you want to test some new sorting query against the database, seeding would be a good way to test against data that won't affect a production build. Of course, there are many reason one might choose to seed a database. In some instances, an applications database requires some form of data present before it will work properly, such as an admin account. But more often than not seeding would take place pre-install and thus allow the user to begin using the app without any issues.
The seed script will aim to accomplish the following:
Begin by creating a
schema.sql file at the root of your project. This file will enable you to lay the groundwork for how your database and its table will look.
To interact with the Postgres database, you can install the node-postgres package, a collection of modules made for interacting with Postgres. You'll use it to establish an initial connection to the database and insert some fake data. Create a new file
src/db.js and add the following:
The Pool class takes some optional config and the values passed in enable a connection with the database. They are set as environment variables (env vars) and imported from a separate config file. This file exports two functions. The query, which will be used to run an
INSERT statement, and a connect function which will be used to connect to the database.
Storing all the env vars in one place and exporting them means you have one source of truth and can easily swap them out from one place instead of multiple files. Create a new file and name it
An example of how your env vars might look:
In a real-world scenario, you would perhaps have some data stored in a csv file. This example will make use of the faker library, and some other packages. Install the following:
Use the Faker library by creating a function that will mimic the shape of the table set in
schema.sql. It will return a template literal string to be added to a csv file later on.
Next, you will need to import fs and create a stream. This will write to an as yet non-existent csv file.
Enabling the user to choose how many rows they would like to seed the database with is an extra and worthwhile step. The
minimist package helps with parsing argument options. In the case of the script, it allows the user the option to pass in an amount, if the user chooses not to pass any additional arguments then you can set a default value. Create a new function that will write the fake data to the csv file.
Now that the csv file is created and populated with fake data, you can begin the process of actually seeding that data into the Postgres database.
fast-csv is a library for parsing and formatting csv files. You'll use it in combination with the
validator library and
The function first validates the contents of the row using the contains function from the
validator library. This is necessary because some countries can have an extra comma in their name. An extra comma in a csv file equates to an extra column and the table created and defined in the
schema.sql file dictates that only 3 columns will exist. If this check fails
fast-csvwill not accept the row and throw an event, which is used to print a message to the console to inform the user.
If the row is accepted then it is added to an array. Once the parsing is finished and all the row data is added to the array the connection is established with the Postgres database. The data array is then iterated over, for each row in the array a client instance is acquired from the pool and an
INSERT query is used as an argument, along with the row data. If the row is successfully inserted into the table then its corresponding data is printed to the console, if any errors occur they are also printed to the console. Finally, the done function is called to release the clients back to the pool.
The final function called seed is where the data is written to the csv file, a stream is created to read the data from the output file and then the
INSERT function is piped to the stream.
Add two scripts to the
package.json file. The first
create-db will ask the user to login and connect to their Postgres database and then run the commands in the
schema.sql file. The second script will run the first before running the seed function.
The user may run the script with additional arguments to set the number of rows created in the table.
To check the database table you can run the following commands from your terminal:
There are many ways this could have been accomplished, in fact, there are many libraries that support using Node with Postgres. This method was chosen for its relative simplicity. It's not a generic solution that would fit all scenarios but it could be built upon to incorporate extra features.