PipelineDB Cluster is designed to be seamlessly added as an extension to existing PipelineDB core deployments. So first, ensure that PipelineDB is installed on all host systems that will comprise your PipelineDB Cluster deployment.


If you’d like to learn more about PipelineDB Cluster’s clustering engine first, check out the Clustering section, although it isn’t necessary to do so before proceeding.

Installing the Extension

After installing PipelineDB and obtaining the PipelineDB Cluster binaries, use the appropriate package manager to install the extension on all nodes in your cluster:

# RedHat/CentOS
rpm -ivh pipeline_cluster-<version>.rpm

# Debian
dpkg -i pipeline_cluster-<version>.deb

Finally, ensure the following on each node in your cluster:

  • PostgreSQL’s bin directory is on the $PATH
  • Clock is roughly in sync with all other servers in the cluster (ntpd works very well for this)

Cluster Initialization

PipelineDB Cluster ships with a tool, pipeline-cluster-init, that automatically bootstraps and configures all nodes in a cluster. It can be found at $PG_ROOT/bin/pipeline-cluster-init, and is the easiest way to get up and running with good default settings. Its usage is as follows:

usage: pipeline-cluster-init [-h] -c CONFIG -D DATA_DIR [-X XLOG_DIR]
                                [-U DBUSER] [--encoding ENCODING]
                                [--locale LOCALE] [-b HBA] [-S] [-u USERNAME]
                                [-i IDENTITY] [--bindir REMOTE_DIR]

pipeline-cluster-init initializes a PipelineDB Cluster database cluster.

optional arguments:
  -h, --help           show this help message and exit
  -c CONFIG            path to a local configuration file containing
                       configuration parameters that will be appended to each
                       cluster node's pipelinedb.conf
  -b HBA               location of pg_hba.conf file to use for cluster
  -S                   should ssh and initialize all nodes?

  -D DATA_DIR          path of the data directory to initialize
  -X XLOG_DIR          location for the transaction log directory
  -U DBUSER            database superuser name
  --encoding ENCODING  set default encoding for new databases
  --locale LOCALE      set default locale for new databases

  only valid with the -S flag

  -u USERNAME          ssh username
  -i IDENTITY          ssh identity file
  --bindir REMOTE_DIR  installation directory on the remote hosts

Report bugs to <>.

The easiest way to initialize a cluster is to use ssh mode via the -S flag, so that the script only needs to be run once from a single location. Without the -S flag, pipeline-cluster-init runs in local mode and only initializes the node it is run on. In fact, in ssh mode pipeline-cluster-init simply runs itself in local mode on each remote node.
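For comparison, a local-mode invocation on a single node might look like the following command-line fragment (the data-directory path is illustrative, not prescribed by the tool):

```shell
# Local mode: no -S flag, so only this node's data directory is
# initialized. Run this once on every node in the cluster.
# /var/lib/pipelinedb/data is an illustrative path.
pipeline-cluster-init -c pipelinedb.conf -D /var/lib/pipelinedb/data
```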

As an example, to initialize a cluster of three nodes using ssh mode we begin with a configuration file containing all of our nodes as well as our license key:

# Show contents of pipelinedb.conf
$ cat pipelinedb.conf
pipeline_cluster.nodes = 'node0@host0:5432,node1@host1:5432,node2@host2:5432'
pipeline_cluster.license_key = '9FBFA1E4CC63E4BAD6B72C9F9A215B65'

Next, run pipeline-cluster-init from somewhere with ssh access to each node in your cluster:

# Run pipeline-cluster-init in ssh mode
$ pipeline-cluster-init -c pipelinedb.conf -D data_dir -S
validating configuration file ... ok
initializing node node0 [host0:5432] ... ok
initializing node node1 [host1:5432] ... ok
initializing node node2 [host2:5432] ... ok

pipeline-cluster-init is meant to be run from any machine with ssh access to all nodes in the cluster. Running it doesn’t require having the PipelineDB Cluster binaries installed, and it can be downloaded independently here.


The configuration file must contain a pipeline_cluster.nodes parameter: a comma-separated list of name@host:port identifiers, as shown in the example configuration above.
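The name@host:port format can be split apart with ordinary parameter expansion. The sketch below uses the illustrative node names and hosts from the example configuration above:

```shell
# Split a pipeline_cluster.nodes value into name/host/port triples.
# Node names and hosts are the illustrative values from the example above.
nodes='node0@host0:5432,node1@host1:5432,node2@host2:5432'
for entry in $(echo "$nodes" | tr ',' ' '); do
  name=${entry%%@*}       # text before the @
  hostport=${entry#*@}    # text after the @
  host=${hostport%%:*}    # text before the colon
  port=${hostport##*:}    # text after the colon
  echo "name=$name host=$host port=$port"
done
```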

pipeline_cluster.nodes does not have to represent a static set of nodes in your cluster. Adding a Node to a Cluster after initial configuration may be done with no downtime or interruption of service.


Each PipelineDB Cluster node’s pg_hba.conf must be configured to allow for incoming connections from all other nodes in the cluster. The simplest example of such a configuration is the following:

local   all   all                  trust
host    all   all   0.0.0.0/0      trust

This pg_hba.conf allows connection requests from any client, which may be sufficient with firewall-level security. For more advanced configuration options, see PostgreSQL’s pg_hba.conf documentation.
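If firewall-level security alone is not sufficient, a more restrictive configuration can limit node-to-node connections to the cluster's subnet and require password authentication. The subnet and auth method below are illustrative assumptions; adjust them to your network:

```
# Allow local connections, and md5-authenticated connections only
# from the (illustrative) cluster subnet.
local   all   all                  trust
host    all   all   10.0.0.0/24    md5
```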


The aforementioned example configuration is used as the default pg_hba.conf by the pipeline-cluster-init script.


By default, PipelineDB Cluster uses ports 5432, 6432 and 7432, so ensure that your firewall settings allow TCP communication over these ports.
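On hosts managed with firewalld, for example, the default ports could be opened as sketched below. The snippet only prints the commands so they can be reviewed before running; firewalld is an assumption, so adapt for iptables or ufw as needed:

```shell
# Print the firewall-cmd invocations that would open PipelineDB
# Cluster's default ports (firewalld assumed; review before running).
for port in 5432 6432 7432; do
  echo "firewall-cmd --permanent --add-port=${port}/tcp"
done
echo "firewall-cmd --reload"
```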

Running PipelineDB Cluster

On each node, start PipelineDB using the data directory that was initialized by pipeline-cluster-init.
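Since PostgreSQL's bin directory is on the $PATH (see the prerequisites above), pg_ctl is one way to do this. The data-directory and log-file paths below are illustrative:

```shell
# Start the server against the initialized data directory on each node.
# Paths are illustrative; use the data directory you passed via -D.
pg_ctl -D /var/lib/pipelinedb/data -l pipelinedb.log start
```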

To verify that PipelineDB Cluster has been properly installed, create a simple sharded continuous view from any node:

psql -h <host> -d postgres -c "\
CREATE VIEW verify WITH (num_shards=2) AS SELECT x, COUNT(*) FROM stream GROUP BY x"

The continuous view should now be visible from any PipelineDB Cluster node.

psql -h <host> -d postgres -c "\d+ verify"

Client Endpoints

PipelineDB Cluster uses a shared-nothing architecture with no master node, which means that any client can connect to any node equivalently. However, for load balancing and simplicity for clients, we recommend putting a reverse proxy such as HAProxy in front of the cluster so that clients only need to be aware of a single endpoint.
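As a rough illustration, such a proxy configuration might look like the following haproxy.cfg fragment, using the example hosts and ports from the configuration earlier. The frontend port 5439 is an arbitrary choice for this sketch:

```
# Illustrative haproxy.cfg fragment: one TCP endpoint fanning out
# to the three example nodes in round-robin fashion.
listen pipelinedb
    bind *:5439
    mode tcp
    balance roundrobin
    server node0 host0:5432 check
    server node1 host1:5432 check
    server node2 host2:5432 check
```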

PipelineDB Cluster ships with a tool for automatically generating an HAProxy configuration to make deployment as easy as possible. Head over to the Load Balancing section for more information.