Deployment

PipelineDB Cluster is designed to be seamlessly added as an extension to existing PipelineDB core deployments. So first, ensure that PipelineDB is installed on all host systems that will comprise your cluster.

Note

If you’d like to learn more about PipelineDB Cluster’s clustering engine first, check out the Clustering section, although it isn’t necessary to do so before proceeding.

Installing the Extension

After installing PipelineDB and obtaining the PipelineDB Cluster binaries, use the appropriate package manager to install the extension on all nodes in your cluster:

# RPM
rpm -ivh pipelinedb_cluster-<version>.rpm

# Debian
dpkg -i pipelinedb_cluster-<version>.deb

Cluster Initialization

PipelineDB Cluster ships with a tool, pipeline-cluster-init, that automatically bootstraps and configures all nodes in a cluster. It can be found at $PIPELINEDB_ROOT/bin/pipeline-cluster-init, and is the easiest way to get up and running with good default settings. Its usage is as follows:

usage: pipeline-cluster-init [-h] -c CONFIG -D DATA_DIR [-X XLOG_DIR]
                                [-U DBUSER] [--encoding ENCODING]
                                [--locale LOCALE] [-b HBA] [-S] [-u USERNAME]
                                [-i IDENTITY] [--bindir REMOTE_DIR]

pipeline-cluster-init initializes a PipelineDB Cluster database cluster.

optional arguments:
  -h, --help           show this help message and exit
  -c CONFIG            path to a local configuration file containing
                       configuration parameters that will be appended to each
                       cluster node's pipelinedb.conf
  -b HBA               location of pg_hba.conf file to use for cluster
  -S                   should ssh and initialize all nodes?

initdb:
  -D DATA_DIR          path of the data directory to initialize
  -X XLOG_DIR          location for the transaction log directory
  -U DBUSER            database superuser name
  --encoding ENCODING  set default encoding for new databases
  --locale LOCALE      set default locale for new databases

ssh:
  only valid with the -S flag

  -u USERNAME          ssh username
  -i IDENTITY          ssh identity file
  --bindir REMOTE_DIR  installation directory on the remote hosts

Report bugs to <eng@pipelinedb.com>.

The easiest way to initialize a cluster is to use ssh via the -S flag, so that the script only needs to be run once from a single location. Without the -S flag, pipeline-cluster-init will only initialize the node that it is run on.

pipeline-cluster-init is meant to be run from any machine with ssh access to all nodes in the cluster. Running it doesn’t require having the PipelineDB Cluster binaries installed, and it can be downloaded independently here.
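As a sketch, a full-cluster initialization over ssh might look like the following. The configuration file name, data directory, ssh username, identity file, and remote installation directory are all placeholders for your own environment:

```shell
# Hypothetical invocation; every path, the username, and the identity file
# below are assumptions -- substitute your own values.
#
# -S:       ssh to and initialize all nodes listed in the configuration file
# -c:       local configuration file (must contain pipeline_cluster.nodes)
# -D:       data directory to initialize on each node
# -u, -i:   ssh username and identity file
# --bindir: PipelineDB installation directory on the remote hosts
pipeline-cluster-init -S -c cluster.conf -D /mnt/pipelinedb/data \
  -u deploy -i ~/.ssh/id_rsa --bindir /usr/lib/pipelinedb/bin
```

After this completes, each node's data directory is initialized and its pipelinedb.conf contains the parameters from cluster.conf.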

Note

The configuration file must have a pipeline_cluster.nodes parameter which should be a comma-separated list of name@host:port identifiers, for example node0@dev1.pipelinedb.com:5432, node1@dev2.pipelinedb.com:5432.
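As a minimal sketch, a configuration file passed via -c might look like the following. The hostnames match the example identifiers above, and the extra parameter is purely illustrative:

```
# Required: comma-separated list of name@host:port identifiers
pipeline_cluster.nodes = 'node0@dev1.pipelinedb.com:5432, node1@dev2.pipelinedb.com:5432'

# Any other parameters included here are appended to each node's pipelinedb.conf
max_connections = 200
```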

pg_hba.conf

Each PipelineDB Cluster node’s pg_hba.conf must be configured to allow for incoming connections from all other nodes in the cluster. The simplest example of such a configuration is the following:

# TYPE  DATABASE  USER  ADDRESS    METHOD
local   all       all              trust
host    all       all   0.0.0.0/0  trust

This pg_hba.conf allows connection requests from any client, which may be sufficient with firewall-level security. For more advanced configuration options, see PostgreSQL’s pg_hba.conf documentation.
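If firewall-level security alone isn't sufficient, one tighter sketch restricts host connections to the cluster's subnet and requires password authentication. The subnet below is an assumption; use your own network's range:

```
# TYPE  DATABASE  USER  ADDRESS      METHOD
local   all       all                trust
host    all       all   10.0.1.0/24  md5
```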

Note

The aforementioned example configuration is used as the default pg_hba.conf by the pipeline-cluster-init script.

Ports

By default, PipelineDB Cluster uses ports 5432, 6432 and 7432, so ensure that your firewall settings allow TCP communication over these ports.
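For example, on hosts managed with iptables, rules along these lines would open the required ports to the other nodes. The source subnet is an assumption; adapt this to your own firewall tooling:

```shell
# Allow inbound TCP on PipelineDB Cluster's ports from the cluster
# subnet (10.0.1.0/24 is illustrative).
for port in 5432 6432 7432; do
  iptables -A INPUT -p tcp -s 10.0.1.0/24 --dport "$port" -j ACCEPT
done
```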

Running PipelineDB Cluster

On each node, run PipelineDB with the data directory that was initialized by pipeline-cluster-init.

To verify that PipelineDB Cluster has been properly installed, create a simple sharded continuous view from any node:

pipeline -h <host> -d <database> -c "\
CREATE CONTINUOUS VIEW verify WITH (num_shards=2) AS SELECT COUNT(*) FROM stream"

The continuous view should now be visible from any PipelineDB Cluster node.

pipeline -h <host> -d pipeline -c "\d+ verify"
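To exercise the view end to end, you can write to the stream from one node and read the aggregate back from another. The stream and view names are from the example above; the second hostname is a placeholder, and depending on your PipelineDB version you may need to create the stream explicitly before inserting into it:

```shell
# Write a few events into the stream from one node...
pipeline -h <host> -d <database> -c "INSERT INTO stream (x) VALUES (1), (2), (3)"

# ...and read the sharded count back from a different node
pipeline -h <other-host> -d <database> -c "SELECT count FROM verify"
```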

Client Endpoints

PipelineDB Cluster uses a shared-nothing architecture with no master node, which means that any client can connect to any node equivalently. However, for load balancing and simplicity for clients, we recommend putting a reverse proxy such as HAProxy in front of the cluster so that clients only need to be aware of a single endpoint.
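As a rough illustration of the idea, a hand-written HAProxy section balancing TCP connections across two nodes might look like the following sketch. The hostnames are assumptions carried over from the earlier examples:

```
listen pipelinedb
    bind *:5432
    mode tcp
    balance roundrobin
    server node0 dev1.pipelinedb.com:5432 check
    server node1 dev2.pipelinedb.com:5432 check
```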

PipelineDB Cluster ships with a tool for automatically generating an HAProxy configuration to make deployment as easy as possible. Head over to the Load Balancing section for more information.