PipelineDB Cluster is designed to be seamlessly added as an extension to existing PipelineDB core deployments. First, ensure that you have PipelineDB installed on every host that will be part of your PipelineDB Cluster deployment.
If you’d like to learn more about PipelineDB Cluster’s clustering engine first, check out the Clustering section, although it isn’t necessary to do so before proceeding.
Installing the Extension
After installing PipelineDB and obtaining the PipelineDB Cluster binaries, use the appropriate package manager to install the extension on all nodes in your cluster:
# RPM
rpm -ivh pipeline_cluster-<version>.rpm

# Debian
dpkg -i pipeline_cluster-<version>.deb
Finally, ensure the following on each node in your cluster:
- The PipelineDB bin directory is on the PATH
- Clock is roughly in sync with all other servers in the cluster (ntpd works very well for this)
PipelineDB Cluster ships with a tool, pipeline-cluster-init, that automatically bootstraps and configures all nodes in a cluster. It can be found at $PG_ROOT/bin/pipeline-cluster-init, and is the easiest way to get up and running with good default settings. Its usage is as follows:
usage: pipeline-cluster-init [-h] -c CONFIG -D DATA_DIR [-X XLOG_DIR]
                             [-U DBUSER] [--encoding ENCODING]
                             [--locale LOCALE] [-b HBA] [-S] [-u USERNAME]
                             [-i IDENTITY] [--bindir REMOTE_DIR]

pipeline-cluster-init initializes a PipelineDB Cluster database cluster.

optional arguments:
  -h, --help           show this help message and exit
  -c CONFIG            path to a local configuration file containing
                       configuration parameters that will be appended to
                       each cluster node's pipelinedb.conf
  -b HBA               location of pg_hba.conf file to use for cluster
  -S                   should ssh and initialize all nodes?

initdb:
  -D DATA_DIR          path of the data directory to initialize
  -X XLOG_DIR          location for the transaction log directory
  -U DBUSER            database superuser name
  --encoding ENCODING  set default encoding for new databases
  --locale LOCALE      set default locale for new databases

ssh:
  only valid with the -S flag

  -u USERNAME          ssh username
  -i IDENTITY          ssh identity file
  --bindir REMOTE_DIR  installation directory on the remote hosts

Report bugs to <email@example.com>.
The easiest way to initialize a cluster is to use ssh via the -S flag so that the script only needs to be run a single time from one place. Without the -S flag, pipeline-cluster-init will run in local mode and only initialize the node that it is run on. In fact, in ssh mode, pipeline-cluster-init will run itself in local mode on each remote node.
As an example, to initialize a cluster of three nodes using ssh mode, we begin with a configuration file containing all of our nodes as well as our license key:
# Show contents of pipelinedb.conf
$ cat pipelinedb.conf
pipeline_cluster.nodes = 'node0@host0:5432,node1@host1:5432,node2@host2:5432'
pipeline_cluster.license_key = '9FBFA1E4CC63E4BAD6B72C9F9A215B65'
Then run pipeline-cluster-init from somewhere with ssh access to each node in your cluster:
# Run pipeline-cluster-init in ssh mode
$ pipeline-cluster-init -c pipelinedb.conf -D data_dir -S
validating configuration file ... ok
initializing node node0 [host0:5432] ... ok
initializing node node1 [host1:5432] ... ok
initializing node node2 [host2:5432] ... ok
pipeline-cluster-init is meant to be run from any machine with ssh access to all nodes in the cluster. Running it doesn’t require having the PipelineDB Cluster binaries installed, and the script can be downloaded independently.
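Because ssh mode depends on non-interactive ssh access to every node, it can help to confirm that access before running the tool. The following is a minimal sketch, not part of PipelineDB Cluster: the function names config_hosts and check_ssh_access are hypothetical, and it assumes the nodes are listed on a single pipeline_cluster.nodes line in single quotes and that your ssh username and identity come from your ssh configuration rather than from the -u and -i options.

```shell
# Hypothetical helpers, not shipped with PipelineDB Cluster.

# Print the host part of every entry in pipeline_cluster.nodes, one per line.
config_hosts() {
  sed -n "s/^[[:space:]]*pipeline_cluster\.nodes[[:space:]]*=[[:space:]]*'\(.*\)'.*$/\1/p" "$1" |
    tr ',' '\n' |
    sed -e 's/^[^@]*@//' -e 's/:[0-9]*$//'
}

# Try a no-op ssh command on every host listed in the configuration file.
# Returns non-zero if any host cannot be reached.
check_ssh_access() {
  config=$1
  failed=0
  for host in $(config_hosts "$config"); do
    # BatchMode makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null; then
      echo "ok          $host"
    else
      echo "unreachable $host"
      failed=1
    fi
  done
  return $failed
}
```

Running check_ssh_access pipelinedb.conf prints one line per host and returns a non-zero status if any host is unreachable, so it can gate the call to pipeline-cluster-init in a wrapper script.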
The configuration file must have a pipeline_cluster.nodes parameter, which should be a comma-separated list of name@host:port identifiers, for example node0@host0:5432,node1@host1:5432,node2@host2:5432.
pipeline_cluster.nodes does not have to represent a static set of nodes in your cluster. Adding a Node to a Cluster after initial configuration may be done with no downtime or interruption of service.
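To make the name@host:port format concrete, here is a small sketch that splits a node list and rejects malformed entries before the value goes into a configuration file. The function name parse_nodes is hypothetical and not part of PipelineDB Cluster, and the checks are an assumption about what counts as well formed (non-empty name and host, numeric port, no spaces after commas); the tool's own validation may differ.

```shell
# Hypothetical helper, not shipped with PipelineDB Cluster.
# Prints "name host port" for each identifier and fails on malformed input.
# The body runs in a subshell so the IFS and globbing changes do not leak.
parse_nodes() (
  nodes=$1
  [ -n "$nodes" ] || { echo "empty node list" >&2; exit 1; }
  IFS=','   # split the list on commas only
  set -f    # disable globbing while splitting
  for entry in $nodes; do
    case $entry in
      *@*:*) ;;
      *) echo "invalid node identifier: $entry" >&2; exit 1 ;;
    esac
    name=${entry%%@*}   # text before the first '@'
    rest=${entry#*@}    # text after the first '@'
    host=${rest%:*}     # text before the last ':'
    port=${rest##*:}    # text after the last ':'
    case $port in
      ''|*[!0-9]*) echo "invalid port in: $entry" >&2; exit 1 ;;
    esac
    if [ -z "$name" ] || [ -z "$host" ]; then
      echo "invalid node identifier: $entry" >&2
      exit 1
    fi
    echo "$name $host $port"
  done
)
```

For the three-node example, parse_nodes 'node0@host0:5432,node1@host1:5432,node2@host2:5432' prints one line per node (node0 host0 5432, and so on), while an entry such as node0@host0 with no port causes a non-zero exit status.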
Each PipelineDB Cluster node’s pg_hba.conf must be configured to allow for incoming connections from all other nodes in the cluster. The simplest example of such a configuration is the following:
# TYPE  DATABASE  USER  ADDRESS     METHOD
local   all       all               trust
host    all       all   0.0.0.0/0   trust
This pg_hba.conf allows connection requests from any client without authentication (the trust method), which may be sufficient when access is already restricted at the firewall level. For more advanced configuration options, see PostgreSQL’s pg_hba.conf documentation.
The example configuration above is used as the default pg_hba.conf by pipeline-cluster-init.
By default, PipelineDB Cluster uses ports 5432, 6432 and 7432, so ensure that your firewall settings allow TCP communication over these ports.
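One way to check the firewall requirement is to probe each default port on each host. The sketch below is an illustration rather than a PipelineDB Cluster tool: the function names are hypothetical, it assumes nc (netcat) is installed with support for the -z and -w options, and it can only report a port as open once something is listening on it, so a "closed" result before PipelineDB is running does not necessarily indicate a firewall problem.

```shell
# Hypothetical helpers, not shipped with PipelineDB Cluster.

# Default ports named in the text above.
CLUSTER_PORTS="5432 6432 7432"

# Print every host:port pair that should be reachable, one per line.
cluster_endpoints() {
  for host in "$@"; do
    for port in $CLUSTER_PORTS; do
      echo "$host:$port"
    done
  done
}

# Report which endpoints accept TCP connections (assumes nc is installed).
# Returns non-zero if any endpoint refuses or times out.
check_cluster_ports() {
  status=0
  for endpoint in $(cluster_endpoints "$@"); do
    if nc -z -w 2 "${endpoint%:*}" "${endpoint##*:}" 2>/dev/null; then
      echo "open   $endpoint"
    else
      echo "closed $endpoint"
      status=1
    fi
  done
  return $status
}
```

For example, check_cluster_ports host0 host1 host2 run from one node probes nine endpoints. If you change the default ports, adjust CLUSTER_PORTS to match.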
Running PipelineDB Cluster
On each node, run PipelineDB under the data directory that was initialized using pipeline-cluster-init.
To verify that PipelineDB Cluster has been properly installed, create a simple sharded continuous view from any node:
psql -h <host> -d postgres -c "\
CREATE VIEW verify WITH (num_shards=2) AS
  SELECT x, COUNT(*) FROM stream GROUP BY x"
The continuous view should now be visible from any PipelineDB Cluster node.
psql -h <host> -d postgres -c "\d+ verify"
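To repeat this check against every node rather than a single host, a loop along the following lines may be useful. It is a sketch under assumptions: the function names are hypothetical, it assumes the continuous view appears in the pg_class catalog on each node (consistent with it being listed by \d+, but not confirmed by this document), and the relation name is interpolated into the query unescaped, so it should only be used with trusted input.

```shell
# Hypothetical helpers, not shipped with PipelineDB Cluster.

# Succeed if a relation with the given name is listed in pg_class on the host.
relation_visible_on() {
  _host=$1
  _relation=$2
  # -t and -A make psql print only the bare count.
  _count=$(psql -h "$_host" -d postgres -tA \
    -c "SELECT count(*) FROM pg_class WHERE relname = '$_relation'" 2>/dev/null) || return 1
  [ "${_count:-0}" -ge 1 ] 2>/dev/null
}

# Usage: verify_on_nodes RELATION HOST...
# Returns non-zero if the relation is missing on any host.
verify_on_nodes() {
  relation=$1
  shift
  failed=0
  for host in "$@"; do
    if relation_visible_on "$host" "$relation"; then
      echo "visible $host"
    else
      echo "missing $host"
      failed=1
    fi
  done
  return $failed
}
```

For the example cluster, verify_on_nodes verify host0 host1 host2 prints one line per host. A host that cannot be reached is reported as missing, since the sketch does not distinguish connection failures from an absent view.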
PipelineDB Cluster uses a shared-nothing architecture with no master node, which means that any client can connect to any node equivalently. However, for load balancing and simplicity for clients, we recommend putting a reverse proxy such as HAProxy in front of the cluster so that clients need only be aware of a single endpoint.
PipelineDB Cluster ships with a tool for automatically generating an HAProxy configuration to make deployment as easy as possible. Head over to the Load Balancing section for more information.