Deployment¶
PipelineDB Cluster is designed to be seamlessly added as an extension to existing PipelineDB core deployments. So first, ensure that you have PipelineDB installed on all host systems that will comprise your PipelineDB Cluster cluster.
Note
If you’d like to learn more about PipelineDB Cluster’s clustering engine first, check out the Clustering section, although it isn’t necessary to do so before proceeding.
Installing the Extension¶
After installing PipelineDB and obtaining the PipelineDB Cluster binaries, use the appropriate package manager to install the extension on all nodes in your cluster:
# RPM
rpm -ivh pipeline_cluster-<version>.rpm
# Debian
dpkg -i pipeline_cluster-<version>.deb
Finally, ensure the following on each node in your cluster:
- PostgreSQL’s
bin
directory is on the$PATH
- Clock is roughly in sync with all other servers in the cluster (
ntpd
works very well for this)
Cluster Initialization¶
PipelineDB Cluster ships with a tool, pipeline-cluster-init
, that automatically bootstraps and configures all nodes in a cluster. It can be found at $PG_ROOT/bin/pipeline-cluster-init
, and is the easiest way to get up and running with good default settings. Its usage is as follows:
usage: pipeline-cluster-init [-h] -c CONFIG -D DATA_DIR [-X XLOG_DIR]
[-U DBUSER] [--encoding ENCODING]
[--locale LOCALE] [-b HBA] [-S] [-u USERNAME]
[-i IDENTITY] [--bindir REMOTE_DIR]
pipeline-cluster-init initializes a PipelineDB Cluster database cluster.
optional arguments:
-h, --help show this help message and exit
-c CONFIG path to a local configuration file containing
configuration parameters that will be appended to each
cluster node's pipelinedb.conf
-b HBA location of pg_hba.conf file to use for cluster
-S should ssh and initialize all nodes?
initdb:
-D DATA_DIR path of the data directory to initialize
-X XLOG_DIR location for the transaction log directory
-U DBUSER database superuser name
--encoding ENCODING set default encoding for new databases
--locale LOCALE set default locale for new databases
ssh:
only valid with the -S flag
-u USERNAME ssh username
-i IDENTITY ssh identity file
--bindir REMOTE_DIR installation directory on the remote hosts
Report bugs to <eng@pipelinedb.com>.
The easiest way to intialize a cluster is to use ssh
via the -S
flag so that the script only needs to be run a single time from one place. Without using the -S
flag, pipeline-cluster-init
will run in local mode and only initialize the node that it is run on. In fact, in ssh
mode pipeline-cluster-init
will run itself in local mode on each remote node.
As an example, to initialize a cluster of three nodes using ssh
mode we begin with a configuration file containing all of our nodes as well as our license key:
# Show contents of pipelinedb.conf
$ cat pipelinedb.conf
pipeline_cluster.nodes = 'node0@host0:5432,node1@host1:5432,node2@host2:5432'
pipeline_cluster.license_key = '9FBFA1E4CC63E4BAD6B72C9F9A215B65'
Next, run pipeline-cluster-init
from somewhere with ssh
access to each node in your cluster:
# Run pipeline-cluster-init in ssh mode
$ pipeline-cluster-init -c pipelinedb.conf -D data_dir -S
validating configuration file ... ok
initializing node node0 [host0:5432] ... ok
initializing node node1 [host1:5432] ... ok
initializing node node2 [host2:5432] ... ok
pipeline-cluster-init
is meant to be run from any machine with ssh
access to all nodes in the cluster. Running it doesn’t require having the PipelineDB Cluster binaries installed, and it can be downloaded independently here.
Note
The configuration file must have a pipeline_cluster.nodes
parameter which should be a comma-separated list of name@host:port
identifiers, for example node0@dev1.pipelinedb.com:5432, node1@dev2.pipelinedb.com:5432
.
pipeline_cluster.nodes
does not have to represent a static set of nodes in your cluster. Adding a Node to a Cluster after initial configuration may be done with no downtime or interruption of service.
pg_hba.conf¶
Each PipelineDB Cluster node’s pg_hba.conf
must be configured to allow for incoming connections from all other nodes in the cluster. The simplest example of such a configuration is the following:
# TYPE DATABASE USER ADDRESS METHOD
local all all trust
host all all 0.0.0.0/0 trust
This pg_hba.conf
allows connection requests from any client, which may be sufficient with firewall-level security. For more advanced configuration options, see PostgreSQL’s pg_hba.conf documentation.
Note
The aforementioned example configuration is used as the default pg_hba.conf
by the pipeline-cluster-init
script.
Ports¶
By default, PipelineDB Cluster uses ports 5432, 6432 and 7432, so ensure that your firewall settings allow TCP communication over these ports.
Running PipelineDB Cluster¶
On each node, run PipelineDB under the data directory that was initialized using pipelinedb-cluster-init
.
To verify that PipelineDB Cluster has been properly installed, create a simple sharded continuous view from any node:
psql -h <host> -d postgres -c "\
CREATE VIEW verify WITH (num_shards=2) AS SELECT x, COUNT(*) FROM stream GROUP BY x"
The continuous view should now be visible from any PipelineDB Cluster node.
psql -h <host> -d postgres -c "\d+ verify"
Client Endpoints¶
PipelineDB Cluster uses a shared-nothing architecture with no master node, which means that any client can connect to any node equivalently. However, for load balancing and simplicity for clients, we recommend putting a reverse proxy such as HAProxy in front of the PipelineDB Cluster cluster so that clients must only be aware of a single endpoint.
PipelineDB Cluster ships with a tool for automatically generating an HAProxy configuration to make deployment as easy as possible. Head over to the Load Balancing section for more information.