Operations

After the initial Deployment and Running PipelineDB Cluster steps, there are a number of operational tasks you may want to perform on your running cluster. Those tasks are covered here.

Adding a Node to a Cluster

PipelineDB Cluster supports dynamically adding nodes to a running cluster with no downtime or loss of service. New nodes will dynamically join the cluster and inform the other nodes in the cluster of their existence as soon as they’re available to perform work.

PipelineDB Cluster includes the pipeline-cluster-join tool to make this operational task as easy as possible. Its usage is as follows:

usage: pipeline-cluster-join [-h] -c CONFIG -D DATA_DIR [-U DBUSER]
                            [--encoding ENCODING] [--locale LOCALE] [-b HBA]
                            [-k]
                            node_id

pipeline-cluster-join joins a node with an existing PipelineDB Cluster

positional arguments:
  node_id              unique name@host[:port] identifier of the new node

optional arguments:
  -h, --help           show this help message and exit
  -c CONFIG            path to a local configuration file containing
                       configuration parameters that will be appended to each
                       cluster node's pipelinedb.conf
  -D DATA_DIR          path of the data directory to initialize
  -U DBUSER            database superuser name
  --encoding ENCODING  set default encoding for new databases
  --locale LOCALE      set default locale for new databases
  -b HBA               location of pg_hba.conf file to use for cluster
  -k                   if set, keep all files generated during join

Report bugs to <eng@pipelinedb.com>.

Just as with Cluster Initialization, pipeline-cluster-join takes a configuration file containing all parameters you’d like set on the new node. This file must also include the pipeline_cluster.nodes parameter, which specifies the existing nodes in the cluster; from these nodes the joining node obtains all of the metadata necessary to join the running cluster.

pipeline-cluster-join is meant to be run locally on the node joining the cluster. As an example, here’s what it looks like to join a new node to a running cluster of 3 nodes:

# Show contents of pipelinedb.conf
$ cat pipelinedb.conf
pipeline_cluster.nodes = 'node0@host0:5432,node1@host1:5432,node2@host2:5432'
pipeline_cluster.license_key = '9FBFA1E4CC63E4BAD6B72C9F9A215B65'

Once you’ve created your configuration file, simply run pipeline-cluster-join:

$ pipeline-cluster-join node3@host3:5432 -c pipelinedb.conf -D data_dir
detecting installation directory ... ok
validating configuration file ... ok
testing connection to node0 [host0:5432] ... ok
testing connection to node1 [host1:5432] ... ok
testing connection to node2 [host2:5432] ... ok
initdb ... ok
starting PipelineDB ... ok
creating pipeline_cluster extension ... ok
bootstrapping nodes metadata ... ok
replaying DDL log ... ok
shutting down PipelineDB ... ok
update postgresql.conf ... ok
writing node.conf ... ok
update pg_hba.conf ... ok
syncing node node0 [host0:5432] ... ok
syncing node node1 [host1:5432] ... ok
syncing node node2 [host2:5432] ... ok
cleaning up ... ok

Successfully joined cluster.

After this point, each node in the original 3-node cluster will be aware of the new node, and you’ll have a 4-node cluster. No server restarts or further operations are required.
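If you’d like to verify the new topology, the cluster’s node metadata can be inspected from any node. Note that the pipeline_cluster.nodes relation queried below is a hypothetical name used purely for illustration (it mirrors the configuration parameter of the same name); consult the catalogs created by the pipeline_cluster extension on your installation for the authoritative relation:

-- hypothetical catalog relation; the actual name may differ on your installation
SELECT * FROM pipeline_cluster.nodes;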

It is important to note that the new node will not have any work to do at this point. It can serve SELECT queries against existing sharded continuous views, but all shards will remain on the original 3 nodes. However, the new node will be assigned shards for newly created continuous views.
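For example, a continuous view created after the join can be sharded across all 4 nodes. This is only a sketch: the view, stream, and column names are made up, and it assumes num_shards is accepted as a creation-time option just as it is modified via ALTER in the next section:

-- hypothetical stream and continuous view names
CREATE CONTINUOUS VIEW clicks_by_url WITH (num_shards = 4) AS
  SELECT url::text, count(*) FROM clicks_stream GROUP BY url;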

To spread the existing workload onto the new node, you’ll need to increase the number of shards in your continuous views, as described next.

Adding Shards to Continuous Views

The number of shards in a continuous view may be increased to at most the number of nodes in the cluster. Scaling up is done by modifying the sharded continuous view’s num_shards parameter via ALTER:

ALTER VIEW sharded_cv SET (num_shards = 3);

After num_shards is increased, the newly created shards will immediately begin serving reads and writes.

Important

num_shards may only be increased; it cannot be decreased. Also, num_replicas currently cannot be modified.

For read_optimized (default) continuous views, each shard owns a disjoint subset of all aggregate groups. As a result, when new shards are created, the original shards will still contain groups that are now owned by the new shards. If you’d like to physically rebalance your shards so that the new shards’ groups are moved off of the existing shards, see Rebalancing Shards.

Note

Each shard’s groups do not have to be perfectly disjoint, and in most cases will not be! If a shard that a group belongs to is temporarily unreachable, PipelineDB Cluster will route writes for that group to another shard in order to maintain maximum availability. See Writes for more information.

Rebalancing Shards

To physically rebalance all shards in a read_optimized (default) continuous view such that each shard contains a disjoint subset of all groups in the continuous view, you may use the pipeline_cluster.rebalance_shards utility function:

SELECT pipeline_cluster.rebalance_shards('sharded_cv');

This function will physically move all aggregate groups in each shard that belong to another shard, merging multiple instances of each group by Combining Shards in the final result.
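The merge semantics here are presumably analogous to PipelineDB’s combine() aggregate, which merges partial aggregate states rather than naively re-aggregating final values. As a rough sketch, assuming sharded_cv has a grouping column g and an avg aggregate column (both names are assumptions), duplicate instances of a group could be merged on read with:

-- g and avg are assumed column names in sharded_cv
SELECT g, combine(avg) FROM sharded_cv GROUP BY g;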

Note that rebalancing is not applicable to write_optimized continuous views.

Important

For large continuous views, rebalancing shards can be an expensive operation, so use it with care. It requires exclusively locking each shard in the continuous view, which means that writes to these shards will be blocked for the duration of the operation. Reads will continue to work without interruption. Given that continuous views contain aggregate data and are thus relatively compact, it is usually perfectly fine to never manually rebalance them.
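When deciding whether a rebalance is worth the write interruption, standard PostgreSQL size functions can give a quick sense of how much data would be moved on a given node. A minimal sketch, assuming the continuous view’s backing table follows PipelineDB’s _mrel naming convention:

-- assumes the default _mrel suffix for the view's materialization table
SELECT pg_size_pretty(pg_total_relation_size('sharded_cv_mrel'));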