Operations¶
After the initial Deployment and Running PipelineDB Cluster steps, there are a number of useful operational processes you may want to perform on your running cluster. Those processes are covered here.
Adding a Node to a Cluster¶
PipelineDB Cluster supports dynamically adding nodes to a running cluster with no downtime or loss of service. New nodes dynamically join the cluster and inform the other nodes of their existence as soon as they're available to perform work.
PipelineDB Cluster includes the pipeline-cluster-join tool to make this operational task as easy as possible. Its usage is as follows:
usage: pipeline-cluster-join [-h] -c CONFIG -D DATA_DIR [-U DBUSER]
                             [--encoding ENCODING] [--locale LOCALE] [-b HBA]
                             [-k]
                             node_id

pipeline-cluster-join joins a node with an existing PipelineDB Cluster

positional arguments:
  node_id              unique name@host[:port] identifier of the new node

optional arguments:
  -h, --help           show this help message and exit
  -c CONFIG            path to a local configuration file containing
                       configuration parameters that will be appended to each
                       cluster node's pipelinedb.conf
  -D DATA_DIR          path of the data directory to initialize
  -U DBUSER            database superuser name
  --encoding ENCODING  set default encoding for new databases
  --locale LOCALE      set default locale for new databases
  -b HBA               location of pg_hba.conf file to use for cluster
  -k                   if set, keep all files generated during join

Report bugs to <eng@pipelinedb.com>.
Just as with Cluster Initialization, pipeline-cluster-join takes a configuration file containing all parameters you'd like set for the new node. It also takes the pipeline_cluster.nodes parameter to specify the existing nodes in the cluster, from which it will obtain all of the necessary metadata to join the running cluster.
pipeline-cluster-join is meant to be run locally on the node joining the cluster. As an example, here's what it looks like to join a new node to a running cluster of 3 nodes:
# Show contents of pipelinedb.conf
$ cat pipelinedb.conf
pipeline_cluster.nodes = 'node0@host0:5432,node1@host1:5432,node2@host2:5432'
pipeline_cluster.license_key = '9FBFA1E4CC63E4BAD6B72C9F9A215B65'
Once you've created your configuration file, just run pipeline-cluster-join:
$ pipeline-cluster-join node3@host3:5432 -c pipelinedb.conf -D data_dir
detecting installation directory ... ok
validating configuration file ... ok
testing connection to node0 [host0:5432] ... ok
testing connection to node1 [host1:5432] ... ok
testing connection to node2 [host2:5432] ... ok
initdb ... ok
starting PipelineDB ... ok
creating pipeline_cluster extension ... ok
bootstrapping nodes metadata ... ok
replaying DDL log ... ok
shutting down PipelineDB ... ok
update postgresql.conf ... ok
writing node.conf ... ok
update pg_hba.conf ... ok
syncing node node0 [host0:5432] ... ok
syncing node node1 [host1:5432] ... ok
syncing node node2 [host2:5432] ... ok
cleaning up ... ok
Successfully joined cluster.
After this point, each node in the original 3-node cluster will be aware of the new node, and you’ll have a 4-node cluster. No server restarts or further operations are required.
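If you'd like to sanity-check the result, you can inspect the cluster's node metadata from any node. A minimal sketch, assuming a pipeline_cluster.nodes metadata relation mirroring the configuration parameter of the same name (the exact relation name and columns are an assumption and may differ in your version):
-- Hypothetical metadata query: list all nodes known to the cluster,
-- which should now include node3. The relation name is an assumption.
SELECT * FROM pipeline_cluster.nodes;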
It is important to note that the new node will not have any work to do at this point. It can process SELECT queries against existing sharded continuous views, but all shards will remain on the original 3 nodes. The new node will, however, be assigned shards for newly created continuous views.
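For example, reads work from the new node right away, and a continuous view created after the join can be sharded across all 4 nodes. A minimal sketch; the stream, column, and view names here are hypothetical, and specifying num_shards at creation time via WITH is an assumption based on the ALTER form shown in the next section:
-- Read an existing sharded continuous view from the new node; results
-- are combined across the shards on the original 3 nodes.
SELECT * FROM sharded_cv;

-- Hypothetical: a continuous view created after the join may be assigned
-- shards on node3 as well.
CREATE CONTINUOUS VIEW new_cv WITH (num_shards = 4) AS
  SELECT x::integer, count(*) FROM some_stream GROUP BY x;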
To spread the existing workload onto the new node, you'll want to scale up the number of shards in your continuous views, as described next.
Adding Shards to Continuous Views¶
The number of shards in a continuous view may be increased, to at most the number of nodes in the cluster. Scaling up is done by modifying the sharded continuous view's num_shards parameter via ALTER:
ALTER VIEW sharded_cv SET (num_shards = 3);
After num_shards is increased, the newly created shards will immediately begin serving reads and writes.
Important
num_shards may only be increased; it cannot be decreased. Also, num_replicas currently cannot be modified.
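For instance, on the view above, an attempt to scale back down should simply be rejected (the exact error message is an assumption):
-- sharded_cv currently has 3 shards; decreasing num_shards is not allowed.
ALTER VIEW sharded_cv SET (num_shards = 2);
-- ERROR:  num_shards cannot be decreased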
For read_optimized (default) shards, each shard of a continuous view owns a disjoint subset of all aggregate groups. As a result, when new shards are created, the original shards will still contain groups that are now owned by the new shards. If you'd like to physically rebalance your shards such that the new shards' groups are moved off of the existing shards, see Rebalancing Shards.
Note
Each shard’s groups do not have to be perfectly disjoint, and in most cases will not be! If a shard that a group belongs to is temporarily unreachable, PipelineDB Cluster will route writes for that group to another shard in order to maintain maximum availability. See Writes for more information.
Rebalancing Shards¶
To physically rebalance all shards in a read_optimized (default) continuous view such that each shard contains a disjoint subset of all groups in the continuous view, you may use the pipeline_cluster.rebalance_shards utility function:
SELECT pipeline_cluster.rebalance_shards('sharded_cv');
This function will physically move all aggregate groups in each shard that belong to another shard, merging multiple instances of each group by Combining Shards in the final result.
Note that rebalancing is not applicable to write_optimized continuous views.
Important
For large continuous views, rebalancing shards can be an expensive operation, so use it with care. It requires exclusively locking each shard in the continuous view, meaning that writes to those shards will be blocked for the duration of the operation. Reads will continue to work without interruption. Given that continuous views contain aggregate data and are thus relatively compact, it is usually perfectly fine to not manually rebalance them.
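If you do decide to rebalance a large continuous view, it's best to check how skewed the shards actually are first and to schedule the operation during a low-write window. A minimal sketch, assuming a pipeline_cluster.shards metadata relation that records shard placement (the relation name and column are assumptions):
-- Hypothetical: inspect shard placement for the view before rebalancing.
SELECT * FROM pipeline_cluster.shards WHERE continuous_view = 'sharded_cv';

-- If the distribution is badly skewed, rebalance during a low-write window;
-- writes to the view's shards are blocked while the exclusive locks are held.
SELECT pipeline_cluster.rebalance_shards('sharded_cv');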