Two Phase Commits

PipelineDB Cluster is designed to be a high-throughput stream processing engine, not a distributed relational database. However, there still needs to be a way to execute DDL statements on all nodes without having to manually connect to each node. We also support stream-table joins so all nodes must always have the same view for each table. To achieve this, we use the two-phase commit protocol for all DDL and non-stream DML statements.

All DDL/non-stream DML statements are executed in a distributed transaction on all nodes in the cluster and committed via two-phase commit. A BEGIN statement will start both a local and a distributed transaction. Statements that don’t need two-phase commits will be executed in the local transaction while the ones which do will be executed in the distributed transaction. A COMMIT/ROLLBACK will end both transactions.

Pitfalls

Transaction blocks that have mixed statements, i.e. statements executed locally and statements executed via two-phase commit result in two different transactions. The distributed transaction is committed before the local one, so there is a chance that the distributed transaction commits but the local one fails (in case the node crashes in between the two commits).

The node on which statements are being executed is the coordinator node for the two-phase commit transaction, so if it fails you might have to manually rollback prepared transactions on all nodes. All transactions prepared by PipelineDB Cluster can be listed by running:

SELECT * FROM pg_prepared_xacts WHERE gid LIKE 'pipeline_cluster.2pc-%';

If a two-phase commit cohort fails after it has acknowledged a PREPARE TRANSACTION, the coordinator will emit a warning telling you to manually commit or rollback the transaction. This warning must be taken seriously, otherwise you will have a long running transaction which can choke the auto-vacuumer. The warning will look something like:

WARNING:  failed to COMMIT two-phase commit transaction 'pipeline_cluster.2PC-501882624154466' on node "localhost:5432"
HINT:  Manually execute COMMIT PREPARED 'pipeline_cluster.2PC-501882624154466' on node "localhost:5432".

If any of the nodes is down, two-phase commits can’t proceed so all commands that require it will fail. In the future, we will add the ability to dynamically change cluster membership.

Unsupported Statements

Statements that cannot be executed inside transaction blocks such as CREATE DATABASE cannot be executed via two-phase commits. So if you want to create a new database, you’ll have to manually connect to each node and run the appropriate command. In the future we’ll offer users with the ability to broadcast such commands on all nodes and just be informed in case it fails on one of them.

Commands that might require local resources such as COPY or CREATE EXTENSION are not executed via two-phase commit since we can’t expect local resources to be present on all nodes.