Load Balancing

All nodes in a PipelineDB Cluster deployment can serve both read and write requests, so the simplest way to balance load is to round-robin connections across all nodes in the cluster. While this can be done in application code, it is easier to manage with a load-balancing proxy such as HAProxy.
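For comparison, round-robin in application code can be sketched in a few lines of Python. This is only an illustration: the node addresses and the next_node helper are hypothetical, and unlike a proxy, the sketch does no health checking, so a node that is down would still be handed out.

```python
import itertools

# Hypothetical node addresses; replace with the nodes in your cluster.
NODES = [
    ("node0.example.com", 5432),
    ("node1.example.com", 5432),
    ("node2.example.com", 5432),
]

# itertools.cycle yields the nodes in order and wraps around at the end.
_picker = itertools.cycle(NODES)


def next_node():
    """Return the (host, port) pair the next connection should use."""
    return next(_picker)
```

Each new connection would call next_node() and connect to the returned address. Every application process keeps its own position in the cycle, so the distribution is only approximately even across processes.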

HAProxy

We’ve included a script that generates a simple HAProxy configuration file for load balancing your cluster. The script connects to one of the nodes in your cluster, fetches the cluster topology and some configuration parameters, and uses them to create a configuration that round-robins all incoming connection requests. The script can be found at $(PIPELINEDB_ROOT)/bin/haproxy-config.

Its usage is as follows:

haproxy-config -h
usage: haproxy-config [-h] [--host HOST] [--port PORT] [--username USERNAME]
                      [--password PASSWORD] [--database DATABASE]
                      [--listen-port LISTEN_PORT]

optional arguments:
  -h, --help            show this help message and exit
  --host HOST           database server host or socket directory
  --port PORT           database server port
  --username USERNAME   database user name
  --password PASSWORD   database password
  --database DATABASE   database name to connect to
  --listen-port LISTEN_PORT
                        HAProxy listen port

Note

haproxy-config depends on the psycopg2 and Bottle Python packages.

The $(PIPELINEDB_ROOT)/bin/haproxy.cfg.template file contains the template used to generate the configuration file. It is written in Bottle’s SimpleTemplate Engine format. The defaults in the generated configuration might not suit your setup, so please review the template file and adjust it where needed. Details on configuring HAProxy can be found in the HAProxy documentation.
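To give a rough idea of the kind of configuration such a template produces, here is a minimal Python sketch that renders a round-robin TCP configuration from a list of nodes. It is not the haproxy-config script or its template: it uses plain string formatting instead of Bottle’s template engine, and the render_haproxy_config function, the pipelinedb section name, and the timeout values are assumptions made for illustration. The HAProxy directives themselves (bind, mode tcp, balance roundrobin, option tcp-check, server ... check) are standard.

```python
def render_haproxy_config(nodes, listen_port=5432):
    """Build a minimal HAProxy TCP config that round-robins across nodes.

    nodes is a list of (name, host, port) tuples. This is an illustration,
    not the output of haproxy-config.
    """
    lines = [
        "defaults",
        "    mode tcp",
        # Assumed timeouts; tune these for your workload.
        "    timeout connect 5s",
        "    timeout client 1h",
        "    timeout server 1h",
        "",
        "listen pipelinedb",
        f"    bind *:{listen_port}",
        "    balance roundrobin",
        # tcp-check is the health check the provided template relies on.
        "    option tcp-check",
    ]
    for name, host, port in nodes:
        # "check" enables health checking for this backend server.
        lines.append(f"    server {name} {host}:{port} check")
    return "\n".join(lines) + "\n"
```

Calling render_haproxy_config([("node0", "10.0.0.1", 5432), ("node1", "10.0.0.2", 5432)], listen_port=6543) returns text with one server line per node, which could be written to a file and passed to HAProxy.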

Note

The template configuration we provide requires HAProxy 1.5+ because it uses tcp-check as the health check to determine whether a node is up.

Once your HAProxy instance is running, all clients can connect to it as if it were a regular PipelineDB endpoint:

pipeline -h <haproxy host> -p <haproxy port> -c "SELECT 1"

Connection Pooling

Since PostgreSQL uses one process per connection, it’s a good idea to use a connection pooler such as PgBouncer. The architecture we recommend is to put all PipelineDB Cluster nodes behind HAProxy and then have PgBouncer connect to HAProxy. Running PgBouncer locally, on the same host as the client, will probably work best: the client doesn’t have to connect to a remote PgBouncer instance for every request, and the connections to the remote HAProxy server are kept open by the local pooler.
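As a rough sketch of this layout, the Python snippet below renders a minimal pgbouncer.ini for a local PgBouncer that forwards to a remote HAProxy endpoint. The render_pgbouncer_ini function, the database name pipeline, the auth_file path, the pool size, and the choice of session pooling are all assumptions for illustration; check each setting against the PgBouncer documentation before using it.

```python
import configparser
import io


def render_pgbouncer_ini(haproxy_host, haproxy_port, database="pipeline"):
    """Render a minimal pgbouncer.ini that pools connections to HAProxy."""
    config = configparser.ConfigParser()
    # Preserve key case; ConfigParser lowercases option names by default.
    config.optionxform = str
    # Clients connect to the local PgBouncer, which forwards to HAProxy.
    config["databases"] = {
        database: f"host={haproxy_host} port={haproxy_port} dbname={database}",
    }
    config["pgbouncer"] = {
        # Listen only on localhost, since PgBouncer runs next to the client.
        "listen_addr": "127.0.0.1",
        "listen_port": "6432",
        "auth_type": "md5",
        "auth_file": "/etc/pgbouncer/userlist.txt",  # assumed path
        "pool_mode": "session",
        "default_pool_size": "20",  # assumed value
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()
```

With a file like this in place, clients would connect to 127.0.0.1:6432 rather than to HAProxy directly, and PgBouncer would reuse its pooled connections to the HAProxy server.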