Load Balancing Guide

Introduction

When Openfire is run in a clustered configuration for scalability or reliability, it is desirable to spread inbound connections over the available cluster nodes in a predictable and configurable manner.

Some related topics are not covered in this document, but are instead discussed in other documents:

Enable clustering in Openfire - describes how to use the Hazelcast plugin to add clustering functionality to Openfire.
Clustered Database Guide - describes how to use Openfire with a database that consists of more than one server.

Background

Load balancing is primarily used to control the distribution of (client) connections over a cluster of Openfire servers. There are, however, other scenarios in which having load balancing features can be useful.

A load balancer can be used to implement a fail-over strategy. Such a strategy is particularly useful when your (possibly single-instance) Openfire server is being replaced, or is temporarily unavailable. With containerization, these scenarios are becoming more and more prevalent.

Network Ports to Load Balance

When using a load balancing solution, the following external ports can be distributed over your load-balanced server cluster:

5222 and 5223 for TCP-based clients (this typically includes desktop and mobile clients)
7070 and 7443 for BOSH and websocket-based clients (most browser-based clients), as well as client-facing HTTP endpoints
5269 and 5270 for server federation (useful when users of your XMPP domain are interacting with users on other XMPP domains)

Avoid exposing Openfire's web-based administrative console (ports 9090 and 9091) via the load balancer! From a security perspective, this console should not be reachable from anything but selected, privileged network addresses. Additionally, for the admin console to function properly, each web request should end up at the same server. As some settings are to be applied to each individual server, the administrative console of each server should be individually addressable.
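The port guidance above can be summarized as a small lookup. This is a minimal sketch for illustration only: the port numbers come from this guide, while the names BALANCED_PORTS, ADMIN_PORTS and may_balance are hypothetical and not part of Openfire or of any load balancer.

```python
# Ports this guide lists as candidates for load balancing.
# The dictionary layout and descriptions are illustrative only.
BALANCED_PORTS = {
    5222: "TCP-based clients",
    5223: "TCP-based clients",
    7070: "BOSH, websocket and other client-facing HTTP endpoints",
    7443: "BOSH, websocket and other client-facing HTTP endpoints",
    5269: "server federation",
    5270: "server federation",
}

# Admin console ports: keep these off the load balancer.
ADMIN_PORTS = {9090, 9091}


def may_balance(port: int) -> bool:
    """Return True only for ports this guide lists as suitable for load balancing."""
    if port in ADMIN_PORTS:
        return False
    return port in BALANCED_PORTS
```

A check such as may_balance(9090) returns False, which mirrors the advice to keep each server's admin console individually addressable.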

Proxying web-bindings over standard ports

Openfire serves its client-facing HTTP endpoints (including BOSH, websockets and other HTTP endpoints, like the one used for HTTP file transfer) by default on ports 7070 (HTTP) and 7443 (HTTPS). It is often desirable to expose these endpoints over the standard HTTP ports: 80 and 443. Although this is more a matter of proxying, a load balancer can provide such a mapping by accepting connections on the desired public port and load-balancing them to the ports used by Openfire.

When applying such a configuration, it is important to realize that Openfire will announce some web endpoint addresses. Unless configured differently, it will advertise the ports on which Openfire itself listens. In this scenario, those are no longer the ports that remote peers should use.

To overcome this issue, Openfire allows you to override the default announced endpoints. For the announced port of the HTTP File Upload plugin, for example, the property plugin.httpfileupload.announcedWebPort can be used.
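The effect of such an override can be modelled as follows. This is a sketch of the behaviour described above, not Openfire's implementation: the function announced_port and the default listening port of 7443 are assumptions made for illustration. Only the property name is taken from this guide.

```python
from typing import Mapping

# Assumption for this sketch: Openfire listens for HTTPS on its default port.
OPENFIRE_HTTPS_PORT = 7443


def announced_port(
    properties: Mapping[str, str],
    key: str = "plugin.httpfileupload.announcedWebPort",
    listen_port: int = OPENFIRE_HTTPS_PORT,
) -> int:
    """Return the overriding port if the property is set, else the listening port."""
    value = properties.get(key)
    return int(value) if value else listen_port


# Without an override, peers are told to use 7443, which the balancer may not expose.
without_override = announced_port({})
# With the override, peers are told to use the balancer's public port instead.
with_override = announced_port({"plugin.httpfileupload.announcedWebPort": "443"})
```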

Using DNS Service (SRV) records

DNS Service (SRV) records can specify the network addresses (typically: the hostname and port) of servers that provide a particular service for a domain. In the specifications of the XMPP protocol, DNS SRV is defined as the preferred process to resolve domain names. As a result, most server and client implementations support this out of the box.

A DNS SRV request is issued to find records for a specific service on a specific domain. Each request can return more than one DNS SRV record. Each record specifies a target host within the domain where the service can be expected to be provided.

Example DNS SRV records

_xmpp-client._tcp.example.net. 86400 IN SRV 5 50 5222 server1.example.net.
_xmpp-client._tcp.example.net. 86400 IN SRV 10 30 5222 server2.example.net.

In the example above, the service defined as xmpp-client for the domain named example.net is reported to be made available on two target hosts:

a server with hostname server1.example.net, on port 5222
another server named server2.example.net, also on port 5222

Apart from the hostname and port of a service location, each DNS SRV record contains two other relevant values: a 'priority' and a 'weight' value. In the example above, the priority values of the two hosts are 5 and 10, and their weight values are 50 and 30, respectively.
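To make the field order of the example records explicit, the following sketch splits each line into named fields. The SrvRecord type and the parse_srv function are hypothetical helpers written for this guide. They only handle single-line records in the exact layout shown above.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str      # service, protocol and domain, e.g. _xmpp-client._tcp.example.net.
    ttl: int       # time to live, in seconds
    priority: int  # lower values are tried first
    weight: int    # relative weight among records of equal priority
    port: int      # port on which the service is offered
    target: str    # hostname of the server offering the service


def parse_srv(line: str) -> SrvRecord:
    """Parse one line of the form: name ttl IN SRV priority weight port target."""
    name, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if (rr_class, rr_type) != ("IN", "SRV"):
        raise ValueError(f"not an IN SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port), target)


records = [
    parse_srv(line)
    for line in (
        "_xmpp-client._tcp.example.net. 86400 IN SRV 5 50 5222 server1.example.net.",
        "_xmpp-client._tcp.example.net. 86400 IN SRV 10 30 5222 server2.example.net.",
    )
]
```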

The priority and weight values can be used to configure the desired load balancing between the provided target hosts. A client must attempt to contact the target host with the lowest-numbered priority it can reach. Target hosts with the same priority are to be tried in an order defined by the weight field. The weight field specifies a relative weight for entries with the same priority. Larger weights are given a proportionately higher probability of being selected.
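The selection rules described above can be sketched as follows. This is a simplified illustration, not the exact procedure from the DNS SRV specification and not code taken from any XMPP client: the function order_targets and the tuple layout are assumptions. Records with a weight of zero are only picked once no positively weighted record of the same priority remains.

```python
import random
from typing import List, Optional, Tuple

# (priority, weight, port, target); the layout is chosen for this sketch only.
Srv = Tuple[int, int, int, str]


def order_targets(records: List[Srv], rng: Optional[random.Random] = None) -> List[Srv]:
    """Return the records in the order in which a client would try them."""
    if rng is None:
        rng = random.Random()
    ordered: List[Srv] = []
    # Lowest-numbered priority first.
    for priority in sorted({r[0] for r in records}):
        group = [r for r in records if r[0] == priority]
        # Within one priority, draw without replacement, proportionally to weight.
        while group:
            weights = [r[1] for r in group]
            if sum(weights) == 0:
                chosen = rng.choice(group)
            else:
                chosen = rng.choices(group, weights=weights, k=1)[0]
            group.remove(chosen)
            ordered.append(chosen)
    return ordered


example = [
    (10, 30, 5222, "server2.example.net."),
    (5, 50, 5222, "server1.example.net."),
]
attempt_order = order_targets(example)
```

Because the two example records have different priorities, server1.example.net is always tried first, and server2.example.net only acts as a fail-over. Weights only come into play when several records share the same priority.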

Given multiple records with specific 'priority' and 'weight' values, DNS SRV records can be used for various load balancing strategies. These range from a lookup that returns a single server (which is useful for running an XMPP domain on a server whose hostname differs from the domain name), through simple round-robin load balancing, to configurations that define a tiered setup in which fail-over servers take over when none of the primary locations allows a client to connect.

The benefit of a load-balancing approach using DNS SRV records is that it is extremely light-weight: apart from a few additional records in the DNS system, there is no maintenance of networking infrastructure. Arguably the biggest drawback of using DNS SRV records is that it needs client support. Although most TCP-based XMPP clients can be expected to support DNS SRV lookups, this is not the case for clients that make use of BOSH-binding for XMPP or websockets, which are used by most browser-based clients.

Configuring Session Persistence

Session Persistence (or: 'sticky sessions') ensures that all requests from a user during the session are sent to the same target. Clients connecting to Openfire can benefit from session persistence.

TCP-based clients depend on a single, long-lived socket connection. As such, there is little need for them to use session persistence. A notable exception is when the Stream Management feature is used. This feature allows clients that have disconnected unexpectedly (for example, when a network interruption occurred) to resume their pre-existing session. In Openfire, session resumption can only occur on the server where the session was originally created. When resumption is attempted on a different server, it will fail. It is likely that the client in such a scenario will simply establish a completely new session. That is a little inefficient and inconvenient, but not necessarily a priority issue.

For BOSH and websocket-based connections, session persistence is a must: Openfire cannot properly service a client when its data arrives at a server other than the one where its session was originally created.
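The idea behind session persistence can be sketched as a table that remembers which server a session was first sent to. This is an illustration only: real load balancers ship their own persistence mechanisms, and the StickyRouter class, the node names and the session_key parameter (for example a cookie value or a source address) are assumptions made for this sketch.

```python
from itertools import cycle
from typing import Dict, Iterable


class StickyRouter:
    """Assign new sessions round-robin, then keep each session on its node."""

    def __init__(self, nodes: Iterable[str]) -> None:
        # Assumes at least one node is given.
        self._next_node = cycle(list(nodes))
        self._assigned: Dict[str, str] = {}

    def route(self, session_key: str) -> str:
        """Return the node that must handle all requests of this session."""
        if session_key not in self._assigned:
            self._assigned[session_key] = next(self._next_node)
        return self._assigned[session_key]


router = StickyRouter(["node1", "node2"])
first = router.route("session-a")
second = router.route("session-b")
repeat = router.route("session-a")  # same node as the first request of session-a
```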

Server-to-server connections are established by using long-lived socket connections, similar to TCP-based clients. As Openfire will accept multiple concurrent connections from remote domains (and does not support the Stream Management feature for server-to-server connections), session persistence is not needed here.

Terminating TLS at the load balancer

To offload a significant amount of resource usage from Openfire, it can be desirable to have TLS negotiation happen on a load balancer instead of in Openfire itself.

Terminating TLS on the load balancer is typically straightforward with modern load balancers and HTTPS traffic. This can be utilized for BOSH and websocket-based clients. For TCP-based connections, things are trickier.

TCP-based XMPP connections typically come in two flavors: Opportunistic TLS (STARTTLS) and Direct TLS (for client connections, Openfire provides opportunistic TLS on port 5222 and direct TLS on 5223). The two differ in when TLS negotiation takes place. With direct TLS, TLS negotiation happens prior to exchanging any XMPP-specific data (much like how HTTPS operates). With opportunistic TLS, the TLS handshakes occur embedded within the exchange of XMPP data.
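The difference is visible in the very first bytes that a client sends. The sketch below inspects those bytes: a direct TLS connection opens with a TLS handshake record, while an opportunistic TLS connection opens with plain XML. The function names are hypothetical, and the checks are deliberately coarse. They illustrate the distinction and are not a protocol parser.

```python
def looks_like_direct_tls(first_bytes: bytes) -> bool:
    """True if the data starts with a TLS handshake record header.

    A TLS record starts with a content type byte (0x16 for handshake),
    followed by a protocol version whose major byte is 0x03.
    """
    return len(first_bytes) >= 2 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03


def looks_like_plain_xmpp(first_bytes: bytes) -> bool:
    """True if the data starts with an XML declaration or an XMPP stream header."""
    text = first_bytes.lstrip()
    return text.startswith(b"<?xml") or text.startswith(b"<stream:stream")
```

With opportunistic TLS, the handshake only follows after some XMPP data has been exchanged in the clear, which is why a component that terminates it has to understand that exchange.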

Terminating direct TLS connections on a load balancer is achievable, but for opportunistic TLS to be terminated on the load balancer, the balancer needs to support XMPP explicitly. At the time of writing, no such load balancers are known to exist. Third-party (reverse) proxy solutions do exist that claim to be able to terminate TLS. These projects are linked to below, but their compatibility with Openfire is unknown.